Document (#32273)

Author
Egghe, L.
Title
Untangling Herdan's law and Heaps' law : mathematical and informetric arguments
Source
Journal of the American Society for Information Science and Technology. 58(2007) no.5, pp. 702-709
Year
2007
Abstract
Herdan's law in linguistics and Heaps' law in information retrieval are different formulations of the same phenomenon. Stated briefly and in linguistic terms, they say that vocabulary size is a concave, increasing power law of text size. This study investigates these laws from a purely mathematical and informetric point of view. A general informetric argument shows that the problem of proving these laws is, in fact, ill-posed. Using the more general terminology of sources and items, the author shows by presenting exact formulas from Lotkaian informetrics that the total number T of sources is not only a function of the total number A of items, but is also a function of several parameters (e.g., the parameters occurring in Lotka's law). Consequently, it is shown that a fixed T (or A) value can lead to different possible A (respectively, T) values. Limiting the T(A)-variability to increasing samples (e.g., in a text, as done in linguistics), the author then shows, in a purely mathematical way, that for large sample sizes T ~ A^phi, where phi is a constant, phi < 1 but close to 1; hence, roughly, Heaps' or Herdan's law can be proved without using any linguistic or informetric argument. The author also shows that for smaller samples, phi is not a constant but essentially decreases, as confirmed by practical examples. Finally, an exact informetric argument on random sampling in the items shows that, in most cases, T = T(A) is a concavely increasing function, in accordance with practical examples.
Theme
Informetrics
Object
Herdan's law
Heaps' law
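
The relation T ~ A^phi described in the abstract can be illustrated with a minimal sketch. It counts distinct sources T after each of A items and estimates phi as the least-squares slope of log T against log A. The synthetic Zipf-like token stream, the seed, and the function names vocabulary_growth and fit_exponent are assumptions made for illustration only; this is not the derivation used in the paper, and the fitted value is not expected to match the paper's results.

```python
import math
import random


def vocabulary_growth(tokens):
    """Return (A, T) pairs: after A items, T distinct sources have been seen."""
    seen = set()
    curve = []
    for a, tok in enumerate(tokens, start=1):
        seen.add(tok)
        curve.append((a, len(seen)))
    return curve


def fit_exponent(curve):
    """Least-squares slope of log T against log A (an estimate of phi)."""
    xs = [math.log(a) for a, _ in curve]
    ys = [math.log(t) for _, t in curve]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical data: 20000 tokens drawn from 5000 types with weight 1/rank.
rng = random.Random(0)
ranks = list(range(1, 5001))
weights = [1.0 / r for r in ranks]
tokens = rng.choices(ranks, weights=weights, k=20000)

curve = vocabulary_growth(tokens)
phi = fit_exponent(curve)
print(f"estimated phi = {phi:.3f}")
```

Fitting over different prefixes of the same stream (for example curve[:2000] versus the full curve) is one way to see whether the estimated exponent stays constant as the sample grows.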

Similar documents (author)

  1. Egghe, L.: Little science, big science and beyond (1994) 4.73
    4.727482 = sum of:
      4.727482 = weight(author_txt:egghe in 6883) [ClassicSimilarity], result of:
        4.727482 = fieldWeight in 6883, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.563971 = idf(docFreq=60, maxDocs=43254)
          0.625 = fieldNorm(doc=6883)
    
  2. Egghe, L.: Expansion of the field of informetrics : the second special issue (2006) 4.73
    4.727482 = sum of:
      4.727482 = weight(author_txt:egghe in 119) [ClassicSimilarity], result of:
        4.727482 = fieldWeight in 119, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.563971 = idf(docFreq=60, maxDocs=43254)
          0.625 = fieldNorm(doc=119)
    
  3. Egghe, L.: Expansion of the field of informetrics : origins and consequences (2005) 4.73
    4.727482 = sum of:
      4.727482 = weight(author_txt:egghe in 2979) [ClassicSimilarity], result of:
        4.727482 = fieldWeight in 2979, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.563971 = idf(docFreq=60, maxDocs=43254)
          0.625 = fieldNorm(doc=2979)
    
  4. Egghe, L.: The amount of actions needed for shelving and reshelving (1996) 4.73
    4.727482 = sum of:
      4.727482 = weight(author_txt:egghe in 5463) [ClassicSimilarity], result of:
        4.727482 = fieldWeight in 5463, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.563971 = idf(docFreq=60, maxDocs=43254)
          0.625 = fieldNorm(doc=5463)
    
  5. Egghe, L.: Special features of the author - publication relationship and a new explanation of Lotka's law based on convolution theory (1994) 4.73
    4.727482 = sum of:
      4.727482 = weight(author_txt:egghe in 6137) [ClassicSimilarity], result of:
        4.727482 = fieldWeight in 6137, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.563971 = idf(docFreq=60, maxDocs=43254)
          0.625 = fieldNorm(doc=6137)
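
The author-match scores above can be reproduced by hand. The sketch below assumes the formulas that fit the numbers in these listings: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), and fieldWeight = tf * idf * fieldNorm. The function names are hypothetical and are not part of any library API.

```python
import math


def classic_tf(freq):
    # Term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    # Inverse document frequency in the form that matches the listing
    # (assumed: 1 + ln(maxDocs / (docFreq + 1))).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the author_txt:egghe entries above.
idf = classic_idf(doc_freq=60, max_docs=43254)
score = field_weight(freq=1.0, doc_freq=60, max_docs=43254, field_norm=0.625)
print(f"idf = {idf:.6f}, score = {score:.6f}")
```

All five author matches share the same freq, docFreq, and fieldNorm, which is why they receive identical scores.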
    

Similar documents (content)

  1. Egghe, L.: The power of power laws and an interpretation of Lotkaian informetric systems as self-similar fractals (2005) 0.22
    0.2206271 = sum of:
      0.2206271 = product of:
        0.9192796 = sum of:
          0.0865288 = weight(abstract_txt:lotkaian in 5467) [ClassicSimilarity], result of:
            0.0865288 = score(doc=5467,freq=1.0), product of:
              0.14923663 = queryWeight, product of:
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.01608682 = queryNorm
              0.57980937 = fieldWeight in 5467, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.0625 = fieldNorm(doc=5467)
          0.049652338 = weight(abstract_txt:increasing in 5467) [ClassicSimilarity], result of:
            0.049652338 = score(doc=5467,freq=1.0), product of:
              0.14862843 = queryWeight, product of:
                1.7285179 = boost
                5.3451242 = idf(docFreq=560, maxDocs=43254)
                0.01608682 = queryNorm
              0.33407027 = fieldWeight in 5467, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3451242 = idf(docFreq=560, maxDocs=43254)
                0.0625 = fieldNorm(doc=5467)
          0.014562991 = weight(abstract_txt:that in 5467) [ClassicSimilarity], result of:
            0.014562991 = score(doc=5467,freq=2.0), product of:
              0.06907031 = queryWeight, product of:
                1.7999358 = boost
                2.3854163 = idf(docFreq=10822, maxDocs=43254)
                0.01608682 = queryNorm
              0.210843 = fieldWeight in 5467, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3854163 = idf(docFreq=10822, maxDocs=43254)
                0.0625 = fieldNorm(doc=5467)
          0.16923258 = weight(abstract_txt:argument in 5467) [ClassicSimilarity], result of:
            0.16923258 = score(doc=5467,freq=3.0), product of:
              0.2333947 = queryWeight, product of:
                2.1660497 = boost
                6.698111 = idf(docFreq=144, maxDocs=43254)
                0.01608682 = queryNorm
              0.72509176 = fieldWeight in 5467, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.698111 = idf(docFreq=144, maxDocs=43254)
                0.0625 = fieldNorm(doc=5467)
          0.23799686 = weight(abstract_txt:laws in 5467) [ClassicSimilarity], result of:
            0.23799686 = score(doc=5467,freq=4.0), product of:
              0.2661764 = queryWeight, product of:
                2.3131707 = boost
                7.1530566 = idf(docFreq=91, maxDocs=43254)
                0.01608682 = queryNorm
              0.8941321 = fieldWeight in 5467, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.1530566 = idf(docFreq=91, maxDocs=43254)
                0.0625 = fieldNorm(doc=5467)
          0.36130604 = weight(abstract_txt:informetric in 5467) [ClassicSimilarity], result of:
            0.36130604 = score(doc=5467,freq=2.0), product of:
              0.5252086 = queryWeight, product of:
                4.1948185 = boost
                7.783025 = idf(docFreq=48, maxDocs=43254)
                0.01608682 = queryNorm
              0.6879287 = fieldWeight in 5467, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.783025 = idf(docFreq=48, maxDocs=43254)
                0.0625 = fieldNorm(doc=5467)
        0.24 = coord(6/25)
    
  2. Ye, F.Y.: A theoretical approach to the unification of informetric models by wave-heat equations (2011) 0.19
    0.18661284 = sum of:
      0.18661284 = product of:
        1.1663303 = sum of:
          0.018020783 = weight(abstract_txt:that in 929) [ClassicSimilarity], result of:
            0.018020783 = score(doc=929,freq=1.0), product of:
              0.06907031 = queryWeight, product of:
                1.7999358 = boost
                2.3854163 = idf(docFreq=10822, maxDocs=43254)
                0.01608682 = queryNorm
              0.2609049 = fieldWeight in 929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3854163 = idf(docFreq=10822, maxDocs=43254)
                0.109375 = fieldNorm(doc=929)
          0.09952946 = weight(abstract_txt:function in 929) [ClassicSimilarity], result of:
            0.09952946 = score(doc=929,freq=1.0), product of:
              0.16271132 = queryWeight, product of:
                1.8085554 = boost
                5.592626 = idf(docFreq=437, maxDocs=43254)
                0.01608682 = queryNorm
              0.6116935 = fieldWeight in 929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.592626 = idf(docFreq=437, maxDocs=43254)
                0.109375 = fieldNorm(doc=929)
          0.4164945 = weight(abstract_txt:laws in 929) [ClassicSimilarity], result of:
            0.4164945 = score(doc=929,freq=4.0), product of:
              0.2661764 = queryWeight, product of:
                2.3131707 = boost
                7.1530566 = idf(docFreq=91, maxDocs=43254)
                0.01608682 = queryNorm
              1.5647311 = fieldWeight in 929, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.1530566 = idf(docFreq=91, maxDocs=43254)
                0.109375 = fieldNorm(doc=929)
          0.6322856 = weight(abstract_txt:informetric in 929) [ClassicSimilarity], result of:
            0.6322856 = score(doc=929,freq=2.0), product of:
              0.5252086 = queryWeight, product of:
                4.1948185 = boost
                7.783025 = idf(docFreq=48, maxDocs=43254)
                0.01608682 = queryNorm
              1.2038752 = fieldWeight in 929, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.783025 = idf(docFreq=48, maxDocs=43254)
                0.109375 = fieldNorm(doc=929)
        0.16 = coord(4/25)
    
  3. Burrell, Q.L.: "Ambiguity" and scientometric measurement : a dissenting view (2001) 0.13
    0.12662634 = sum of:
      0.12662634 = product of:
        0.7914147 = sum of:
          0.025743974 = weight(abstract_txt:that in 1982) [ClassicSimilarity], result of:
            0.025743974 = score(doc=1982,freq=4.0), product of:
              0.06907031 = queryWeight, product of:
                1.7999358 = boost
                2.3854163 = idf(docFreq=10822, maxDocs=43254)
                0.01608682 = queryNorm
              0.37272128 = fieldWeight in 1982, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3854163 = idf(docFreq=10822, maxDocs=43254)
                0.078125 = fieldNorm(doc=1982)
          0.103676654 = weight(abstract_txt:mathematical in 1982) [ClassicSimilarity], result of:
            0.103676654 = score(doc=1982,freq=1.0), product of:
              0.20924546 = queryWeight, product of:
                2.0509305 = boost
                6.3421264 = idf(docFreq=206, maxDocs=43254)
                0.01608682 = queryNorm
              0.49547863 = fieldWeight in 1982, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3421264 = idf(docFreq=206, maxDocs=43254)
                0.078125 = fieldNorm(doc=1982)
          0.2103615 = weight(abstract_txt:laws in 1982) [ClassicSimilarity], result of:
            0.2103615 = score(doc=1982,freq=2.0), product of:
              0.2661764 = queryWeight, product of:
                2.3131707 = boost
                7.1530566 = idf(docFreq=91, maxDocs=43254)
                0.01608682 = queryNorm
              0.7903086 = fieldWeight in 1982, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1530566 = idf(docFreq=91, maxDocs=43254)
                0.078125 = fieldNorm(doc=1982)
          0.45163256 = weight(abstract_txt:informetric in 1982) [ClassicSimilarity], result of:
            0.45163256 = score(doc=1982,freq=2.0), product of:
              0.5252086 = queryWeight, product of:
                4.1948185 = boost
                7.783025 = idf(docFreq=48, maxDocs=43254)
                0.01608682 = queryNorm
              0.85991085 = fieldWeight in 1982, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.783025 = idf(docFreq=48, maxDocs=43254)
                0.078125 = fieldNorm(doc=1982)
        0.16 = coord(4/25)
    
  4. Egghe, L.; Rousseau, R.: The Hirsch index of a shifted Lotka function and its relation with the impact factor (2012) 0.11
    0.1149017 = sum of:
      0.1149017 = product of:
        0.47875708 = sum of:
          0.03495924 = weight(abstract_txt:sources in 1708) [ClassicSimilarity], result of:
            0.03495924 = score(doc=1708,freq=1.0), product of:
              0.07841975 = queryWeight, product of:
                1.0251561 = boost
                4.7551613 = idf(docFreq=1011, maxDocs=43254)
                0.01608682 = queryNorm
              0.44579637 = fieldWeight in 1708, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7551613 = idf(docFreq=1011, maxDocs=43254)
                0.09375 = fieldNorm(doc=1708)
          0.060900256 = weight(abstract_txt:total in 1708) [ClassicSimilarity], result of:
            0.060900256 = score(doc=1708,freq=1.0), product of:
              0.11353512 = queryWeight, product of:
                1.2335092 = boost
                5.7216015 = idf(docFreq=384, maxDocs=43254)
                0.01608682 = queryNorm
              0.53640014 = fieldWeight in 1708, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7216015 = idf(docFreq=384, maxDocs=43254)
                0.09375 = fieldNorm(doc=1708)
          0.105328515 = weight(abstract_txt:increasing in 1708) [ClassicSimilarity], result of:
            0.105328515 = score(doc=1708,freq=2.0), product of:
              0.14862843 = queryWeight, product of:
                1.7285179 = boost
                5.3451242 = idf(docFreq=560, maxDocs=43254)
                0.01608682 = queryNorm
              0.70867 = fieldWeight in 1708, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.3451242 = idf(docFreq=560, maxDocs=43254)
                0.09375 = fieldNorm(doc=1708)
          0.021844488 = weight(abstract_txt:that in 1708) [ClassicSimilarity], result of:
            0.021844488 = score(doc=1708,freq=2.0), product of:
              0.06907031 = queryWeight, product of:
                1.7999358 = boost
                2.3854163 = idf(docFreq=10822, maxDocs=43254)
                0.01608682 = queryNorm
              0.3162645 = fieldWeight in 1708, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3854163 = idf(docFreq=10822, maxDocs=43254)
                0.09375 = fieldNorm(doc=1708)
          0.08510267 = weight(abstract_txt:items in 1708) [ClassicSimilarity], result of:
            0.08510267 = score(doc=1708,freq=1.0), product of:
              0.16244638 = queryWeight, product of:
                1.8070823 = boost
                5.5880704 = idf(docFreq=439, maxDocs=43254)
                0.01608682 = queryNorm
              0.5238816 = fieldWeight in 1708, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5880704 = idf(docFreq=439, maxDocs=43254)
                0.09375 = fieldNorm(doc=1708)
          0.17062192 = weight(abstract_txt:function in 1708) [ClassicSimilarity], result of:
            0.17062192 = score(doc=1708,freq=4.0), product of:
              0.16271132 = queryWeight, product of:
                1.8085554 = boost
                5.592626 = idf(docFreq=437, maxDocs=43254)
                0.01608682 = queryNorm
              1.0486174 = fieldWeight in 1708, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.592626 = idf(docFreq=437, maxDocs=43254)
                0.09375 = fieldNorm(doc=1708)
        0.24 = coord(6/25)
    
  5. Egghe, L.: Relations between the continuous and the discrete Lotka power function (2005) 0.11
    0.113330536 = sum of:
      0.113330536 = product of:
        0.4722106 = sum of:
          0.108161 = weight(abstract_txt:lotkaian in 5465) [ClassicSimilarity], result of:
            0.108161 = score(doc=5465,freq=1.0), product of:
              0.14923663 = queryWeight, product of:
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.01608682 = queryNorm
              0.7247617 = fieldWeight in 5465, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.078125 = fieldNorm(doc=5465)
          0.041199863 = weight(abstract_txt:sources in 5465) [ClassicSimilarity], result of:
            0.041199863 = score(doc=5465,freq=2.0), product of:
              0.07841975 = queryWeight, product of:
                1.0251561 = boost
                4.7551613 = idf(docFreq=1011, maxDocs=43254)
                0.01608682 = queryNorm
              0.5253761 = fieldWeight in 5465, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7551613 = idf(docFreq=1011, maxDocs=43254)
                0.078125 = fieldNorm(doc=5465)
          0.030211259 = weight(abstract_txt:practical in 5465) [ClassicSimilarity], result of:
            0.030211259 = score(doc=5465,freq=1.0), product of:
              0.08034352 = queryWeight, product of:
                1.0376544 = boost
                4.8131337 = idf(docFreq=954, maxDocs=43254)
                0.01608682 = queryNorm
              0.37602606 = fieldWeight in 5465, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8131337 = idf(docFreq=954, maxDocs=43254)
                0.078125 = fieldNorm(doc=5465)
          0.01820374 = weight(abstract_txt:that in 5465) [ClassicSimilarity], result of:
            0.01820374 = score(doc=5465,freq=2.0), product of:
              0.06907031 = queryWeight, product of:
                1.7999358 = boost
                2.3854163 = idf(docFreq=10822, maxDocs=43254)
                0.01608682 = queryNorm
              0.26355374 = fieldWeight in 5465, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3854163 = idf(docFreq=10822, maxDocs=43254)
                0.078125 = fieldNorm(doc=5465)
          0.10029445 = weight(abstract_txt:items in 5465) [ClassicSimilarity], result of:
            0.10029445 = score(doc=5465,freq=2.0), product of:
              0.16244638 = queryWeight, product of:
                1.8070823 = boost
                5.5880704 = idf(docFreq=439, maxDocs=43254)
                0.01608682 = queryNorm
              0.61740035 = fieldWeight in 5465, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5880704 = idf(docFreq=439, maxDocs=43254)
                0.078125 = fieldNorm(doc=5465)
          0.17414026 = weight(abstract_txt:function in 5465) [ClassicSimilarity], result of:
            0.17414026 = score(doc=5465,freq=6.0), product of:
              0.16271132 = queryWeight, product of:
                1.8085554 = boost
                5.592626 = idf(docFreq=437, maxDocs=43254)
                0.01608682 = queryNorm
              1.0702406 = fieldWeight in 5465, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.592626 = idf(docFreq=437, maxDocs=43254)
                0.078125 = fieldNorm(doc=5465)
        0.24 = coord(6/25)
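
The content-match scores in this list combine several term weights. As a rough check, the sketch below recomputes the score of entry 2 (doc=929) from the raw values in its tree, assuming score = coord * sum(queryWeight * fieldWeight), with queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm. The function names and the tuple layout are hypothetical; the numbers are copied from the listing.

```python
import math

MAX_DOCS = 43254
QUERY_NORM = 0.01608682


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed idf form that matches the listing: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm):
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


# (term, boost, docFreq, freq) for doc=929, taken from the explain tree.
terms = [
    ("that", 1.7999358, 10822, 1.0),
    ("function", 1.8085554, 437, 1.0),
    ("laws", 2.3131707, 91, 4.0),
    ("informetric", 4.1948185, 48, 2.0),
]
field_norm = 0.109375
coord = 4 / 25  # 4 of the 25 query terms matched

total = sum(term_score(b, df, f, field_norm) for _, b, df, f in terms)
score = coord * total
print(f"sum = {total:.6f}, score = {score:.6f}")
```

The coord factor explains why a document matching six terms (0.24) can outrank one with larger individual term weights but only four matches (0.16).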