Document (#32273)

Author
Egghe, L.
Title
Untangling Herdan's law and Heaps' law : mathematical and informetric arguments
Source
Journal of the American Society for Information Science and Technology. 58(2007) no.5, p.702-709
Year
2007
Abstract
Herdan's law in linguistics and Heaps' law in information retrieval are different formulations of the same phenomenon. Stated briefly and in linguistic terms, they say that vocabulary size is a concave, increasing power law of text size. This study investigates these laws from a purely mathematical and informetric point of view. A general informetric argument shows that the problem of proving these laws is, in fact, ill-posed. Using the more general terminology of sources and items, the author shows by presenting exact formulas from Lotkaian informetrics that the total number T of sources is not only a function of the total number A of items, but is also a function of several parameters (e.g., the parameters occurring in Lotka's law). Consequently, it is shown that a fixed T (or A) value can lead to different possible A (respectively, T) values. Limiting the T(A)-variability to increasing samples (e.g., in a text, as done in linguistics), the author then shows, in a purely mathematical way, that for large sample sizes T ~ A^phi, where phi is a constant, phi < 1 but close to 1; hence, roughly, Heaps' or Herdan's law can be proved without using any linguistic or informetric argument. The author also shows that for smaller samples, phi is not a constant but essentially decreases, as confirmed by practical examples. Finally, an exact informetric argument on random sampling in the items shows that, in most cases, T = T(A) is a concavely increasing function, in accordance with practical examples.
Theme
Informetrics
Object
Herdan's law
Heaps' law
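
The relation described in the abstract, T ~ A^phi with phi < 1, can be illustrated with a minimal sketch. It counts the number T of distinct word types (sources) after reading A tokens (items) and estimates a local exponent phi as a log-log slope between two sample points. The function names and the toy sentence are assumptions made for illustration only; they are not taken from the paper, and a text this short says nothing about real corpora.

```python
import math

def vocabulary_growth(tokens):
    """Return (A, T) pairs: after reading A tokens, T distinct types were seen."""
    seen = set()
    growth = []
    for a, tok in enumerate(tokens, start=1):
        seen.add(tok)
        growth.append((a, len(seen)))
    return growth

def log_log_slope(p1, p2):
    """Estimate phi between two (A, T) points, assuming T ~ K * A**phi locally."""
    (a1, t1), (a2, t2) = p1, p2
    return (math.log(t2) - math.log(t1)) / (math.log(a2) - math.log(a1))

# Toy text (hypothetical example, not data from the paper).
tokens = "the cat sat on the mat and the dog sat on the cat".split()
growth = vocabulary_growth(tokens)

# Compare an early sample point with the final one.
phi_estimate = log_log_slope(growth[3], growth[-1])
print(growth[-1], round(phi_estimate, 3))
```

Computing the slope between different pairs of points on a longer text is one simple way to see whether the exponent stays constant or drifts as the sample grows.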

Similar documents (author)

  1. Egghe, L.: Little science, big science and beyond (1994) 4.71
    4.7136316 = sum of:
      4.7136316 = weight(author_txt:egghe in 6883) [ClassicSimilarity], result of:
        4.7136316 = fieldWeight in 6883, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5418105 = idf(docFreq=60, maxDocs=42306)
          0.625 = fieldNorm(doc=6883)
    
  2. Egghe, L.: Expansion of the field of informetrics : the second special issue (2006) 4.71
    4.7136316 = sum of:
      4.7136316 = weight(author_txt:egghe in 7119) [ClassicSimilarity], result of:
        4.7136316 = fieldWeight in 7119, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5418105 = idf(docFreq=60, maxDocs=42306)
          0.625 = fieldNorm(doc=7119)
    
  3. Egghe, L.: Expansion of the field of informetrics : origins and consequences (2005) 4.71
    4.7136316 = sum of:
      4.7136316 = weight(author_txt:egghe in 1979) [ClassicSimilarity], result of:
        4.7136316 = fieldWeight in 1979, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5418105 = idf(docFreq=60, maxDocs=42306)
          0.625 = fieldNorm(doc=1979)
    
  4. Egghe, L.: The amount of actions needed for shelving and reshelving (1996) 4.71
    4.7136316 = sum of:
      4.7136316 = weight(author_txt:egghe in 4463) [ClassicSimilarity], result of:
        4.7136316 = fieldWeight in 4463, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5418105 = idf(docFreq=60, maxDocs=42306)
          0.625 = fieldNorm(doc=4463)
    
  5. Egghe, L.: Special features of the author - publication relationship and a new explanation of Lotka's law based on convolution theory (1994) 4.71
    4.7136316 = sum of:
      4.7136316 = weight(author_txt:egghe in 5137) [ClassicSimilarity], result of:
        4.7136316 = fieldWeight in 5137, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5418105 = idf(docFreq=60, maxDocs=42306)
          0.625 = fieldNorm(doc=5137)
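
The author-match scores above can be reproduced with a short sketch, assuming the usual ClassicSimilarity formulas: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), and fieldWeight = tf * idf * fieldNorm. The function names are hypothetical; the input values are read from the explain output above. Small differences in the last digits are expected because the search engine works in single precision.

```python
import math

def classic_tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as printed in the explain output."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values taken from the author_txt:egghe entries above.
idf = classic_idf(doc_freq=60, max_docs=42306)
weight = field_weight(freq=1.0, doc_freq=60, max_docs=42306, field_norm=0.625)
print(round(idf, 4), round(weight, 4))
```

Because every author match has the same term frequency, document frequency, and field norm, all five entries receive the same score.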
    

Similar documents (content)

  1. Egghe, L.: The power of power laws and an interpretation of Lotkaian informetric systems as self-similar fractals (2005) 0.22
    0.22159392 = sum of:
      0.22159392 = product of:
        0.923308 = sum of:
          0.086317144 = weight(abstract_txt:lotkaian in 4467) [ClassicSimilarity], result of:
            0.086317144 = score(doc=4467,freq=1.0), product of:
              0.14922807 = queryWeight, product of:
                1.0493588 = boost
                9.254789 = idf(docFreq=10, maxDocs=42306)
                0.015365968 = queryNorm
              0.57842433 = fieldWeight in 4467, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.254789 = idf(docFreq=10, maxDocs=42306)
                0.0625 = fieldNorm(doc=4467)
          0.050284527 = weight(abstract_txt:increasing in 4467) [ClassicSimilarity], result of:
            0.050284527 = score(doc=4467,freq=1.0), product of:
              0.15012366 = queryWeight, product of:
                1.8229887 = boost
                5.359265 = idf(docFreq=540, maxDocs=42306)
                0.015365968 = queryNorm
              0.33495405 = fieldWeight in 4467, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.359265 = idf(docFreq=540, maxDocs=42306)
                0.0625 = fieldNorm(doc=4467)
          0.014992617 = weight(abstract_txt:that in 4467) [ClassicSimilarity], result of:
            0.014992617 = score(doc=4467,freq=2.0), product of:
              0.07053318 = queryWeight, product of:
                1.90873 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.015365968 = queryNorm
              0.2125612 = fieldWeight in 4467, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.0625 = fieldNorm(doc=4467)
          0.17268226 = weight(abstract_txt:argument in 4467) [ClassicSimilarity], result of:
            0.17268226 = score(doc=4467,freq=3.0), product of:
              0.23692867 = queryWeight, product of:
                2.2901726 = boost
                6.732703 = idf(docFreq=136, maxDocs=42306)
                0.015365968 = queryNorm
              0.7288365 = fieldWeight in 4467, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.732703 = idf(docFreq=136, maxDocs=42306)
                0.0625 = fieldNorm(doc=4467)
          0.23910603 = weight(abstract_txt:laws in 4467) [ClassicSimilarity], result of:
            0.23910603 = score(doc=4467,freq=4.0), product of:
              0.26742372 = queryWeight, product of:
                2.4330966 = boost
                7.1528745 = idf(docFreq=89, maxDocs=42306)
                0.015365968 = queryNorm
              0.8941093 = fieldWeight in 4467, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.1528745 = idf(docFreq=89, maxDocs=42306)
                0.0625 = fieldNorm(doc=4467)
          0.35992548 = weight(abstract_txt:informetric in 4467) [ClassicSimilarity], result of:
            0.35992548 = score(doc=4467,freq=2.0), product of:
              0.52469575 = queryWeight, product of:
                4.3998466 = boost
                7.760864 = idf(docFreq=48, maxDocs=42306)
                0.015365968 = queryNorm
              0.6859699 = fieldWeight in 4467, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.760864 = idf(docFreq=48, maxDocs=42306)
                0.0625 = fieldNorm(doc=4467)
        0.24 = coord(6/25)
    
  2. Ye, F.Y.: A theoretical approach to the unification of informetric models by wave-heat equations (2011) 0.19
    0.18676586 = sum of:
      0.18676586 = product of:
        1.1672866 = sum of:
          0.10042902 = weight(abstract_txt:function in 1465) [ClassicSimilarity], result of:
            0.10042902 = score(doc=1465,freq=1.0), product of:
              0.16394833 = queryWeight, product of:
                1.9050786 = boost
                5.600595 = idf(docFreq=424, maxDocs=42306)
                0.015365968 = queryNorm
              0.6125651 = fieldWeight in 1465, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.600595 = idf(docFreq=424, maxDocs=42306)
                0.109375 = fieldNorm(doc=1465)
          0.018552417 = weight(abstract_txt:that in 1465) [ClassicSimilarity], result of:
            0.018552417 = score(doc=1465,freq=1.0), product of:
              0.07053318 = queryWeight, product of:
                1.90873 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.015365968 = queryNorm
              0.26303107 = fieldWeight in 1465, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.109375 = fieldNorm(doc=1465)
          0.41843557 = weight(abstract_txt:laws in 1465) [ClassicSimilarity], result of:
            0.41843557 = score(doc=1465,freq=4.0), product of:
              0.26742372 = queryWeight, product of:
                2.4330966 = boost
                7.1528745 = idf(docFreq=89, maxDocs=42306)
                0.015365968 = queryNorm
              1.5646913 = fieldWeight in 1465, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.1528745 = idf(docFreq=89, maxDocs=42306)
                0.109375 = fieldNorm(doc=1465)
          0.62986964 = weight(abstract_txt:informetric in 1465) [ClassicSimilarity], result of:
            0.62986964 = score(doc=1465,freq=2.0), product of:
              0.52469575 = queryWeight, product of:
                4.3998466 = boost
                7.760864 = idf(docFreq=48, maxDocs=42306)
                0.015365968 = queryNorm
              1.2004473 = fieldWeight in 1465, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.760864 = idf(docFreq=48, maxDocs=42306)
                0.109375 = fieldNorm(doc=1465)
        0.16 = coord(4/25)
    
  3. Burrell, Q.L.: "Ambiguity" and scientometric measurement : a dissenting view (2001) 0.13
    0.12668632 = sum of:
      0.12668632 = product of:
        0.79178953 = sum of:
          0.026503455 = weight(abstract_txt:that in 982) [ClassicSimilarity], result of:
            0.026503455 = score(doc=982,freq=4.0), product of:
              0.07053318 = queryWeight, product of:
                1.90873 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.015365968 = queryNorm
              0.37575868 = fieldWeight in 982, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.078125 = fieldNorm(doc=982)
          0.10403734 = weight(abstract_txt:mathematical in 982) [ClassicSimilarity], result of:
            0.10403734 = score(doc=982,freq=1.0), product of:
              0.21006113 = queryWeight, product of:
                2.1564145 = boost
                6.339478 = idf(docFreq=202, maxDocs=42306)
                0.015365968 = queryNorm
              0.4952717 = fieldWeight in 982, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.339478 = idf(docFreq=202, maxDocs=42306)
                0.078125 = fieldNorm(doc=982)
          0.21134187 = weight(abstract_txt:laws in 982) [ClassicSimilarity], result of:
            0.21134187 = score(doc=982,freq=2.0), product of:
              0.26742372 = queryWeight, product of:
                2.4330966 = boost
                7.1528745 = idf(docFreq=89, maxDocs=42306)
                0.015365968 = queryNorm
              0.79028845 = fieldWeight in 982, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1528745 = idf(docFreq=89, maxDocs=42306)
                0.078125 = fieldNorm(doc=982)
          0.44990686 = weight(abstract_txt:informetric in 982) [ClassicSimilarity], result of:
            0.44990686 = score(doc=982,freq=2.0), product of:
              0.52469575 = queryWeight, product of:
                4.3998466 = boost
                7.760864 = idf(docFreq=48, maxDocs=42306)
                0.015365968 = queryNorm
              0.85746235 = fieldWeight in 982, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.760864 = idf(docFreq=48, maxDocs=42306)
                0.078125 = fieldNorm(doc=982)
        0.16 = coord(4/25)
    
  4. Egghe, L.; Rousseau, R.: The Hirsch index of a shifted Lotka function and its relation with the impact factor (2012) 0.12
    0.11602949 = sum of:
      0.11602949 = product of:
        0.48345622 = sum of:
          0.03541452 = weight(abstract_txt:sources in 2244) [ClassicSimilarity], result of:
            0.03541452 = score(doc=2244,freq=1.0), product of:
              0.079223834 = queryWeight, product of:
                1.0812888 = boost
                4.7681975 = idf(docFreq=976, maxDocs=42306)
                0.015365968 = queryNorm
              0.4470185 = fieldWeight in 2244, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7681975 = idf(docFreq=976, maxDocs=42306)
                0.09375 = fieldNorm(doc=2244)
          0.061494175 = weight(abstract_txt:total in 2244) [ClassicSimilarity], result of:
            0.061494175 = score(doc=2244,freq=1.0), product of:
              0.114452235 = queryWeight, product of:
                1.2996485 = boost
                5.731106 = idf(docFreq=372, maxDocs=42306)
                0.015365968 = queryNorm
              0.53729117 = fieldWeight in 2244, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.731106 = idf(docFreq=372, maxDocs=42306)
                0.09375 = fieldNorm(doc=2244)
          0.10666959 = weight(abstract_txt:increasing in 2244) [ClassicSimilarity], result of:
            0.10666959 = score(doc=2244,freq=2.0), product of:
              0.15012366 = queryWeight, product of:
                1.8229887 = boost
                5.359265 = idf(docFreq=540, maxDocs=42306)
                0.015365968 = queryNorm
              0.7105448 = fieldWeight in 2244, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.359265 = idf(docFreq=540, maxDocs=42306)
                0.09375 = fieldNorm(doc=2244)
          0.08522498 = weight(abstract_txt:items in 2244) [ClassicSimilarity], result of:
            0.08522498 = score(doc=2244,freq=1.0), product of:
              0.16285834 = queryWeight, product of:
                1.8987352 = boost
                5.5819464 = idf(docFreq=432, maxDocs=42306)
                0.015365968 = queryNorm
              0.52330744 = fieldWeight in 2244, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5819464 = idf(docFreq=432, maxDocs=42306)
                0.09375 = fieldNorm(doc=2244)
          0.17216402 = weight(abstract_txt:function in 2244) [ClassicSimilarity], result of:
            0.17216402 = score(doc=2244,freq=4.0), product of:
              0.16394833 = queryWeight, product of:
                1.9050786 = boost
                5.600595 = idf(docFreq=424, maxDocs=42306)
                0.015365968 = queryNorm
              1.0501115 = fieldWeight in 2244, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.600595 = idf(docFreq=424, maxDocs=42306)
                0.09375 = fieldNorm(doc=2244)
          0.022488927 = weight(abstract_txt:that in 2244) [ClassicSimilarity], result of:
            0.022488927 = score(doc=2244,freq=2.0), product of:
              0.07053318 = queryWeight, product of:
                1.90873 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.015365968 = queryNorm
              0.31884181 = fieldWeight in 2244, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.09375 = fieldNorm(doc=2244)
        0.24 = coord(6/25)
    
  5. Egghe, L.: Relations between the continuous and the discrete Lotka power function (2005) 0.11
    0.11409057 = sum of:
      0.11409057 = product of:
        0.47537738 = sum of:
          0.10789643 = weight(abstract_txt:lotkaian in 4465) [ClassicSimilarity], result of:
            0.10789643 = score(doc=4465,freq=1.0), product of:
              0.14922807 = queryWeight, product of:
                1.0493588 = boost
                9.254789 = idf(docFreq=10, maxDocs=42306)
                0.015365968 = queryNorm
              0.72303045 = fieldWeight in 4465, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.254789 = idf(docFreq=10, maxDocs=42306)
                0.078125 = fieldNorm(doc=4465)
          0.041736413 = weight(abstract_txt:sources in 4465) [ClassicSimilarity], result of:
            0.041736413 = score(doc=4465,freq=2.0), product of:
              0.079223834 = queryWeight, product of:
                1.0812888 = boost
                4.7681975 = idf(docFreq=976, maxDocs=42306)
                0.015365968 = queryNorm
              0.52681637 = fieldWeight in 4465, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7681975 = idf(docFreq=976, maxDocs=42306)
                0.078125 = fieldNorm(doc=4465)
          0.03085097 = weight(abstract_txt:practical in 4465) [ClassicSimilarity], result of:
            0.03085097 = score(doc=4465,freq=1.0), product of:
              0.08160216 = queryWeight, product of:
                1.0973991 = boost
                4.8392396 = idf(docFreq=909, maxDocs=42306)
                0.015365968 = queryNorm
              0.3780656 = fieldWeight in 4465, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8392396 = idf(docFreq=909, maxDocs=42306)
                0.078125 = fieldNorm(doc=4465)
          0.10043861 = weight(abstract_txt:items in 4465) [ClassicSimilarity], result of:
            0.10043861 = score(doc=4465,freq=2.0), product of:
              0.16285834 = queryWeight, product of:
                1.8987352 = boost
                5.5819464 = idf(docFreq=432, maxDocs=42306)
                0.015365968 = queryNorm
              0.6167238 = fieldWeight in 4465, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5819464 = idf(docFreq=432, maxDocs=42306)
                0.078125 = fieldNorm(doc=4465)
          0.1757142 = weight(abstract_txt:function in 4465) [ClassicSimilarity], result of:
            0.1757142 = score(doc=4465,freq=6.0), product of:
              0.16394833 = queryWeight, product of:
                1.9050786 = boost
                5.600595 = idf(docFreq=424, maxDocs=42306)
                0.015365968 = queryNorm
              1.0717657 = fieldWeight in 4465, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.600595 = idf(docFreq=424, maxDocs=42306)
                0.078125 = fieldNorm(doc=4465)
          0.018740771 = weight(abstract_txt:that in 4465) [ClassicSimilarity], result of:
            0.018740771 = score(doc=4465,freq=2.0), product of:
              0.07053318 = queryWeight, product of:
                1.90873 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.015365968 = queryNorm
              0.2657015 = fieldWeight in 4465, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.078125 = fieldNorm(doc=4465)
        0.24 = coord(6/25)
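
The content-match scores combine several term scores. As a sketch, assuming the structure shown in the explain trees above, each term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm; the sum is then multiplied by the coord factor (matched terms / query terms). The example recomputes entry 2 (doc 1465) from the values printed above. The function and variable names are hypothetical.

```python
import math

MAX_DOCS = 42306
QUERY_NORM = 0.015365968

def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost, doc_freq, freq, field_norm):
    """Score of one matched term: queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight

# (boost, docFreq, freq, fieldNorm) for the terms matched in doc 1465:
# function, that, laws, informetric.
matched_terms = [
    (1.9050786, 424, 1.0, 0.109375),
    (1.90873, 10381, 1.0, 0.109375),
    (2.4330966, 89, 4.0, 0.109375),
    (4.3998466, 48, 2.0, 0.109375),
]

coord = 4 / 25  # 4 of the 25 query terms matched
total = coord * sum(term_score(*t) for t in matched_terms)
print(round(total, 4))
```

The coord factor explains why a document that matches a few terms strongly can still rank below one that matches more terms weakly.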