Document (#32272)

Egghe, L.
Untangling Herdan's law and Heaps' law : mathematical and informetric arguments
Journal of the American Society for Information Science and Technology. 58(2007) no.5, p.702-709
Herdan's law in linguistics and Heaps' law in information retrieval are different formulations of the same phenomenon. Stated briefly and in linguistic terms, they say that vocabulary sizes are concave, increasing power laws of text sizes. This study investigates these laws from a purely mathematical and informetric point of view. A general informetric argument shows that the problem of proving these laws is, in fact, ill-posed. Using the more general terminology of sources and items, the author shows by presenting exact formulas from Lotkaian informetrics that the total number T of sources is not only a function of the total number A of items, but is also a function of several parameters (e.g., the parameters occurring in Lotka's law). Consequently, it is shown that a fixed T (or A) value can lead to different possible A (respectively, T) values. Limiting the T(A)-variability to increasing samples (e.g., in a text as done in linguistics), the author then shows, in a purely mathematical way, that for large sample sizes T ~ A^phi, where phi is a constant, phi < 1 but close to 1; hence, roughly, Heaps' or Herdan's law can be proved without using any linguistic or informetric argument. The author also shows that for smaller samples, phi is not a constant but essentially decreases, as confirmed by practical examples. Finally, an exact informetric argument on random sampling in the items shows that, in most cases, T = T(A) is a concavely increasing function, in accordance with practical examples.
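
The relation T ~ A^phi in the abstract can be illustrated with a minimal Python sketch. It is not the author's derivation: it only counts distinct sources (word types) T as items (tokens) A are read, and estimates phi as the least-squares slope of log T against log A. The function names `vocabulary_growth` and `fit_exponent` are hypothetical, chosen for this illustration.

```python
import math


def vocabulary_growth(tokens):
    """Return (A, T) pairs: A items read so far, T distinct sources seen so far."""
    seen = set()
    curve = []
    for a, tok in enumerate(tokens, start=1):
        seen.add(tok)
        curve.append((a, len(seen)))
    return curve


def fit_exponent(curve):
    """Least-squares slope of log T against log A, a rough estimate of phi."""
    xs = [math.log(a) for a, _ in curve]
    ys = [math.log(t) for _, t in curve]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Because the abstract notes that phi is not constant for smaller samples, a single fitted slope of this kind should be read as a summary over the sampled range, not as a fixed property of the text.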

Similar documents (author)

  1. Egghe, L.: Little science, big science and beyond (1994) 4.74
    4.741258 = sum of:
      4.741258 = weight(author_txt:egghe in 6883) [ClassicSimilarity], result of:
        4.741258 = fieldWeight in 6883, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5860133 = idf(docFreq=60, maxDocs=44218)
          0.625 = fieldNorm(doc=6883)
  2. Egghe, L.: Expansion of the field of informetrics : the second special issue (2006) 4.74
    4.741258 = sum of:
      4.741258 = weight(author_txt:egghe in 7119) [ClassicSimilarity], result of:
        4.741258 = fieldWeight in 7119, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5860133 = idf(docFreq=60, maxDocs=44218)
          0.625 = fieldNorm(doc=7119)
  3. Egghe, L.: Expansion of the field of informetrics : origins and consequences (2005) 4.74
    4.741258 = sum of:
      4.741258 = weight(author_txt:egghe in 1910) [ClassicSimilarity], result of:
        4.741258 = fieldWeight in 1910, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5860133 = idf(docFreq=60, maxDocs=44218)
          0.625 = fieldNorm(doc=1910)
  4. Egghe, L.: The amount of actions needed for shelving and reshelving (1996) 4.74
    4.741258 = sum of:
      4.741258 = weight(author_txt:egghe in 4394) [ClassicSimilarity], result of:
        4.741258 = fieldWeight in 4394, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5860133 = idf(docFreq=60, maxDocs=44218)
          0.625 = fieldNorm(doc=4394)
  5. Egghe, L.: Special features of the author - publication relationship and a new explanation of Lotka's law based on convolution theory (1994) 4.74
    4.741258 = sum of:
      4.741258 = weight(author_txt:egghe in 5068) [ClassicSimilarity], result of:
        4.741258 = fieldWeight in 5068, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5860133 = idf(docFreq=60, maxDocs=44218)
          0.625 = fieldNorm(doc=5068)
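
The author scores above can be reproduced from the numbers in the explain trees. The following Python sketch assumes tf = sqrt(termFreq) and idf = 1 + ln(maxDocs / (docFreq + 1)); these formulas are inferred from the listed values (they reproduce 7.5860133 for docFreq=60, maxDocs=44218) and are not stated on this page. The function names are hypothetical.

```python
import math


def classic_tf(freq):
    # Assumed term-frequency factor: square root of the raw count.
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    # Assumed inverse document frequency; matches the idf values listed above.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from any of the five author hits above.
author_score = field_weight(freq=1.0, doc_freq=60, max_docs=44218, field_norm=0.625)
print(round(author_score, 4))
```

All five hits share the same term frequency, document frequency, and field norm, which is why they receive the identical score 4.741258.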

Similar documents (content)

  1. Egghe, L.: The power of power laws and an interpretation of Lotkaian informetric systems as self-similar fractals (2005) 0.22
    0.22080104 = sum of:
      0.22080104 = product of:
        0.92000437 = sum of:
          0.08732251 = weight(abstract_txt:lotkaian in 3466) [ClassicSimilarity], result of:
            0.08732251 = score(doc=3466,freq=1.0), product of:
              0.15024856 = queryWeight, product of:
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.01615751 = queryNorm
              0.581187 = fieldWeight in 3466, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.0625 = fieldNorm(doc=3466)
          0.049486402 = weight(abstract_txt:increasing in 3466) [ClassicSimilarity], result of:
            0.049486402 = score(doc=3466,freq=1.0), product of:
              0.14839657 = queryWeight, product of:
                1.7213429 = boost
                5.3355846 = idf(docFreq=578, maxDocs=44218)
                0.01615751 = queryNorm
              0.33347404 = fieldWeight in 3466, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3355846 = idf(docFreq=578, maxDocs=44218)
                0.0625 = fieldNorm(doc=3466)
          0.014301713 = weight(abstract_txt:that in 3466) [ClassicSimilarity], result of:
            0.014301713 = score(doc=3466,freq=2.0), product of:
              0.06828745 = queryWeight, product of:
                1.7836692 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01615751 = queryNorm
              0.20943399 = fieldWeight in 3466, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=3466)
          0.16524583 = weight(abstract_txt:argument in 3466) [ClassicSimilarity], result of:
            0.16524583 = score(doc=3466,freq=3.0), product of:
              0.22986871 = queryWeight, product of:
                2.1423745 = boost
                6.640641 = idf(docFreq=156, maxDocs=44218)
                0.01615751 = queryNorm
              0.71887046 = fieldWeight in 3466, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.640641 = idf(docFreq=156, maxDocs=44218)
                0.0625 = fieldNorm(doc=3466)
          0.23852968 = weight(abstract_txt:laws in 3466) [ClassicSimilarity], result of:
            0.23852968 = score(doc=3466,freq=4.0), product of:
              0.26675233 = queryWeight, product of:
                2.3078606 = boost
                7.1535926 = idf(docFreq=93, maxDocs=44218)
                0.01615751 = queryNorm
              0.8941991 = fieldWeight in 3466, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.1535926 = idf(docFreq=93, maxDocs=44218)
                0.0625 = fieldNorm(doc=3466)
          0.36511824 = weight(abstract_txt:informetric in 3466) [ClassicSimilarity], result of:
            0.36511824 = score(doc=3466,freq=2.0), product of:
              0.5292512 = queryWeight, product of:
                4.1967278 = boost
                7.805067 = idf(docFreq=48, maxDocs=44218)
                0.01615751 = queryNorm
              0.689877 = fieldWeight in 3466, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.805067 = idf(docFreq=48, maxDocs=44218)
                0.0625 = fieldNorm(doc=3466)
        0.24 = coord(6/25)
  2. Ye, F.Y.: A theoretical approach to the unification of informetric models by wave-heat equations (2011) 0.19
    0.18767291 = sum of:
      0.18767291 = product of:
        1.1729558 = sum of:
          0.017697467 = weight(abstract_txt:that in 4464) [ClassicSimilarity], result of:
            0.017697467 = score(doc=4464,freq=1.0), product of:
              0.06828745 = queryWeight, product of:
                1.7836692 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01615751 = queryNorm
              0.25916135 = fieldWeight in 4464, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.109375 = fieldNorm(doc=4464)
          0.098874405 = weight(abstract_txt:function in 4464) [ClassicSimilarity], result of:
            0.098874405 = score(doc=4464,freq=1.0), product of:
              0.16210528 = queryWeight, product of:
                1.7990948 = boost
                5.5765896 = idf(docFreq=454, maxDocs=44218)
                0.01615751 = queryNorm
              0.60993946 = fieldWeight in 4464, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5765896 = idf(docFreq=454, maxDocs=44218)
                0.109375 = fieldNorm(doc=4464)
          0.41742697 = weight(abstract_txt:laws in 4464) [ClassicSimilarity], result of:
            0.41742697 = score(doc=4464,freq=4.0), product of:
              0.26675233 = queryWeight, product of:
                2.3078606 = boost
                7.1535926 = idf(docFreq=93, maxDocs=44218)
                0.01615751 = queryNorm
              1.5648484 = fieldWeight in 4464, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.1535926 = idf(docFreq=93, maxDocs=44218)
                0.109375 = fieldNorm(doc=4464)
          0.6389569 = weight(abstract_txt:informetric in 4464) [ClassicSimilarity], result of:
            0.6389569 = score(doc=4464,freq=2.0), product of:
              0.5292512 = queryWeight, product of:
                4.1967278 = boost
                7.805067 = idf(docFreq=48, maxDocs=44218)
                0.01615751 = queryNorm
              1.2072847 = fieldWeight in 4464, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.805067 = idf(docFreq=48, maxDocs=44218)
                0.109375 = fieldNorm(doc=4464)
        0.16 = coord(4/25)
  3. Burrell, Q.L.: "Ambiguity" and scientometric measurement : a dissenting view (2001) 0.13
    0.12748387 = sum of:
      0.12748387 = product of:
        0.79677427 = sum of:
          0.025282096 = weight(abstract_txt:that in 6981) [ClassicSimilarity], result of:
            0.025282096 = score(doc=6981,freq=4.0), product of:
              0.06828745 = queryWeight, product of:
                1.7836692 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01615751 = queryNorm
              0.3702305 = fieldWeight in 6981, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=6981)
          0.104261935 = weight(abstract_txt:mathematical in 6981) [ClassicSimilarity], result of:
            0.104261935 = score(doc=6981,freq=1.0), product of:
              0.21017309 = queryWeight, product of:
                2.048538 = boost
                6.3497796 = idf(docFreq=209, maxDocs=44218)
                0.01615751 = queryNorm
              0.49607652 = fieldWeight in 6981, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3497796 = idf(docFreq=209, maxDocs=44218)
                0.078125 = fieldNorm(doc=6981)
          0.21083245 = weight(abstract_txt:laws in 6981) [ClassicSimilarity], result of:
            0.21083245 = score(doc=6981,freq=2.0), product of:
              0.26675233 = queryWeight, product of:
                2.3078606 = boost
                7.1535926 = idf(docFreq=93, maxDocs=44218)
                0.01615751 = queryNorm
              0.7903678 = fieldWeight in 6981, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1535926 = idf(docFreq=93, maxDocs=44218)
                0.078125 = fieldNorm(doc=6981)
          0.4563978 = weight(abstract_txt:informetric in 6981) [ClassicSimilarity], result of:
            0.4563978 = score(doc=6981,freq=2.0), product of:
              0.5292512 = queryWeight, product of:
                4.1967278 = boost
                7.805067 = idf(docFreq=48, maxDocs=44218)
                0.01615751 = queryNorm
              0.86234623 = fieldWeight in 6981, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.805067 = idf(docFreq=48, maxDocs=44218)
                0.078125 = fieldNorm(doc=6981)
        0.16 = coord(4/25)
  4. Egghe, L.; Rousseau, R.: The Hirsch index of a shifted Lotka function and its relation with the impact factor (2012) 0.11
    0.11428643 = sum of:
      0.11428643 = product of:
        0.47619346 = sum of:
          0.034829147 = weight(abstract_txt:sources in 243) [ClassicSimilarity], result of:
            0.034829147 = score(doc=243,freq=1.0), product of:
              0.07827755 = queryWeight, product of:
                1.0207714 = boost
                4.7460723 = idf(docFreq=1043, maxDocs=44218)
                0.01615751 = queryNorm
              0.44494426 = fieldWeight in 243, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7460723 = idf(docFreq=1043, maxDocs=44218)
                0.09375 = fieldNorm(doc=243)
          0.060586352 = weight(abstract_txt:total in 243) [ClassicSimilarity], result of:
            0.060586352 = score(doc=243,freq=1.0), product of:
              0.113220535 = queryWeight, product of:
                1.227644 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.01615751 = queryNorm
              0.53511804 = fieldWeight in 243, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.09375 = fieldNorm(doc=243)
          0.10497651 = weight(abstract_txt:increasing in 243) [ClassicSimilarity], result of:
            0.10497651 = score(doc=243,freq=2.0), product of:
              0.14839657 = queryWeight, product of:
                1.7213429 = boost
                5.3355846 = idf(docFreq=578, maxDocs=44218)
                0.01615751 = queryNorm
              0.70740527 = fieldWeight in 243, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.3355846 = idf(docFreq=578, maxDocs=44218)
                0.09375 = fieldNorm(doc=243)
          0.021452568 = weight(abstract_txt:that in 243) [ClassicSimilarity], result of:
            0.021452568 = score(doc=243,freq=2.0), product of:
              0.06828745 = queryWeight, product of:
                1.7836692 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01615751 = queryNorm
              0.314151 = fieldWeight in 243, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.09375 = fieldNorm(doc=243)
          0.169499 = weight(abstract_txt:function in 243) [ClassicSimilarity], result of:
            0.169499 = score(doc=243,freq=4.0), product of:
              0.16210528 = queryWeight, product of:
                1.7990948 = boost
                5.5765896 = idf(docFreq=454, maxDocs=44218)
                0.01615751 = queryNorm
              1.0456105 = fieldWeight in 243, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.5765896 = idf(docFreq=454, maxDocs=44218)
                0.09375 = fieldNorm(doc=243)
          0.08484986 = weight(abstract_txt:items in 243) [ClassicSimilarity], result of:
            0.08484986 = score(doc=243,freq=1.0), product of:
              0.16223323 = queryWeight, product of:
                1.7998047 = boost
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.01615751 = queryNorm
              0.52301157 = fieldWeight in 243, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.09375 = fieldNorm(doc=243)
        0.24 = coord(6/25)
  5. Egghe, L.: Relations between the continuous and the discrete Lotka power function (2005) 0.11
    0.11304037 = sum of:
      0.11304037 = product of:
        0.47100157 = sum of:
          0.10915314 = weight(abstract_txt:lotkaian in 3464) [ClassicSimilarity], result of:
            0.10915314 = score(doc=3464,freq=1.0), product of:
              0.15024856 = queryWeight, product of:
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.01615751 = queryNorm
              0.72648376 = fieldWeight in 3464, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.078125 = fieldNorm(doc=3464)
          0.041046545 = weight(abstract_txt:sources in 3464) [ClassicSimilarity], result of:
            0.041046545 = score(doc=3464,freq=2.0), product of:
              0.07827755 = queryWeight, product of:
                1.0207714 = boost
                4.7460723 = idf(docFreq=1043, maxDocs=44218)
                0.01615751 = queryNorm
              0.52437186 = fieldWeight in 3464, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7460723 = idf(docFreq=1043, maxDocs=44218)
                0.078125 = fieldNorm(doc=3464)
          0.029934024 = weight(abstract_txt:practical in 3464) [ClassicSimilarity], result of:
            0.029934024 = score(doc=3464,freq=1.0), product of:
              0.0799048 = queryWeight, product of:
                1.0313268 = boost
                4.79515 = idf(docFreq=993, maxDocs=44218)
                0.01615751 = queryNorm
              0.3746211 = fieldWeight in 3464, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.79515 = idf(docFreq=993, maxDocs=44218)
                0.078125 = fieldNorm(doc=3464)
          0.017877141 = weight(abstract_txt:that in 3464) [ClassicSimilarity], result of:
            0.017877141 = score(doc=3464,freq=2.0), product of:
              0.06828745 = queryWeight, product of:
                1.7836692 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01615751 = queryNorm
              0.26179248 = fieldWeight in 3464, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=3464)
          0.17299418 = weight(abstract_txt:function in 3464) [ClassicSimilarity], result of:
            0.17299418 = score(doc=3464,freq=6.0), product of:
              0.16210528 = queryWeight, product of:
                1.7990948 = boost
                5.5765896 = idf(docFreq=454, maxDocs=44218)
                0.01615751 = queryNorm
              1.0671718 = fieldWeight in 3464, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.5765896 = idf(docFreq=454, maxDocs=44218)
                0.078125 = fieldNorm(doc=3464)
          0.09999652 = weight(abstract_txt:items in 3464) [ClassicSimilarity], result of:
            0.09999652 = score(doc=3464,freq=2.0), product of:
              0.16223323 = queryWeight, product of:
                1.7998047 = boost
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.01615751 = queryNorm
              0.6163751 = fieldWeight in 3464, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.078125 = fieldNorm(doc=3464)
        0.24 = coord(6/25)
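
The content scores combine per-term weights with a coordination factor. As a hedged sketch, the Python below recomputes the score of the first content hit (doc 3466) from the listed inputs, assuming score(term) = queryWeight * fieldWeight, queryWeight = boost * idf * queryNorm, fieldWeight = sqrt(freq) * idf * fieldNorm, and idf = 1 + ln(maxDocs / (docFreq + 1)). These formulas are inferred from the numbers shown; the boost of 1.0 for "lotkaian" is an assumption, since its tree lists no boost line. All names are hypothetical.

```python
import math

QUERY_NORM = 0.01615751  # queryNorm as listed in the explain trees
MAX_DOCS = 44218         # maxDocs as listed in the explain trees


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed inverse document frequency; matches the listed idf values.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, freq, doc_freq, field_norm):
    # One matching term: queryWeight * fieldWeight.
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (boost, freq, docFreq) for the six matching terms of doc 3466.
terms_3466 = [
    (1.0, 1.0, 10),           # lotkaian (boost 1.0 assumed, none listed)
    (1.7213429, 1.0, 578),    # increasing
    (1.7836692, 2.0, 11241),  # that
    (2.1423745, 3.0, 156),    # argument
    (2.3078606, 4.0, 93),     # laws
    (4.1967278, 2.0, 48),     # informetric
]

coord = 6 / 25  # six of the 25 query terms matched
score_3466 = coord * sum(term_score(b, f, df, 0.0625) for b, f, df in terms_3466)
print(round(score_3466, 4))
```

The coord factor explains why hit 2, with a larger raw sum (1.1729558) but only 4 of 25 terms matched, ranks below hit 1.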