Search (34 results, page 1 of 2)

Egghe, L.: ¬A universal method of information retrieval evaluation : the "missing" link M and the universal IR surface (2004) 0.03

0.0252955 = product of:
  0.06323875 = sum of:
    0.009138121 = weight(_text_:a in 2558) [ClassicSimilarity], result of:
      0.009138121 = score(doc=2558,freq=10.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.1709182 = fieldWeight in 2558, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046875 = fieldNorm(doc=2558)
    0.054100625 = sum of:
      0.016407004 = weight(_text_:information in 2558) [ClassicSimilarity], result of:
        0.016407004 = score(doc=2558,freq=6.0), product of:
          0.08139861 = queryWeight, product of:
            1.7554779 = idf(docFreq=20772, maxDocs=44218)
            0.046368346 = queryNorm
          0.20156369 = fieldWeight in 2558, product of:
            2.4494898 = tf(freq=6.0), with freq of:
              6.0 = termFreq=6.0
            1.7554779 = idf(docFreq=20772, maxDocs=44218)
            0.046875 = fieldNorm(doc=2558)
      0.037693623 = weight(_text_:22 in 2558) [ClassicSimilarity], result of:
        0.037693623 = score(doc=2558,freq=2.0), product of:
          0.16237405 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046368346 = queryNorm
          0.23214069 = fieldWeight in 2558, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=2558)
  0.4 = coord(2/5)

Abstract: The paper shows that the present evaluation methods in information retrieval (basically recall R and precision P and in some cases fallout F) lack universal comparability in the sense that their values depend on the generality of the IR problem. A solution is given by using all "parts" of the database, including the non-relevant documents and also the not-retrieved documents. It turns out that the solution is given by introducing the measure M being the fraction of the not-retrieved documents that are relevant (hence the "miss" measure). We prove that - independent of the IR problem or of the IR action - the quadruple (P,R,F,M) belongs to a universal IR surface, being the same for all IR-activities. This universality is then exploited by defining a new measure for evaluation in IR allowing for unbiased comparisons of all IR results. We also show that only using one, two or even three measures from the set {P,R,F,M} necessarily leads to evaluation measures that are non-universal and hence not capable of comparing different IR situations.
Date: 14. 8.2004 19:17:22
Source: Information processing and management. 40(2004) no.1, S.21-30
Type: a
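
The score breakdown for this hit can be re-derived by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); queryNorm, fieldNorm and coord are copied from the tree rather than recomputed, and the helper names are made up for illustration. The result should land close to the 0.0252955 shown for doc 2558.

```python
import math

# Constants copied from the explain tree for doc 2558.
MAX_DOCS = 44218
QUERY_NORM = 0.046368346
FIELD_NORM = 0.046875
COORD = 2 / 5

def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_weight(freq, doc_freq):
    # weight = queryWeight * fieldWeight, as laid out in the tree.
    query_weight = idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight

# term -> (termFreq in the document, docFreq in the index)
terms = {"a": (10.0, 37942), "information": (6.0, 20772), "22": (2.0, 3622)}
weights = {term: term_weight(freq, df) for term, (freq, df) in terms.items()}

score = COORD * sum(weights.values())
print(f"{score:.7f}")
```

Small differences in the last digits are expected, since Lucene works in single precision while this sketch uses Python floats.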

Egghe, L.: Untangling Herdan's law and Heaps' law : mathematical and informetric arguments (2007) 0.01
```
0.008012165 = product of:
  0.020030413 = sum of:
    0.014448637 = weight(_text_:a in 271) [ClassicSimilarity], result of:
      0.014448637 = score(doc=271,freq=36.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.27024537 = fieldWeight in 271, product of:
          6.0 = tf(freq=36.0), with freq of:
            36.0 = termFreq=36.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0390625 = fieldNorm(doc=271)
    0.0055817757 = product of:
      0.011163551 = sum of:
        0.011163551 = weight(_text_:information in 271) [ClassicSimilarity], result of:
          0.011163551 = score(doc=271,freq=4.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.13714671 = fieldWeight in 271, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=271)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```

Abstract: Herdan's law in linguistics and Heaps' law in information retrieval are different formulations of the same phenomenon. Stated briefly and in linguistic terms, they state that vocabularies' sizes are concave increasing power laws of texts' sizes. This study investigates these laws from a purely mathematical and informetric point of view. A general informetric argument shows that the problem of proving these laws is, in fact, ill-posed. Using the more general terminology of sources and items, the author shows by presenting exact formulas from Lotkaian informetrics that the total number T of sources is not only a function of the total number A of items, but is also a function of several parameters (e.g., the parameters occurring in Lotka's law). Consequently, it is shown that a fixed T (or A) value can lead to different possible A (respectively, T) values. Limiting the T(A)-variability to increasing samples (e.g., in a text as done in linguistics) the author then shows, in a purely mathematical way, that for large sample sizes T ~ A**phi, where phi is a constant, phi < 1 but close to 1, hence roughly, Heaps' or Herdan's law can be proved without using any linguistic or informetric argument. The author also shows that for smaller samples, phi is not a constant but essentially decreases as confirmed by practical examples. Finally, an exact informetric argument on random sampling in the items shows that, in most cases, T = T(A) is a concavely increasing function, in accordance with practical examples.
Source: Journal of the American Society for Information Science and Technology. 58(2007) no.5, S.702-709
Type: a

Egghe, L.: ¬A rationale for the Hirsch-index rank-order distribution and a comparison with the impact factor rank-order distribution (2009) 0.01

0.007931639 = product of:
  0.019829098 = sum of:
    0.014303422 = weight(_text_:a in 3124) [ClassicSimilarity], result of:
      0.014303422 = score(doc=3124,freq=18.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.26752928 = fieldWeight in 3124, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3124)
    0.005525676 = product of:
      0.011051352 = sum of:
        0.011051352 = weight(_text_:information in 3124) [ClassicSimilarity], result of:
          0.011051352 = score(doc=3124,freq=2.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.13576832 = fieldWeight in 3124, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3124)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: We present a rationale for the Hirsch-index rank-order distribution and prove that it is a power law (hence a straight line in the log-log scale). This is confirmed by experimental data of Pyykkö and by data produced in this article on 206 mathematics journals. This distribution is of a completely different nature than the impact factor (IF) rank-order distribution which (as proved in a previous article) is S-shaped. This is also confirmed by our example. Only in the log-log scale of the h-index distribution do we notice a concave deviation of the straight line for higher ranks. This phenomenon is discussed.
Source: Journal of the American Society for Information Science and Technology. 60(2009) no.10, S.2142-2144
Type: a

Egghe, L.: Dynamic h-index : the Hirsch index in function of time (2007) 0.01

0.007864855 = product of:
  0.019662138 = sum of:
    0.013347079 = weight(_text_:a in 147) [ClassicSimilarity], result of:
      0.013347079 = score(doc=147,freq=12.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.24964198 = fieldWeight in 147, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0625 = fieldNorm(doc=147)
    0.006315058 = product of:
      0.012630116 = sum of:
        0.012630116 = weight(_text_:information in 147) [ClassicSimilarity], result of:
          0.012630116 = score(doc=147,freq=2.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.1551638 = fieldWeight in 147, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=147)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: When there is a group of articles and the present time is fixed, we can determine the unique number h being the number of articles that received h or more citations while the other articles received a number of citations which is not larger than h. In this article, the time dependence of the h-index is determined. This is important to describe the expected career evolution of a scientist's work or of a journal's production in a fixed year.
Source: Journal of the American Society for Information Science and Technology. 58(2007) no.3, S.452-454
Type: a
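
The static definition of h in this abstract translates directly into a few lines of code. This is an illustrative sketch only (the function name and sample citation counts are made up here); it does not model the time dependence the article studies.

```python
def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` articles all have >= rank citations
        else:
            break     # every later article has even fewer citations
    return h

# Hypothetical citation counts for five articles.
print(h_index([10, 8, 5, 4, 3]))
```

Recomputing this value on the citation counts available at successive dates would give one empirical view of how h evolves over time.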

Egghe, L.; Liang, L.; Rousseau, R.: ¬A relation between h-index and impact factor in the power-law model (2009) 0.01

0.007399688 = product of:
  0.01849922 = sum of:
    0.012184162 = weight(_text_:a in 6759) [ClassicSimilarity], result of:
      0.012184162 = score(doc=6759,freq=10.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.22789092 = fieldWeight in 6759, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0625 = fieldNorm(doc=6759)
    0.006315058 = product of:
      0.012630116 = sum of:
        0.012630116 = weight(_text_:information in 6759) [ClassicSimilarity], result of:
          0.012630116 = score(doc=6759,freq=2.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.1551638 = fieldWeight in 6759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=6759)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: Using a power-law model, the two best-known topics in citation analysis, namely the impact factor and the Hirsch index, are unified into one relation (not a function). The validity of our model is, at least in a qualitative way, confirmed by real data.
Source: Journal of the American Society for Information Science and Technology. 60(2009) no.11, S.2362-2365
Type: a

Egghe, L.: Expansion of the field of informetrics : the second special issue (2006) 0.01

0.007058388 = product of:
  0.01764597 = sum of:
    0.008173384 = weight(_text_:a in 7119) [ClassicSimilarity], result of:
      0.008173384 = score(doc=7119,freq=2.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.15287387 = fieldWeight in 7119, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.09375 = fieldNorm(doc=7119)
    0.009472587 = product of:
      0.018945174 = sum of:
        0.018945174 = weight(_text_:information in 7119) [ClassicSimilarity], result of:
          0.018945174 = score(doc=7119,freq=2.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.23274569 = fieldWeight in 7119, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.09375 = fieldNorm(doc=7119)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Source: Information processing and management. 42(2006) no.6, S.1405-1407
Type: a

Egghe, L.: Expansion of the field of informetrics : origins and consequences (2005) 0.01

0.007058388 = product of:
  0.01764597 = sum of:
    0.008173384 = weight(_text_:a in 1910) [ClassicSimilarity], result of:
      0.008173384 = score(doc=1910,freq=2.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.15287387 = fieldWeight in 1910, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.09375 = fieldNorm(doc=1910)
    0.009472587 = product of:
      0.018945174 = sum of:
        0.018945174 = weight(_text_:information in 1910) [ClassicSimilarity], result of:
          0.018945174 = score(doc=1910,freq=2.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.23274569 = fieldWeight in 1910, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.09375 = fieldNorm(doc=1910)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Source: Information processing and management. 41(2005) no.6, S.1311-1316
Type: a

Egghe, L.: ¬The influence of transformations on the h-index and the g-index (2008) 0.01

0.0069400403 = product of:
  0.0173501 = sum of:
    0.009535614 = weight(_text_:a in 1881) [ClassicSimilarity], result of:
      0.009535614 = score(doc=1881,freq=8.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.17835285 = fieldWeight in 1881, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1881)
    0.007814486 = product of:
      0.015628971 = sum of:
        0.015628971 = weight(_text_:information in 1881) [ClassicSimilarity], result of:
          0.015628971 = score(doc=1881,freq=4.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.1920054 = fieldWeight in 1881, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1881)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: In a previous article, we introduced a general transformation on sources and one on items in an arbitrary information production process (IPP). In this article, we investigate the influence of these transformations on the h-index and on the g-index. General formulae that describe this influence are presented. These are applied to the case that the size-frequency function is Lotkaian (i.e., is a decreasing power function). We further show that the h-index of the transformed IPP belongs to the interval bounded by the two transformations of the h-index of the original IPP, and we also show that this property is not true for the g-index.
Source: Journal of the American Society for Information Science and Technology. 59(2008) no.8, S.1304-1312
Type: a

Egghe, L.; Rousseau, R.: ¬The influence of publication delays on the observed aging distribution of scientific literature (2000) 0.01

0.0068851607 = product of:
  0.017212901 = sum of:
    0.010897844 = weight(_text_:a in 4385) [ClassicSimilarity], result of:
      0.010897844 = score(doc=4385,freq=8.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.20383182 = fieldWeight in 4385, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0625 = fieldNorm(doc=4385)
    0.006315058 = product of:
      0.012630116 = sum of:
        0.012630116 = weight(_text_:information in 4385) [ClassicSimilarity], result of:
          0.012630116 = score(doc=4385,freq=2.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.1551638 = fieldWeight in 4385, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=4385)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: Observed aging curves are influenced by publication delays. In this article, we show how the 'undisturbed' aging function and the publication delay combine to give the observed aging function. This combination is performed by a mathematical operation known as convolution. Examples are given, such as the convolution of 2 Poisson distributions, 2 exponential distributions, and 2 lognormal distributions. A paradox is observed between theory and real data.
Source: Journal of the American Society for Information Science. 51(2000) no.2, S.158-165
Type: a
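
The Poisson case mentioned in this abstract is easy to check numerically, because the convolution of two Poisson distributions is again Poisson with the rates added. The sketch below assumes discrete time steps; the rates 2.0 and 1.5 and all function names are hypothetical choices for illustration, not values from the article.

```python
import math

def poisson_pmf(lam, n_max):
    # P(K = k) for k = 0..n_max under a Poisson distribution with rate lam.
    return [math.exp(-lam) * lam**k / math.factorial(k) for k in range(n_max + 1)]

def convolve(p, q):
    # Discrete convolution: out[n] = sum over i + j = n of p[i] * q[j].
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out

N = 30
aging = poisson_pmf(2.0, N)   # hypothetical 'undisturbed' aging distribution
delay = poisson_pmf(1.5, N)   # hypothetical publication-delay distribution

# Entries 0..N of the convolution only use terms inside the truncated range.
observed = convolve(aging, delay)[: N + 1]
expected = poisson_pmf(3.5, N)

print(max(abs(o - e) for o, e in zip(observed, expected)))
```

The printed maximum difference should be at the level of floating-point rounding.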

Egghe, L.; Rousseau, R.: ¬An h-index weighted by citation impact (2008) 0.01

0.0068851607 = product of:
  0.017212901 = sum of:
    0.010897844 = weight(_text_:a in 695) [ClassicSimilarity], result of:
      0.010897844 = score(doc=695,freq=8.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.20383182 = fieldWeight in 695, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0625 = fieldNorm(doc=695)
    0.006315058 = product of:
      0.012630116 = sum of:
        0.012630116 = weight(_text_:information in 695) [ClassicSimilarity], result of:
          0.012630116 = score(doc=695,freq=2.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.1551638 = fieldWeight in 695, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=695)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: An h-type index is proposed which depends on the obtained citations of articles belonging to the h-core. This weighted h-index, denoted as hw, is presented in a continuous setting and in a discrete one. It is shown that in a continuous setting the new index enjoys many good properties. In the discrete setting some small deviations from the ideal may occur.
Source: Information processing and management. 44(2008) no.2, S.770-780
Type: a

Egghe, L.: New relations between similarity measures for vectors based on vector norms (2009) 0.01

0.0067985477 = product of:
  0.016996369 = sum of:
    0.012260076 = weight(_text_:a in 2708) [ClassicSimilarity], result of:
      0.012260076 = score(doc=2708,freq=18.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.22931081 = fieldWeight in 2708, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046875 = fieldNorm(doc=2708)
    0.0047362936 = product of:
      0.009472587 = sum of:
        0.009472587 = weight(_text_:information in 2708) [ClassicSimilarity], result of:
          0.009472587 = score(doc=2708,freq=2.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.116372846 = fieldWeight in 2708, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2708)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: The well-known similarity measures Jaccard, Salton's cosine, Dice, and several related overlap measures for vectors are compared. While general relations are not possible to prove, we study these measures on the trajectories of the form [X]=a[Y], where a > 0 is a constant and [·] denotes the Euclidean norm of a vector. In this case, direct functional relations between these measures are proved. For Jaccard, we prove that it is a convexly increasing function of Salton's cosine measure, but always smaller than or equal to the latter, hereby explaining a curve, experimentally found by Leydesdorff. All the other measures have a linear relation with Salton's cosine, reducing even to equality, in case a = 1. Hence, for equally normed vectors (e.g., for normalized vectors) we, essentially, only have Jaccard's measure and Salton's cosine measure since all the other measures are equal to the latter.
Source: Journal of the American Society for Information Science and Technology. 60(2009) no.2, S.232-239
Type: a
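
The relations described in this abstract can be tried out on small vectors. The sketch below assumes the usual vector forms Jaccard = X·Y / (|X|² + |Y|² − X·Y) and Dice = 2 X·Y / (|X|² + |Y|²); these definitions, the closed forms in the comments and the example vectors are assumptions made for illustration and are not quoted from the article.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def cosine(x, y):
    return dot(x, y) / (norm(x) * norm(y))

def jaccard(x, y):
    # Assumed vector (Tanimoto) form of the Jaccard measure.
    return dot(x, y) / (dot(x, x) + dot(y, y) - dot(x, y))

def dice(x, y):
    return 2 * dot(x, y) / (dot(x, x) + dot(y, y))

y = [0.0, 4.0, 3.0]
x_equal = [3.0, 4.0, 0.0]   # same norm as y, i.e. a = 1
x_double = [6.0, 8.0, 0.0]  # norm twice that of y, i.e. a = 2

for x in (x_equal, x_double):
    a = norm(x) / norm(y)
    c = cosine(x, y)
    # Under |X| = a|Y| the assumed definitions reduce to functions of c alone:
    # Jaccard = a*c / (a^2 + 1 - a*c) and Dice = 2*a*c / (a^2 + 1).
    print(a, c, jaccard(x, y), a * c / (a * a + 1 - a * c), dice(x, y))
```

For the equal-norm pair, Dice coincides with the cosine while Jaccard stays below it, in line with the abstract's statement for a = 1.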

Egghe, L.: Existence theorem of the quadruple (P, R, F, M) : precision, recall, fallout and miss (2007) 0.01

0.006550755 = product of:
  0.016376887 = sum of:
    0.008173384 = weight(_text_:a in 2011) [ClassicSimilarity], result of:
      0.008173384 = score(doc=2011,freq=8.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.15287387 = fieldWeight in 2011, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046875 = fieldNorm(doc=2011)
    0.008203502 = product of:
      0.016407004 = sum of:
        0.016407004 = weight(_text_:information in 2011) [ClassicSimilarity], result of:
          0.016407004 = score(doc=2011,freq=6.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.20156369 = fieldWeight in 2011, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2011)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: In an earlier paper [Egghe, L. (2004). A universal method of information retrieval evaluation: the "missing" link M and the universal IR surface. Information Processing and Management, 40, 21-30] we showed that, given an IR system, and if P denotes precision, R recall, F fallout and M miss (re-introduced in the paper mentioned above), we have the following relationship between P, R, F and M: P/(1-P)*(1-R)/R*F/(1-F)*(1-M)/M = 1. In this paper we prove the (more difficult) converse: given any four rational numbers in the interval ]0, 1[ satisfying the above equation, then there exists an IR system such that these four numbers (in any order) are the precision, recall, fallout and miss of this IR system. As a consequence we show that any three rational numbers in ]0, 1[ represent any three measures taken from precision, recall, fallout and miss of a certain IR system. We also show that this result is also true for two numbers instead of three.
Source: Information processing and management. 43(2007) no.1, S.265-272
Type: a
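
The relationship quoted in this abstract can be verified for any retrieval outcome in which all four document counts are positive. The sketch below uses exact rational arithmetic; the counts and function names are hypothetical, and M follows the definition given in the 2004 abstract (fraction of the not-retrieved documents that are relevant).

```python
from fractions import Fraction

def prfm(rel_ret, nonrel_ret, rel_notret, nonrel_notret):
    """Precision, recall, fallout and miss from the four document counts."""
    P = Fraction(rel_ret, rel_ret + nonrel_ret)           # relevant among retrieved
    R = Fraction(rel_ret, rel_ret + rel_notret)           # retrieved among relevant
    F = Fraction(nonrel_ret, nonrel_ret + nonrel_notret)  # retrieved among non-relevant
    M = Fraction(rel_notret, rel_notret + nonrel_notret)  # relevant among not retrieved
    return P, R, F, M

def odds_product(P, R, F, M):
    # Left-hand side of P/(1-P) * (1-R)/R * F/(1-F) * (1-M)/M = 1.
    return (P / (1 - P)) * ((1 - R) / R) * (F / (1 - F)) * ((1 - M) / M)

# Hypothetical database: 1000 documents, 50 retrieved, 30 relevant.
P, R, F, M = prfm(rel_ret=20, nonrel_ret=30, rel_notret=10, nonrel_notret=940)
print(P, R, F, M, odds_product(P, R, F, M))
```

Each factor reduces to a ratio of two of the four counts (a/b, c/a, b/d, d/c), which is why the product cancels to exactly 1.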

Egghe, L.: Type/Token-Taken informetrics (2003) 0.01
```
0.006540462 = product of:
  0.016351154 = sum of:
    0.010769378 = weight(_text_:a in 1608) [ClassicSimilarity], result of:
      0.010769378 = score(doc=1608,freq=20.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.20142901 = fieldWeight in 1608, product of:
          4.472136 = tf(freq=20.0), with freq of:
            20.0 = termFreq=20.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1608)
    0.0055817757 = product of:
      0.011163551 = sum of:
        0.011163551 = weight(_text_:information in 1608) [ClassicSimilarity], result of:
          0.011163551 = score(doc=1608,freq=4.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.13714671 = fieldWeight in 1608, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1608)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```

Abstract: Type/Token-Taken informetrics is a new part of informetrics that studies the use of items rather than the items themselves. Here, items are the objects that are produced by the sources (e.g., journals producing articles, authors producing papers, etc.). In linguistics a source is also called a type (e.g., a word), and an item a token (e.g., the use of words in texts). In informetrics, types that occur often, for example, in a database will also be requested often, for example, in information retrieval. The relative use of these occurrences will be higher than their relative occurrences themselves; hence, the name Type/Token-Taken informetrics. This article studies the frequency distribution of Type/Token-Taken informetrics, starting from the one of Type/Token informetrics (i.e., source-item relationships). We are also studying the average number mu* of item uses in Type/Token-Taken informetrics and compare this with the classical average number mu in Type/Token informetrics. We show that mu* >= mu always, and that mu* is an increasing function of mu. A method is presented to actually calculate mu* from mu, and a given a, which is the exponent in Lotka's frequency distribution of Type/Token informetrics. We leave open the problem of developing non-Lotkaian Type/Token-Taken informetrics.
Source: Journal of the American Society for Information Science and Technology. 54(2003) no.7, S.603-610
Type: a

Egghe, L.: Sampling and concentration values of incomplete bibliographies (2002) 0.01

0.006474727 = product of:
  0.016186817 = sum of:
    0.010661141 = weight(_text_:a in 450) [ClassicSimilarity], result of:
      0.010661141 = score(doc=450,freq=10.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.19940455 = fieldWeight in 450, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0546875 = fieldNorm(doc=450)
    0.005525676 = product of:
      0.011051352 = sum of:
        0.011051352 = weight(_text_:information in 450) [ClassicSimilarity], result of:
          0.011051352 = score(doc=450,freq=2.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.13576832 = fieldWeight in 450, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=450)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: This article studies concentration aspects of bibliographies. In particular, we study the impact of incompleteness of such a bibliography on its concentration values (i.e., its degree of inequality of production of its sources). Incompleteness is modeled by sampling in the complete bibliography. The model is general enough to comprise truncation of a bibliography as well as a systematic sample on sources or items. In all cases we prove that the sampled bibliography (or incomplete one) has a higher concentration value than the complete one. These models, hence, shed some light on the measurement of production inequality in incomplete bibliographies.
Source: Journal of the American Society for Information Science and Technology. 53(2002) no.4, S.271-281
Type: a

Egghe, L.; Liang, L.; Rousseau, R.: Fundamental properties of rhythm sequences (2008) 0.01

0.006474727 = product of:
  0.016186817 = sum of:
    0.010661141 = weight(_text_:a in 1965) [ClassicSimilarity], result of:
      0.010661141 = score(doc=1965,freq=10.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.19940455 = fieldWeight in 1965, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1965)
    0.005525676 = product of:
      0.011051352 = sum of:
        0.011051352 = weight(_text_:information in 1965) [ClassicSimilarity], result of:
          0.011051352 = score(doc=1965,freq=2.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.13576832 = fieldWeight in 1965, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1965)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: Fundamental mathematical properties of rhythm sequences are studied. In particular, a set of three axioms for valid rhythm indicators is proposed, and it is shown that the R-indicator satisfies only two out of three but that the R-indicator satisfies all three. This fills a critical, logical gap in the study of these indicator sequences. Matrices leading to a constant R-sequence are called baseline matrices. They are characterized as matrices with constant w-year diachronous impact factors. The relation with classical impact factors is clarified. Using regression analysis, matrices with a rhythm sequence that is on average equal to 1 (smaller than 1, larger than 1) are characterized.
Source: Journal of the American Society for Information Science and Technology. 59(2008) no.9, S.1469-1478
Type: a

Egghe, L.: ¬A noninformetric analysis of the relationship between citation age and journal productivity (2001) 0.01

0.006219466 = product of:
  0.015548665 = sum of:
    0.010812371 = weight(_text_:a in 5685) [ClassicSimilarity], result of:
      0.010812371 = score(doc=5685,freq=14.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.20223314 = fieldWeight in 5685, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046875 = fieldNorm(doc=5685)
    0.0047362936 = product of:
      0.009472587 = sum of:
        0.009472587 = weight(_text_:information in 5685) [ClassicSimilarity], result of:
          0.009472587 = score(doc=5685,freq=2.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.116372846 = fieldWeight in 5685, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=5685)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: A problem, raised by Wallace (JASIS, 37,136-145,1986), on the relation between the journal's median citation age and its number of articles is studied. Leaving open the problem as such, we give a statistical explanation of this relationship, when replacing "median" by "mean" in Wallace's problem. The cloud of points, found by Wallace, is explained in the sense that the points are scattered over the area in the first quadrant, limited by a curve of the form y=1 + E/x**2 where E is a constant. This curve is obtained by using the Central Limit Theorem in statistics and, hence, has no intrinsic informetric foundation. The article closes with some reflections on explanations of regularities in informetrics, based on statistical, probabilistic or informetric results, or on a combination thereof.
Source: Journal of the American Society for Information Science and Technology. 52(2001) no.5, S.371-377
Type: a
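
The score breakdown shown for this record (doc=5685) can be checked by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are hypothetical, and queryNorm, fieldNorm, and the coord factors are taken directly from the listing rather than derived. It should agree with the listed values up to floating-point rounding.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency (assumed formula).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as laid out in the explain tree.
    tf = math.sqrt(freq)
    idf_value = idf(doc_freq, max_docs)
    query_weight = idf_value * query_norm
    field_weight = tf * idf_value * field_norm
    return query_weight * field_weight

# Values copied from the explain output for doc=5685.
QUERY_NORM = 0.046368346
MAX_DOCS = 44218
FIELD_NORM = 0.046875

score_a = term_score(14.0, 37942, MAX_DOCS, QUERY_NORM, FIELD_NORM)
score_information = term_score(2.0, 20772, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The inner clause is scaled by coord(1/2), the outer sum by coord(2/5).
total = (score_a + 0.5 * score_information) * 0.4

print(f"weight(a)           = {score_a:.9f}")
print(f"weight(information) = {score_information:.9f}")
print(f"document score      = {total:.9f}")
```

The same function applies to the other records on this page by substituting their freq and fieldNorm values.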

Egghe, L.; Rousseau, R.; Rousseau, S.: TOP-curves (2007) 0.01

0.0060245167 = product of:
  0.015061291 = sum of:
    0.009535614 = weight(_text_:a in 50) [ClassicSimilarity], result of:
      0.009535614 = score(doc=50,freq=8.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.17835285 = fieldWeight in 50, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0546875 = fieldNorm(doc=50)
    0.005525676 = product of:
      0.011051352 = sum of:
        0.011051352 = weight(_text_:information in 50) [ClassicSimilarity], result of:
          0.011051352 = score(doc=50,freq=2.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.13576832 = fieldWeight in 50, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=50)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: Several characteristics of classical Lorenz curves make them unsuitable for the study of a group of top performers. TOP-curves, defined as a kind of mirror image of TIP-curves used in poverty studies, are shown to possess the properties necessary for adequate empirical ranking of various data arrays, based on the properties of the highest performers (i.e., the core). TOP-curves and essential TOP-curves, also introduced in this article, simultaneously represent the incidence, intensity, and inequality among the top. It is shown that the TOP-dominance partial order, introduced in this article, is stronger than the Lorenz dominance order. In this way, this article contributes to the study of cores, a central issue in applied informetrics.
Source: Journal of the American Society for Information Science and Technology. 58(2007) no.6, S.777-785
Type: a

Egghe, L.: Vector retrieval, fuzzy retrieval and the universal fuzzy IR surface for IR evaluation (2004) 0.01

0.005822873 = product of:
  0.014557183 = sum of:
    0.0067426977 = weight(_text_:a in 2531) [ClassicSimilarity], result of:
      0.0067426977 = score(doc=2531,freq=4.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.12611452 = fieldWeight in 2531, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2531)
    0.007814486 = product of:
      0.015628971 = sum of:
        0.015628971 = weight(_text_:information in 2531) [ClassicSimilarity], result of:
          0.015628971 = score(doc=2531,freq=4.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.1920054 = fieldWeight in 2531, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2531)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: It is shown that vector information retrieval (IR) and general fuzzy IR use two types of fuzzy set operations: the original "Zadeh min-max operations" and the so-called "probabilistic sum and algebraic product operations". The universal IR surface, valid for classical 0-1 IR (i.e. where ordinary sets are used) and used in IR evaluation, is extended to and reproved for vector IR, using the probabilistic sum and algebraic product model. We also show (by counterexample) that using the "Zadeh min-max" fuzzy model yields a breakdown of this IR surface.
Source: Information processing and management. 40(2004) no.4, S.603-618
Type: a

Egghe, L.: ¬The power of power laws and an interpretation of Lotkaian informetric systems as self-similar fractals (2005) 0.01

0.00556948 = product of:
  0.0139237 = sum of:
    0.008341924 = weight(_text_:a in 3466) [ClassicSimilarity], result of:
      0.008341924 = score(doc=3466,freq=12.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.15602624 = fieldWeight in 3466, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3466)
    0.0055817757 = product of:
      0.011163551 = sum of:
        0.011163551 = weight(_text_:information in 3466) [ClassicSimilarity], result of:
          0.011163551 = score(doc=3466,freq=4.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.13714671 = fieldWeight in 3466, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3466)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: Power laws as defined in 1926 by A. Lotka are increasing in importance because they have been found valid in varied social networks including the Internet. In this article some unique properties of power laws are proven. They are shown to characterize functions with the scale-free property (also called self-similarity property) as well as functions with the product property. Power laws have other desirable properties that are not shared by exponential laws, as we indicate in this paper. Specifically, Naranan (1970) proves the validity of Lotka's law based on the exponential growth of articles in journals and of the number of journals. His argument is reproduced here and a discrete-time argument is also given, yielding the same law as that of Lotka. This argument makes it possible to interpret the information production process as a self-similar fractal and show the relation between Lotka's exponent and the (self-similar) fractal dimension of the system. Lotkaian informetric systems are self-similar fractals, a fact revealed by Mandelbrot (1977) in relation to nature, but one that is also true for random texts, which exemplify a very special type of informetric system.
Source: Journal of the American Society for Information Science and Technology. 56(2005) no.7, S.669-675
Type: a

Egghe, L.; Ravichandra Rao, I.K.: Duality revisited : construction of fractional frequency distributions based on two dual Lotka laws (2002) 0.01

0.005549766 = product of:
  0.013874415 = sum of:
    0.009138121 = weight(_text_:a in 1006) [ClassicSimilarity], result of:
      0.009138121 = score(doc=1006,freq=10.0), product of:
        0.053464882 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046368346 = queryNorm
        0.1709182 = fieldWeight in 1006, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.046875 = fieldNorm(doc=1006)
    0.0047362936 = product of:
      0.009472587 = sum of:
        0.009472587 = weight(_text_:information in 1006) [ClassicSimilarity], result of:
          0.009472587 = score(doc=1006,freq=2.0), product of:
            0.08139861 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046368346 = queryNorm
            0.116372846 = fieldWeight in 1006, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1006)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Abstract: Fractional frequency distributions of, for example, authors with a certain (fractional) number of papers are very irregular and, therefore, not easy to model or to explain. This article makes a first attempt at this by assuming two simple Lotka laws (with exponent 2): one for the number of authors with n papers (total count here) and one for the number of papers with n authors, n ∈ N. Based on an earlier convolution model of Egghe, interpreted and reworked now for discrete scores, we are able to produce theoretical fractional frequency distributions with only one parameter, which are in very close agreement with the practical ones as found in a large dataset produced earlier by Rao. The article also shows that (irregular) fractional frequency distributions are a consequence of Lotka's law, and are not examples of breakdowns of this famous historical law.
Source: Journal of the American Society for Information Science and Technology. 53(2002) no.10, S.789-801
Type: a
