Search (59 results, page 1 of 3)

  • × author_ss:"Egghe, L."
  1. Egghe, L.: ¬A rationale for the Hirsch-index rank-order distribution and a comparison with the impact factor rank-order distribution (2009) 0.08
    0.07642611 = product of:
      0.26749137 = sum of:
        0.018568728 = weight(_text_:of in 3124) [ClassicSimilarity], result of:
          0.018568728 = score(doc=3124,freq=10.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.2704316 = fieldWeight in 3124, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3124)
        0.24892265 = weight(_text_:distribution in 3124) [ClassicSimilarity], result of:
          0.24892265 = score(doc=3124,freq=12.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            1.03632 = fieldWeight in 3124, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3124)
      0.2857143 = coord(2/7)
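
The score tree above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function and variable names are illustrative and not part of the search system, and the numeric inputs are copied from the tree for doc 3124.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # queryWeight = idf * queryNorm
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

# Values read off the explain output for doc 3124
QUERY_NORM = 0.043909185
MAX_DOCS = 44218
FIELD_NORM = 0.0546875

score_of = term_score(10.0, 25162, MAX_DOCS, QUERY_NORM, FIELD_NORM)
score_distribution = term_score(12.0, 505, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(2/7): two of the seven query clauses matched this document
total = (score_of + score_distribution) * (2 / 7)
print(round(total, 6))
```

Small differences in the last digits are expected, since Lucene computes these values in single precision.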
    
    Abstract
    We present a rationale for the Hirsch-index rank-order distribution and prove that it is a power law (hence a straight line in the log-log scale). This is confirmed by experimental data of Pyykkö and by data produced in this article on 206 mathematics journals. This distribution is of a completely different nature than the impact factor (IF) rank-order distribution which (as proved in a previous article) is S-shaped. This is also confirmed by our example. Only in the log-log scale of the h-index distribution do we notice a concave deviation of the straight line for higher ranks. This phenomenon is discussed.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.10, S.2142-2144
  2. Egghe, L.; Rousseau, R.: Duality in information retrieval and the hypergeometric distribution (1997) 0.05
    0.049639102 = product of:
      0.17373686 = sum of:
        0.0094905 = weight(_text_:of in 647) [ClassicSimilarity], result of:
          0.0094905 = score(doc=647,freq=2.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.13821793 = fieldWeight in 647, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=647)
        0.16424635 = weight(_text_:distribution in 647) [ClassicSimilarity], result of:
          0.16424635 = score(doc=647,freq=4.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.68379384 = fieldWeight in 647, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.0625 = fieldNorm(doc=647)
      0.2857143 = coord(2/7)
    
    Abstract
    Asserts that duality is an important topic in informetrics, especially in connection with the classical informetric laws. Yet this concept is less studied in information retrieval. It deals with the unification or symmetry between queries and documents, search formulation versus indexing, and relevant versus retrieved documents. Elaborates these ideas and highlights the connection with the hypergeometric distribution
    Source
    Journal of documentation. 53(1997) no.5, S.499-496
  3. Egghe, L.; Rousseau, R.: Aging, obsolescence, impact, growth, and utilization : definitions and relations (2000) 0.04
    0.040177125 = product of:
      0.14061993 = sum of:
        0.01743516 = weight(_text_:of in 5154) [ClassicSimilarity], result of:
          0.01743516 = score(doc=5154,freq=12.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.25392252 = fieldWeight in 5154, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=5154)
        0.12318477 = weight(_text_:distribution in 5154) [ClassicSimilarity], result of:
          0.12318477 = score(doc=5154,freq=4.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.5128454 = fieldWeight in 5154, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.046875 = fieldNorm(doc=5154)
      0.2857143 = coord(2/7)
    
    Abstract
    The notions aging, obsolescence, impact, growth, utilization, and their relations are studied. It is shown how to correct an observed citation distribution for growth, once the growth distribution is known. The relation of this correction procedure with the calculation of impact measures is explained. More interestingly, we have shown how the influence of growth on aging can be studied over a complete period as a whole. Here, the difference between the so-called average and global aging distributions is the main factor. Our main result is that growth can influence aging but that it does not cause aging. A short overview of some classical articles on this topic is given. Results of these earlier works are placed in the framework set up in this article
    Source
    Journal of the American Society for Information Science. 51(2000) no.11, S.1004-1017
  4. Egghe, L.; Rousseau, R.: ¬The influence of publication delays on the observed aging distribution of scientific literature (2000) 0.04
    0.038605917 = product of:
      0.1351207 = sum of:
        0.018981 = weight(_text_:of in 4385) [ClassicSimilarity], result of:
          0.018981 = score(doc=4385,freq=8.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.27643585 = fieldWeight in 4385, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=4385)
        0.11613971 = weight(_text_:distribution in 4385) [ClassicSimilarity], result of:
          0.11613971 = score(doc=4385,freq=2.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.48351526 = fieldWeight in 4385, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.0625 = fieldNorm(doc=4385)
      0.2857143 = coord(2/7)
    
    Abstract
    Observed aging curves are influenced by publication delays. In this article, we show how the 'undisturbed' aging function and the publication delay combine to give the observed aging function. This combination is performed by a mathematical operation known as convolution. Examples are given, such as the convolution of 2 Poisson distributions, 2 exponential distributions, and 2 lognormal distributions. A paradox is observed between theory and real data
    Source
    Journal of the American Society for Information Science. 51(2000) no.2, S.158-165
  5. Egghe, L.: Type/Token-Taken informetrics (2003) 0.03
    0.034950495 = product of:
      0.12232673 = sum of:
        0.019672766 = weight(_text_:of in 1608) [ClassicSimilarity], result of:
          0.019672766 = score(doc=1608,freq=22.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.28651062 = fieldWeight in 1608, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1608)
        0.102653965 = weight(_text_:distribution in 1608) [ClassicSimilarity], result of:
          0.102653965 = score(doc=1608,freq=4.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.42737114 = fieldWeight in 1608, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1608)
      0.2857143 = coord(2/7)
    
    Abstract
    Type/Token-Taken informetrics is a new part of informetrics that studies the use of items rather than the items themselves. Here, items are the objects that are produced by the sources (e.g., journals producing articles, authors producing papers, etc.). In linguistics a source is also called a type (e.g., a word), and an item a token (e.g., the use of words in texts). In informetrics, types that occur often, for example, in a database will also be requested often, for example, in information retrieval. The relative use of these occurrences will be higher than their relative occurrences themselves; hence, the name Type/Token-Taken informetrics. This article studies the frequency distribution of Type/Token-Taken informetrics, starting from the one of Type/Token informetrics (i.e., source-item relationships). We also study the average number μ* of item uses in Type/Token-Taken informetrics and compare this with the classical average number μ in Type/Token informetrics. We show that μ* >= μ always, and that μ* is an increasing function of μ. A method is presented to actually calculate μ* from μ, and a given a, which is the exponent in Lotka's frequency distribution of Type/Token informetrics. We leave open the problem of developing non-Lotkaian Type/Token-Taken informetrics.
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.7, S.603-610
  6. Egghe, L.; Guns, R.: Applications of the generalized law of Benford to informetric data (2012) 0.03
    0.031632032 = product of:
      0.1107121 = sum of:
        0.023607321 = weight(_text_:of in 376) [ClassicSimilarity], result of:
          0.023607321 = score(doc=376,freq=22.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.34381276 = fieldWeight in 376, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=376)
        0.08710478 = weight(_text_:distribution in 376) [ClassicSimilarity], result of:
          0.08710478 = score(doc=376,freq=2.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.36263645 = fieldWeight in 376, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.046875 = fieldNorm(doc=376)
      0.2857143 = coord(2/7)
    
    Abstract
    In a previous work (Egghe, 2011), the first author showed that Benford's law (describing the logarithmic distribution of the numbers 1, 2, ... , 9 as first digits of data in decimal form) is related to the classical law of Zipf with exponent 1. The work of Campanario and Coslado (2011), however, shows that Benford's law does not always fit practical data in a statistical sense. In this article, we use a generalization of Benford's law related to the general law of Zipf with exponent ? > 0. Using data from Campanario and Coslado, we apply nonlinear least squares to determine the optimal ? and show that this generalized law of Benford fits the data better than the classical law of Benford.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.8, S.1662-1665
  7. Egghe, L.: Mathematical theory of the h- and g-index in case of fractional counting of authorship (2008) 0.03
    0.03131814 = product of:
      0.10961348 = sum of:
        0.022508696 = weight(_text_:of in 2004) [ClassicSimilarity], result of:
          0.022508696 = score(doc=2004,freq=20.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.32781258 = fieldWeight in 2004, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=2004)
        0.08710478 = weight(_text_:distribution in 2004) [ClassicSimilarity], result of:
          0.08710478 = score(doc=2004,freq=2.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.36263645 = fieldWeight in 2004, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.046875 = fieldNorm(doc=2004)
      0.2857143 = coord(2/7)
    
    Abstract
    This article studies the h-index (Hirsch index) and the g-index of authors, in case one counts authorship of the cited articles in a fractional way. There are two ways to do this: One counts the citations to these papers in a fractional way or one counts the ranks of the papers in a fractional way as credit for an author. In both cases, we define the fractional h- and g-indexes, and we present inequalities (both upper and lower bounds) between these fractional h- and g-indexes and their corresponding unweighted values (also involving, of course, the coauthorship distribution). Wherever applicable, examples and counterexamples are provided. In a concrete example (the publication citation list of the present author), we make explicit calculations of these fractional h- and g-indexes and show that they are not very different from the unweighted ones.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.10, S.1608-1616
  8. Egghe, L.: Properties of the n-overlap vector and n-overlap similarity theory (2006) 0.03
    0.027080342 = product of:
      0.09478119 = sum of:
        0.022193875 = weight(_text_:of in 194) [ClassicSimilarity], result of:
          0.022193875 = score(doc=194,freq=28.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.32322758 = fieldWeight in 194, product of:
              5.2915025 = tf(freq=28.0), with freq of:
                28.0 = termFreq=28.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=194)
        0.07258732 = weight(_text_:distribution in 194) [ClassicSimilarity], result of:
          0.07258732 = score(doc=194,freq=2.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.30219704 = fieldWeight in 194, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.0390625 = fieldNorm(doc=194)
      0.2857143 = coord(2/7)
    
    Abstract
    In the first part of this article the author defines the n-overlap vector whose coordinates consist of the fraction of the objects (e.g., books, N-grams, etc.) that belong to 1, 2, ..., n sets (more generally: families) (e.g., libraries, databases, etc.). With the aid of the Lorenz concentration theory, a theory of n-overlap similarity is conceived together with corresponding measures, such as the generalized Jaccard index (generalizing the well-known Jaccard index in case n = 2). Next, the distributional form of the n-overlap vector is determined assuming certain distributions of the objects and of the set (family) sizes. In this section the decreasing power law and decreasing exponential distribution are explained for the n-overlap vector. Both item (token) n-overlap and source (type) n-overlap are studied. The n-overlap properties of objects indexed by a hierarchical system (e.g., books indexed by numbers from a UDC or Dewey system or by N-grams) are presented in the final section. The author shows how the results given in the previous section can be applied as well as how the Lorenz order of the n-overlap vector is respected by an increase or a decrease of the level of refinement in the hierarchical system (e.g., the value N in N-grams).
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.9, S.1165-1177
  9. Egghe, L.: Empirical and combinatorial study of country occurrences in multi-authored papers (2006) 0.02
    0.022014532 = product of:
      0.07705086 = sum of:
        0.018981 = weight(_text_:of in 81) [ClassicSimilarity], result of:
          0.018981 = score(doc=81,freq=32.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.27643585 = fieldWeight in 81, product of:
              5.656854 = tf(freq=32.0), with freq of:
                32.0 = termFreq=32.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=81)
        0.058069855 = weight(_text_:distribution in 81) [ClassicSimilarity], result of:
          0.058069855 = score(doc=81,freq=2.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.24175763 = fieldWeight in 81, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.03125 = fieldNorm(doc=81)
      0.2857143 = coord(2/7)
    
    Abstract
    Papers written by several authors can be classified according to the countries of the author affiliations. The empirical part of this paper consists of two datasets. One dataset consists of 1,035 papers retrieved via the search "pedagog*" in the years 2004 and 2005 (up to October) in Academic Search Elite, which is a case where phi(m) = the number of papers with m = 1, 2, 3, ... authors is decreasing, hence most of the papers have a low number of authors. Here we find that #j,m = the number of times a country occurs j times in an m-authored paper, j = 1, ..., m-1, is decreasing and that #m,m is much higher than all the other #j,m values. The other dataset consists of 3,271 papers retrieved via the search "enzyme" in the year 2005 (up to October) in the same database, which is a case of a non-decreasing phi(m): most papers have 3 or 4 authors and we even find many papers with a much higher number of authors. In this case we show again that #m,m is much higher than the other #j,m values but that #j,m is not decreasing anymore in j = 1, ..., m-1, although #1,m is (apart from #m,m) the largest number amongst the #j,m. The combinatorial part gives a proof of the fact that #j,m decreases for j = 1, ..., m-1, supposing that all cases are equally possible. This shows that the first dataset conforms better to this model than the second dataset. Explanations for these findings are given. From the data we also find the (we think: new) distribution of the number of papers with n = 1, 2, 3, ... countries (i.e., where there are n different countries involved amongst the m (≥ n) authors of a paper): a fast decreasing function, e.g., a power law with a very large Lotka exponent.
  10. Egghe, L.; Rousseau, R.: Averaging and globalising quotients of informetric and scientometric data (1996) 0.01
    0.012975623 = product of:
      0.04541468 = sum of:
        0.027567413 = weight(_text_:of in 7659) [ClassicSimilarity], result of:
          0.027567413 = score(doc=7659,freq=30.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.4014868 = fieldWeight in 7659, product of:
              5.477226 = tf(freq=30.0), with freq of:
                30.0 = termFreq=30.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=7659)
        0.017847266 = product of:
          0.035694532 = sum of:
            0.035694532 = weight(_text_:22 in 7659) [ClassicSimilarity], result of:
              0.035694532 = score(doc=7659,freq=2.0), product of:
                0.15376249 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043909185 = queryNorm
                0.23214069 = fieldWeight in 7659, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=7659)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    It is possible, using ISI's Journal Citation Report (JCR), to calculate average impact factors (AIF) for JCR's subject categories but it can be more useful to know the global Impact Factor (GIF) of a subject category and compare the 2 values. Reports results of a study to compare the relationships between AIFs and GIFs of subjects, based on the particular case of the average impact factor of a subfield versus the impact factor of this subfield as a whole, the difference being studied between an average of quotients, denoted as AQ, and a global average, obtained as a quotient of averages, and denoted as GQ. In the case of impact factors, AQ becomes the average impact factor of a field, and GQ becomes its global impact factor. Discusses a number of applications of this technique in the context of informetrics and scientometrics
    Source
    Journal of information science. 22(1996) no.3, S.165-170
  11. Egghe, L.; Guns, R.; Rousseau, R.; Leuven, K.U.: Erratum (2012) 0.01
    0.011888163 = product of:
      0.04160857 = sum of:
        0.011863125 = weight(_text_:of in 4992) [ClassicSimilarity], result of:
          0.011863125 = score(doc=4992,freq=2.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.17277241 = fieldWeight in 4992, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.078125 = fieldNorm(doc=4992)
        0.029745443 = product of:
          0.059490886 = sum of:
            0.059490886 = weight(_text_:22 in 4992) [ClassicSimilarity], result of:
              0.059490886 = score(doc=4992,freq=2.0), product of:
                0.15376249 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043909185 = queryNorm
                0.38690117 = fieldWeight in 4992, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4992)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Date
    14. 2.2012 12:53:22
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.2, S.429
  12. Egghe, L.: ¬A universal method of information retrieval evaluation : the "missing" link M and the universal IR surface (2004) 0.01
    0.01085133 = product of:
      0.037979655 = sum of:
        0.020132389 = weight(_text_:of in 2558) [ClassicSimilarity], result of:
          0.020132389 = score(doc=2558,freq=16.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.2932045 = fieldWeight in 2558, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=2558)
        0.017847266 = product of:
          0.035694532 = sum of:
            0.035694532 = weight(_text_:22 in 2558) [ClassicSimilarity], result of:
              0.035694532 = score(doc=2558,freq=2.0), product of:
                0.15376249 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043909185 = queryNorm
                0.23214069 = fieldWeight in 2558, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2558)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    The paper shows that the present evaluation methods in information retrieval (basically recall R and precision P and in some cases fallout F) lack universal comparability in the sense that their values depend on the generality of the IR problem. A solution is given by using all "parts" of the database, including the non-relevant documents and also the not-retrieved documents. It turns out that the solution is given by introducing the measure M being the fraction of the not-retrieved documents that are relevant (hence the "miss" measure). We prove that - independent of the IR problem or of the IR action - the quadruple (P,R,F,M) belongs to a universal IR surface, being the same for all IR-activities. This universality is then exploited by defining a new measure for evaluation in IR allowing for unbiased comparisons of all IR results. We also show that only using one, two or even three measures from the set {P,R,F,M} necessarily leads to evaluation measures that are non-universal and hence not capable of comparing different IR situations.
    Date
    14. 8.2004 19:17:22
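
The four measures named in the abstract above can be illustrated from a 2x2 retrieval contingency table. This is a minimal sketch using the textbook definitions of precision, recall and fallout, plus the miss measure M as the abstract words it (fraction of the not-retrieved documents that are relevant); the function name, argument names and the example counts are hypothetical, and the universal IR surface itself is not reproduced here.

```python
def prfm(rel_ret, nonrel_ret, rel_notret, nonrel_notret):
    """Return (P, R, F, M) from the four cells of a retrieval contingency table."""
    retrieved = rel_ret + nonrel_ret
    not_retrieved = rel_notret + nonrel_notret
    relevant = rel_ret + rel_notret
    non_relevant = nonrel_ret + nonrel_notret

    precision = rel_ret / retrieved         # retrieved documents that are relevant
    recall = rel_ret / relevant             # relevant documents that are retrieved
    fallout = nonrel_ret / non_relevant     # non-relevant documents that are retrieved
    miss = rel_notret / not_retrieved       # not-retrieved documents that are relevant
    return precision, recall, fallout, miss

# Hypothetical database of 1,000 documents, 40 of them relevant, 50 retrieved
p, r, f, m = prfm(rel_ret=30, nonrel_ret=20, rel_notret=10, nonrel_notret=940)
print(p, r, f, m)
```

The sketch assumes none of the four marginal totals is zero; a real evaluation script would need to guard those divisions.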
  13. Egghe, L.: On the law of Zipf-Mandelbrot for multi-word phrases (1999) 0.00
    0.004067357 = product of:
      0.028471498 = sum of:
        0.028471498 = weight(_text_:of in 3058) [ClassicSimilarity], result of:
          0.028471498 = score(doc=3058,freq=18.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.41465375 = fieldWeight in 3058, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=3058)
      0.14285715 = coord(1/7)
    
    Abstract
    This article studies the probabilities of the occurrence of multi-word (m-word) phrases (m=2,3,...) in relation to the probabilities of occurrence of the single words. It is well known that, in the latter case, the law of Zipf is valid (i.e., a power law). We prove that in the case of m-word phrases (m>=2), this is not the case. We present 2 independent proofs of this
    Source
    Journal of the American Society for Information Science. 50(1999) no.3, S.233-241
  14. Egghe, L.: Mathematical theories of citation (1998) 0.00
    0.004067357 = product of:
      0.028471498 = sum of:
        0.028471498 = weight(_text_:of in 5125) [ClassicSimilarity], result of:
          0.028471498 = score(doc=5125,freq=18.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.41465375 = fieldWeight in 5125, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=5125)
      0.14285715 = coord(1/7)
    
    Abstract
    Focuses on possible mathematical theories of citation and on the intrinsic problems related to it. Sheds light on aspects of mathematical complexity as encountered in, for example, fractal theory and Mandelbrot's law. Also discusses dynamical aspects of citation theory as reflected in evolutions of journal rankings, centres of gravity or of the set of source journals. Makes some comments in this connection on growth and obsolescence
    Footnote
    Contribution to a thematic issue devoted to 'Theories of citation?'
  15. Egghe, L.: ¬A model for the size-frequency function of coauthor pairs (2008) 0.00
    0.003938202 = product of:
      0.027567413 = sum of:
        0.027567413 = weight(_text_:of in 2366) [ClassicSimilarity], result of:
          0.027567413 = score(doc=2366,freq=30.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.4014868 = fieldWeight in 2366, product of:
              5.477226 = tf(freq=30.0), with freq of:
                30.0 = termFreq=30.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=2366)
      0.14285715 = coord(1/7)
    
    Abstract
    Lotka's law was formulated to describe the number of authors with a certain number of publications. Empirical results (Morris & Goldstein, 2007) indicate that Lotka's law is also valid if one counts the number of publications of coauthor pairs. This article gives a simple model proving this to be true, with the same Lotka exponent, if the number of coauthored papers is proportional to the number of papers of the individual coauthors. Under the assumption that this number of coauthored papers is more than proportional to the number of papers of the individual authors (to be explained in the article), we can prove that the size-frequency function of coauthor pairs is Lotkaian with an exponent that is higher than that of the Lotka function of individual authors, a fact that is confirmed in experimental results.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.13, S.2133-2137
  16. Egghe, L.: Dynamic h-index : the Hirsch index in function of time (2007) 0.00
    0.0038347412 = product of:
      0.026843186 = sum of:
        0.026843186 = weight(_text_:of in 147) [ClassicSimilarity], result of:
          0.026843186 = score(doc=147,freq=16.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.39093933 = fieldWeight in 147, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=147)
      0.14285715 = coord(1/7)
    
    Abstract
    When there is a group of articles and the present time is fixed, we can determine the unique number h being the number of articles that received h or more citations while the other articles received a number of citations which is not larger than h. In this article, the time dependence of the h-index is determined. This is important to describe the expected career evolution of a scientist's work or of a journal's production in a fixed year.
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.3, S.452-454
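
The definition of h given in the abstract above can be illustrated with a short sketch. It covers only the static h-index at a fixed time, not the time-dependent model of the article; the function name and the sample citation counts are hypothetical.

```python
def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the top `rank` articles all have >= rank citations
        else:
            break      # every later article has even fewer citations
    return h

# Hypothetical citation counts for five articles.
print(h_index([10, 8, 5, 4, 3]))
```

Recomputing the value on the citation counts available at successive dates gives an empirical view of how h changes over time.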
  17. Egghe, L.: Zipfian and Lotkaian continuous concentration theory (2005) 0.00
    0.0038046641 = product of:
      0.026632648 = sum of:
        0.026632648 = weight(_text_:of in 3678) [ClassicSimilarity], result of:
          0.026632648 = score(doc=3678,freq=28.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.38787308 = fieldWeight in 3678, product of:
              5.2915025 = tf(freq=28.0), with freq of:
                28.0 = termFreq=28.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=3678)
      0.14285715 = coord(1/7)
    
    Abstract
    In this article, concentration (i.e., inequality) aspects of the functions of Zipf and of Lotka are studied. Since both functions are power laws (i.e., they are mathematically the same), it suffices to develop one concentration theory for power laws and apply it twice for the different interpretations of the laws of Zipf and Lotka. After a brief repetition of the functional relationships between Zipf's law and Lotka's law, we prove that Price's law of concentration is equivalent to Zipf's law. A major part of this article is devoted to the development of continuous concentration theory, based on Lorenz curves. The Lorenz curve for power functions is calculated and, based on this, so are some important concentration measures such as those of Gini, Theil, and the variation coefficient. Using Lorenz curves, it is shown that the concentration of a power law increases with its exponent, and this result is interpreted in terms of the functions of Zipf and Lotka.
    Source
    Journal of the American Society for Information Science and Technology. 56(2005) no.9, S.935-945
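
The statement in the abstract above that concentration increases with the exponent can be checked numerically. The sketch below is a discrete illustration only, assuming a Zipf-type rank-size sequence r^(-beta); the article itself works with continuous Lorenz curves. The names `gini` and `zipf_sizes` and the chosen exponents are hypothetical.

```python
def gini(values):
    """Gini index of non-negative values (0 means perfect equality)."""
    xs = sorted(values)  # ascending order, as for a Lorenz curve
    n = len(xs)
    total = sum(xs)
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n

def zipf_sizes(beta, n_sources=1000):
    """Production of the source at rank r, proportional to r**(-beta)."""
    return [r ** (-beta) for r in range(1, n_sources + 1)]

# Compare the concentration of three hypothetical exponents.
for beta in (0.5, 1.0, 1.5):
    print(beta, round(gini(zipf_sizes(beta)), 3))
```

In this sketch a larger beta produces a larger Gini index, in line with the rank-frequency reading of the result.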
  18. Egghe, L.: Sampling and concentration values of incomplete bibliographies (2002) 0.00
    0.0037514495 = product of:
      0.026260145 = sum of:
        0.026260145 = weight(_text_:of in 450) [ClassicSimilarity], result of:
          0.026260145 = score(doc=450,freq=20.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.38244802 = fieldWeight in 450, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=450)
      0.14285715 = coord(1/7)
    
    Abstract
    This article studies concentration aspects of bibliographies. In particular, we study the impact of the incompleteness of such a bibliography on its concentration values (i.e., the degree of inequality in the production of its sources). Incompleteness is modeled by sampling in the complete bibliography. The model is general enough to comprise truncation of a bibliography as well as a systematic sample on sources or items. In all cases we prove that the sampled (i.e., incomplete) bibliography has a higher concentration value than the complete one. These models, hence, shed some light on the measurement of production inequality in incomplete bibliographies.
    Source
    Journal of the American Society for Information Science and Technology. 53(2002) no.4, S.271-281
  19. Egghe, L.; Rousseau, R.; Rousseau, S.: TOP-curves (2007) 0.00
    0.0035589375 = product of:
      0.02491256 = sum of:
        0.02491256 = weight(_text_:of in 50) [ClassicSimilarity], result of:
          0.02491256 = score(doc=50,freq=18.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.36282203 = fieldWeight in 50, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=50)
      0.14285715 = coord(1/7)
    
    Abstract
    Several characteristics of classical Lorenz curves make them unsuitable for the study of a group of top performers. TOP-curves, defined as a kind of mirror image of TIP-curves used in poverty studies, are shown to possess the properties necessary for adequate empirical ranking of various data arrays, based on the properties of the highest performers (i.e., the core). TOP-curves and essential TOP-curves, also introduced in this article, simultaneously represent the incidence, intensity, and inequality among the top. It is shown that TOP-dominance partial order, introduced in this article, is stronger than Lorenz dominance order. In this way, this article contributes to the study of cores, a central issue in applied informetrics.
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.6, S.777-785
  20. Egghe, L.: Theory of the topical coverage of multiple databases (2013) 0.00
    0.0035589375 = product of:
      0.02491256 = sum of:
        0.02491256 = weight(_text_:of in 526) [ClassicSimilarity], result of:
          0.02491256 = score(doc=526,freq=18.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.36282203 = fieldWeight in 526, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=526)
      0.14285715 = coord(1/7)
    
    Abstract
    We present a model that describes which fraction of the literature on a certain topic we will find when we use n (n = 1, 2, ...) databases. It is a generalization of the theory of discovering usability problems. We prove that, in all practical cases, this fraction is a concave function of n, the number of databases used, thereby explaining some graphs that exist in the literature. We also study limiting features of this fraction for very high n, and we characterize the case in which we find all literature on a certain topic for n high enough.
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.1, S.126-131