Search (19 results, page 1 of 1)

  • author_ss:"Egghe, L."
  1. Egghe, L.: Vector retrieval, fuzzy retrieval and the universal fuzzy IR surface for IR evaluation (2004) 0.15
    0.1514608 = product of:
      0.3029216 = sum of:
        0.2644282 = weight(_text_:vector in 2531) [ClassicSimilarity], result of:
          0.2644282 = score(doc=2531,freq=6.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.8625983 = fieldWeight in 2531, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2531)
        0.038493384 = product of:
          0.07698677 = sum of:
            0.07698677 = weight(_text_:model in 2531) [ClassicSimilarity], result of:
              0.07698677 = score(doc=2531,freq=4.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.4205716 = fieldWeight in 2531, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2531)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    It is shown that vector information retrieval (IR) and general fuzzy IR use two types of fuzzy set operations: the original "Zadeh min-max operations" and the so-called "probabilistic sum and algebraic product operations". The universal IR surface, valid for classical 0-1 IR (i.e., where ordinary sets are used) and used in IR evaluation, is extended to and reproved for vector IR using the probabilistic sum and algebraic product model. We also show (by counterexample) that using the "Zadeh min-max" fuzzy model yields a breakdown of this IR surface.
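    The score tree above is standard Lucene ClassicSimilarity (TF-IDF) arithmetic: each weight(_text_:term) node multiplies queryWeight (idf × queryNorm) by fieldWeight (tf × idf × fieldNorm, with tf = √freq), and the branches are combined through the coord factors. A minimal Python sketch reproducing the score of result 1 from the constants shown in the tree (function and variable names are mine, not Lucene's):

    ```python
    import math

    def term_weight(freq, idf, query_norm, field_norm):
        """One weight(_text_:term) node: queryWeight * fieldWeight."""
        query_weight = idf * query_norm                    # idf * queryNorm
        field_weight = math.sqrt(freq) * idf * field_norm  # tf * idf * fieldNorm
        return query_weight * field_weight

    QUERY_NORM, FIELD_NORM = 0.047605187, 0.0546875

    w_vector = term_weight(6.0, 6.439392, QUERY_NORM, FIELD_NORM)       # 'vector', freq=6
    w_model = term_weight(4.0, 3.845226, QUERY_NORM, FIELD_NORM) * 0.5  # 'model', coord(1/2)

    score = (w_vector + w_model) * 0.5                                  # outer coord(2/4)
    print(round(score, 7))                                              # 0.1514608
    ```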
  2. Egghe, L.; Leydesdorff, L.: The relation between Pearson's correlation coefficient r and Salton's cosine measure (2009) 0.14
    0.13549922 = product of:
      0.27099845 = sum of:
        0.18506117 = weight(_text_:vector in 2803) [ClassicSimilarity], result of:
          0.18506117 = score(doc=2803,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.603693 = fieldWeight in 2803, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=2803)
        0.08593727 = weight(_text_:space in 2803) [ClassicSimilarity], result of:
          0.08593727 = score(doc=2803,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.34593284 = fieldWeight in 2803, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=2803)
      0.5 = coord(2/4)
    
    Abstract
    The relation between Pearson's correlation coefficient r and Salton's cosine measure is revealed based on the different possible values of the ratio of the L1-norm to the L2-norm of a vector. These different values yield a sheaf of increasingly straight lines which together form a cloud of points; this cloud is the investigated relation. The theoretical results are tested against the author co-citation relations among 24 informetricians, for whom two matrices can be constructed based on co-citations: the asymmetric occurrence matrix and the symmetric co-citation matrix. Both examples completely confirm the theoretical results. The results enable us to specify an algorithm that provides a threshold value for the cosine above which none of the corresponding Pearson correlations would be negative. Using this threshold value can be expected to optimize the visualization of the vector space.
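    Pearson's r is Salton's cosine applied to mean-centered vectors, which is what makes the relation above tractable. A small numerical sketch on a toy occurrence matrix (the matrix, seed, and threshold search are illustrative assumptions, not the paper's data):

    ```python
    import numpy as np

    def cosine(x, y):
        return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

    def pearson(x, y):
        return cosine(x - x.mean(), y - y.mean())  # r = cosine of centered vectors

    rng = np.random.default_rng(0)
    occ = rng.poisson(3.0, size=(24, 100)).astype(float)  # 24 toy "authors"

    pairs = [(cosine(occ[i], occ[j]), pearson(occ[i], occ[j]))
             for i in range(24) for j in range(i + 1, 24)]

    # In the spirit of the paper: find a cosine value above which no
    # Pearson correlation in this dataset is negative.
    neg = [c for c, r in pairs if r < 0]
    print("cosine threshold:", max(neg) if neg else 0.0)
    ```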
  3. Egghe, L.: Properties of the n-overlap vector and n-overlap similarity theory (2006) 0.06
    0.060959876 = product of:
      0.2438395 = sum of:
        0.2438395 = weight(_text_:vector in 194) [ClassicSimilarity], result of:
          0.2438395 = score(doc=194,freq=10.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.79543537 = fieldWeight in 194, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0390625 = fieldNorm(doc=194)
      0.25 = coord(1/4)
    
    Abstract
    In the first part of this article the author defines the n-overlap vector, whose coordinates consist of the fraction of the objects (e.g., books, N-grams, etc.) that belong to 1, 2, ..., n sets (more generally: families) (e.g., libraries, databases, etc.). With the aid of the Lorenz concentration theory, a theory of n-overlap similarity is conceived together with corresponding measures, such as the generalized Jaccard index (generalizing the well-known Jaccard index in the case n = 2). Next, the distributional form of the n-overlap vector is determined, assuming certain distributions of the object and set (family) sizes. In this section the decreasing power law and the decreasing exponential distribution of the n-overlap vector are explained. Both item (token) n-overlap and source (type) n-overlap are studied. The n-overlap properties of objects indexed by a hierarchical system (e.g., books indexed by numbers from a UDC or Dewey system or by N-grams) are presented in the final section. The author shows how the results given in the previous section can be applied, as well as how the Lorenz order of the n-overlap vector is respected by an increase or a decrease of the level of refinement in the hierarchical system (e.g., the value N in N-grams).
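    A short sketch of the n-overlap vector as described above, reading "belong to 1, 2, ..., n sets" as "belong to exactly k of the n families" (my reading of the definition); for n = 2, the last coordinate is exactly the classical Jaccard index:

    ```python
    from collections import Counter

    def n_overlap_vector(families):
        """Fraction of all objects (in the union) belonging to exactly k of n families."""
        n = len(families)
        counts = Counter()
        for fam in families:
            for obj in set(fam):
                counts[obj] += 1
        total = len(counts)
        return [sum(1 for c in counts.values() if c == k) / total
                for k in range(1, n + 1)]

    libs = [{"a", "b", "c", "d"}, {"b", "c", "e"}, {"c", "d", "e", "f"}]
    print(n_overlap_vector(libs))        # [1/3, 1/2, 1/6] over the 6 distinct objects

    A, B = {"a", "b", "c"}, {"b", "c", "d"}
    print(n_overlap_vector([A, B])[-1])  # 0.5 = |A & B| / |A | B|, the Jaccard index
    ```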
  4. Egghe, L.; Rousseau, R.: Topological aspects of information retrieval (1998) 0.05
    0.050130073 = product of:
      0.20052029 = sum of:
        0.20052029 = weight(_text_:space in 2157) [ClassicSimilarity], result of:
          0.20052029 = score(doc=2157,freq=8.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.8071766 = fieldWeight in 2157, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2157)
      0.25 = coord(1/4)
    
    Abstract
    Let (DS, QS, sim) be a retrieval system consisting of a document space DS, a query space QS, and a function sim expressing the similarity between a document and a query. Following D.M. Everett and S.C. Cater (1992), we introduce topologies on the document space. These topologies are generated by the similarity function sim and the query space QS. Three topologies will be studied: the retrieval topology, the similarity topology, and the (pseudo-)metric one. It is shown that the retrieval topology is the coarsest of the three, while the (pseudo-)metric is the strongest. These three topologies are generally different, reflecting distinct topological aspects of information retrieval. We present necessary and sufficient conditions for these topological aspects to be equal.
  5. Egghe, L.: New relations between similarity measures for vectors based on vector norms (2009) 0.05
    0.046265293 = product of:
      0.18506117 = sum of:
        0.18506117 = weight(_text_:vector in 2708) [ClassicSimilarity], result of:
          0.18506117 = score(doc=2708,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.603693 = fieldWeight in 2708, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=2708)
      0.25 = coord(1/4)
    
    Abstract
    The well-known similarity measures Jaccard, Salton's cosine, Dice, and several related overlap measures for vectors are compared. While general relations cannot be proved, we study these measures on trajectories of the form ‖X‖ = a‖Y‖, where a > 0 is a constant and ‖·‖ denotes the Euclidean norm of a vector. In this case, direct functional relations between these measures are proved. For Jaccard, we prove that it is a convexly increasing function of Salton's cosine measure, but always smaller than or equal to the latter, hereby explaining a curve experimentally found by Leydesdorff. All the other measures have a linear relation with Salton's cosine, reducing even to equality in the case a = 1. Hence, for equally normed vectors (e.g., for normalized vectors) we essentially only have Jaccard's measure and Salton's cosine measure, since all the other measures are equal to the latter.
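    Using the standard vector forms of these measures (cosine = X·Y/(‖X‖‖Y‖), Dice = 2X·Y/(‖X‖² + ‖Y‖²), and the Tanimoto form of Jaccard; assuming these are the paper's definitions), the equal-norm case a = 1 can be checked numerically:

    ```python
    import numpy as np

    def cos_sim(x, y): return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    def dice(x, y):    return 2 * (x @ y) / (x @ x + y @ y)
    def jaccard(x, y): return (x @ y) / (x @ x + y @ y - x @ y)  # Tanimoto form

    rng = np.random.default_rng(1)
    y = rng.random(10)
    x = rng.random(10)
    x *= np.linalg.norm(y) / np.linalg.norm(x)     # enforce ||X|| = a||Y|| with a = 1

    c = cos_sim(x, y)
    print(np.isclose(dice(x, y), c))               # True: Dice equals cosine when a = 1
    print(np.isclose(jaccard(x, y), c / (2 - c)))  # True: Jaccard = cos/(2 - cos) <= cos
    ```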
  6. Egghe, L.: Good properties of similarity measures and their complementarity (2010) 0.05
    0.046265293 = product of:
      0.18506117 = sum of:
        0.18506117 = weight(_text_:vector in 3993) [ClassicSimilarity], result of:
          0.18506117 = score(doc=3993,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.603693 = fieldWeight in 3993, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=3993)
      0.25 = coord(1/4)
    
    Abstract
    Similarity measures, such as the ones of Jaccard, Dice, or Cosine, measure the similarity between two vectors. A good property for similarity measures would be that, if we add a constant vector to both vectors, then the similarity must increase. We show that Dice and Jaccard satisfy this property while Cosine and both overlap measures do not. Adding a constant vector is called, in Lorenz concentration theory, "nominal increase", and we show that the stronger "transfer principle" is not a required good property for similarity measures. Another good property is that, when we have two vectors and we add one of these vectors to both vectors, then the similarity must increase. Now Dice, Jaccard, Cosine, and one of the overlap measures satisfy this property, while the other overlap measure does not. A variant of this latter property is also studied.
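    A concrete check of the first property, again assuming the Tanimoto vector form of Jaccard: for X = (2, 0), Y = (1, 0) and the constant vector (1, 1), cosine decreases while Dice and Jaccard increase, consistent with the result stated above:

    ```python
    import numpy as np

    def cos_sim(x, y): return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    def dice(x, y):    return 2 * (x @ y) / (x @ x + y @ y)
    def jaccard(x, y): return (x @ y) / (x @ x + y @ y - x @ y)  # Tanimoto form

    x, y = np.array([2.0, 0.0]), np.array([1.0, 0.0])
    e = np.array([1.0, 1.0])  # the constant vector added to both

    for name, f in (("cosine", cos_sim), ("dice", dice), ("jaccard", jaccard)):
        print(f"{name}: {f(x, y):.3f} -> {f(x + e, y + e):.3f}")
    # cosine: 1.000 -> 0.990 (decreases); dice: 0.800 -> 0.933; jaccard: 0.667 -> 0.875
    ```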
  7. Egghe, L.; Liang, L.; Rousseau, R.: A relation between h-index and impact factor in the power-law model (2009) 0.01
    0.0134698795 = product of:
      0.053879518 = sum of:
        0.053879518 = product of:
          0.107759036 = sum of:
            0.107759036 = weight(_text_:model in 6759) [ClassicSimilarity], result of:
              0.107759036 = score(doc=6759,freq=6.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.58867764 = fieldWeight in 6759, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6759)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Using a power-law model, we unify the two best-known topics in citation analysis, namely the impact factor and the Hirsch index, into one relation (not a function). The validity of our model is confirmed, at least in a qualitative way, by real data.
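    A hedged simulation of the setting (an illustration, not the paper's derivation): citation counts are drawn from a power-law distribution with Lotka exponent alpha, and the h-index and mean citation rate (an impact-factor proxy) are computed for growing publication counts T; Lotkaian theory predicts h ≈ T^(1/alpha):

    ```python
    import numpy as np

    def h_index(citations):
        c = np.sort(citations)[::-1]
        return int(np.sum(c >= np.arange(1, len(c) + 1)))

    rng = np.random.default_rng(2)
    alpha = 2.5                                         # assumed Lotka exponent
    for T in (100, 400, 1600):                          # number of publications
        cites = np.floor(rng.pareto(alpha - 1, T) + 1)  # power-law counts >= 1
        print(T, "h =", h_index(cites),
              "mean citations =", round(float(cites.mean()), 2),
              "T**(1/alpha) =", round(T ** (1 / alpha), 1))
    ```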
  8. Egghe, L.; Ravichandra Rao, I.K.: Duality revisited : construction of fractional frequency distributions based on two dual Lotka laws (2002) 0.01
    0.0082485825 = product of:
      0.03299433 = sum of:
        0.03299433 = product of:
          0.06598866 = sum of:
            0.06598866 = weight(_text_:model in 1006) [ClassicSimilarity], result of:
              0.06598866 = score(doc=1006,freq=4.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.36048993 = fieldWeight in 1006, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1006)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Fractional frequency distributions of, for example, authors with a certain (fractional) number of papers are very irregular and, therefore, not easy to model or to explain. This article makes a first attempt at such a model by assuming two simple Lotka laws (with exponent 2): one for the number of authors with n papers (total count here) and one for the number of papers with n authors, n ∈ N. Based on an earlier convolution model of Egghe, interpreted and reworked now for discrete scores, we are able to produce theoretical fractional frequency distributions with only one parameter, which are in very close agreement with the practical ones as found in a large dataset produced earlier by Rao. The article also shows that (irregular) fractional frequency distributions are a consequence of Lotka's law, and are not examples of breakdowns of this famous historical law.
  9. Egghe, L.: Relations between the continuous and the discrete Lotka power function (2005) 0.01
    0.0082485825 = product of:
      0.03299433 = sum of:
        0.03299433 = product of:
          0.06598866 = sum of:
            0.06598866 = weight(_text_:model in 3464) [ClassicSimilarity], result of:
              0.06598866 = score(doc=3464,freq=4.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.36048993 = fieldWeight in 3464, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3464)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The discrete Lotka power function describes the number of sources (e.g., authors) with n = 1, 2, 3, ... items (e.g., publications). As in econometrics, informetrics theory requires functions of a continuous variable j, replacing the discrete variable n. Now j represents item densities instead of numbers of items. The continuous Lotka power function describes the density of sources with item density j. The discrete Lotka function is the one obtained empirically from data; the continuous Lotka function is the one needed when one wants to apply Lotkaian informetrics, i.e., to determine properties that can be derived from the (continuous) model. It is, hence, important to know the relations between the two models. We show that the exponents of the discrete Lotka function (if not too high, i.e., within limits encountered in practice) and of the continuous Lotka function are approximately the same. This is important to know when applying theoretical results (from the continuous model) derived from practical data.
  10. Egghe, L.: A model for the size-frequency function of coauthor pairs (2008) 0.01
    0.0082485825 = product of:
      0.03299433 = sum of:
        0.03299433 = product of:
          0.06598866 = sum of:
            0.06598866 = weight(_text_:model in 2366) [ClassicSimilarity], result of:
              0.06598866 = score(doc=2366,freq=4.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.36048993 = fieldWeight in 2366, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2366)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Lotka's law was formulated to describe the number of authors with a certain number of publications. Empirical results (Morris & Goldstein, 2007) indicate that Lotka's law is also valid if one counts the number of publications of coauthor pairs. This article gives a simple model proving this to be true, with the same Lotka exponent, if the number of coauthored papers is proportional to the number of papers of the individual coauthors. Under the assumption that this number of coauthored papers is more than proportional to the number of papers of the individual authors (to be explained in the article), we can prove that the size-frequency function of coauthor pairs is Lotkaian with an exponent that is higher than that of the Lotka function of individual authors, a fact that is confirmed in experimental results.
  11. Egghe, L.; Guns, R.; Rousseau, R.; Leuven, K.U.: Erratum (2012) 0.01
    0.008062307 = product of:
      0.032249227 = sum of:
        0.032249227 = product of:
          0.064498454 = sum of:
            0.064498454 = weight(_text_:22 in 4992) [ClassicSimilarity], result of:
              0.064498454 = score(doc=4992,freq=2.0), product of:
                0.16670525 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047605187 = queryNorm
                0.38690117 = fieldWeight in 4992, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4992)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    14. 2.2012 12:53:22
  12. Egghe, L.: Informetric explanation of some Leiden Ranking graphs (2014) 0.01
    0.007776838 = product of:
      0.031107351 = sum of:
        0.031107351 = product of:
          0.062214702 = sum of:
            0.062214702 = weight(_text_:model in 1236) [ClassicSimilarity], result of:
              0.062214702 = score(doc=1236,freq=2.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.33987316 = fieldWeight in 1236, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1236)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The S-shaped functional relation between the mean citation score and the proportion of top 10% publications for the 500 Leiden Ranking universities is explained using results on the shifted Lotka function. The concave or convex relation between the proportion of top 100θ% publications, for different fractions θ, is also explained using the obtained new informetric model.
  13. Egghe, L.: Sampling and concentration values of incomplete bibliographies (2002) 0.01
    0.006804733 = product of:
      0.027218932 = sum of:
        0.027218932 = product of:
          0.054437865 = sum of:
            0.054437865 = weight(_text_:model in 450) [ClassicSimilarity], result of:
              0.054437865 = score(doc=450,freq=2.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.29738903 = fieldWeight in 450, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=450)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This article studies concentration aspects of bibliographies. In particular, we study the impact of the incompleteness of such a bibliography on its concentration values (i.e., its degree of inequality of production of its sources). Incompleteness is modeled by sampling from the complete bibliography. The model is general enough to comprise truncation of a bibliography as well as a systematic sample on sources or items. In all cases we prove that the sampled (or incomplete) bibliography has a higher concentration value than the complete one. These models, hence, shed some light on the measurement of production inequality in incomplete bibliographies.
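    A toy illustration of the theorem (the sampling model and the Gini measure are my choices; the paper covers a family of concentration measures): items of a Lotkaian bibliography are retained independently with probability p, and the concentration of source productivities is compared before and after:

    ```python
    import numpy as np

    def gini(x):
        """Gini concentration of source productivities."""
        x = np.sort(np.asarray(x, dtype=float))
        n = len(x)
        return float((2 * np.arange(1, n + 1) - n - 1) @ x / (n * x.sum()))

    rng = np.random.default_rng(5)
    support = np.arange(1, 1001)
    probs = support ** -2.0
    probs /= probs.sum()
    complete = rng.choice(support, size=2000, p=probs)  # items per source, Lotka exponent 2

    p = 0.3                                             # each item retained with probability p
    sampled = rng.binomial(complete, p)                 # incomplete bibliography (zeros kept)

    print("Gini complete:", round(gini(complete), 3))
    print("Gini sampled :", round(gini(sampled), 3))    # typically higher, per the theorem
    ```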
  14. Egghe, L.: Theory of the topical coverage of multiple databases (2013) 0.01
    0.006804733 = product of:
      0.027218932 = sum of:
        0.027218932 = product of:
          0.054437865 = sum of:
            0.054437865 = weight(_text_:model in 526) [ClassicSimilarity], result of:
              0.054437865 = score(doc=526,freq=2.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.29738903 = fieldWeight in 526, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=526)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    We present a model that describes which fraction of the literature on a certain topic we will find when we use n (n = 1, 2, ...) databases. It is a generalization of the theory of discovering usability problems. We prove that, in all practical cases, this fraction is a concave function of n, the number of databases used, thereby explaining some graphs that exist in the literature. We also study limiting features of this fraction for very high n, and we characterize the case in which we find all literature on a certain topic for n high enough.
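    The single-parameter coverage curve from usability-problem discovery, which this paper generalizes, already shows the claimed concavity; p below is an assumed per-database coverage probability:

    ```python
    # Fraction of the literature found when searching n independent databases,
    # each covering a given paper with probability p.
    p = 0.35
    coverage = [1 - (1 - p) ** n for n in range(1, 9)]
    gains = [b - a for a, b in zip(coverage, coverage[1:])]
    print([round(c, 3) for c in coverage])  # increasing toward 1
    print([round(g, 3) for g in gains])     # strictly decreasing: concave in n
    ```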
  15. Egghe, L.: Influence of adding or deleting items and sources on the h-index (2010) 0.01
    0.0058326283 = product of:
      0.023330513 = sum of:
        0.023330513 = product of:
          0.046661027 = sum of:
            0.046661027 = weight(_text_:model in 3336) [ClassicSimilarity], result of:
              0.046661027 = score(doc=3336,freq=2.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.25490487 = fieldWeight in 3336, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3336)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Adding or deleting items such as self-citations has an influence on the h-index of an author. This influence will be proved mathematically in this article. We hereby prove the experimental finding of E. Gianoli and M.A. Molina-Montenegro (2009) that the influence of adding or deleting self-citations on the h-index is greater for low values of the h-index. Why this is logical is also shown by a simple theoretical example. Adding or deleting sources, such as adding or deleting minor contributions of an author, also has an influence on the h-index of this author; this influence is modeled in this article. This model explains some practical examples found in X. Hu, R. Rousseau, and J. Chen (in press).
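    A sketch of the mechanism with two invented citation profiles: deleting one (self-)citation per paper costs the low-h author a larger share of the h-index than the high-h author, in line with the finding quoted above:

    ```python
    def h_index(citations):
        c = sorted(citations, reverse=True)
        return sum(1 for i, v in enumerate(c, start=1) if v >= i)

    # Two hypothetical citation profiles (values invented for illustration):
    profiles = {
        "low h":  [4, 4, 3, 3, 2, 1],
        "high h": [25, 20, 18, 15, 14, 13, 12, 11, 10, 10, 9, 2],
    }
    for name, cites in profiles.items():
        before = h_index(cites)
        after = h_index([max(c - 1, 0) for c in cites])  # delete one self-citation per paper
        print(f"{name}: h {before} -> {after}, relative drop {(before - after) / before:.0%}")
    # low h: 3 -> 2 (33%); high h: 10 -> 9 (10%)
    ```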
  16. Egghe, L.; Guns, R.; Rousseau, R.: Thoughts on uncitedness : Nobel laureates and Fields medalists as case studies (2011) 0.01
    0.0058326283 = product of:
      0.023330513 = sum of:
        0.023330513 = product of:
          0.046661027 = sum of:
            0.046661027 = weight(_text_:model in 4994) [ClassicSimilarity], result of:
              0.046661027 = score(doc=4994,freq=2.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.25490487 = fieldWeight in 4994, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4994)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Contrary to what one might expect, Nobel laureates and Fields medalists have a rather large fraction (10% or more) of uncited publications. This is the case for (in total) 75 examined researchers from the fields of mathematics (Fields medalists), physics, chemistry, and physiology or medicine (Nobel laureates). We study several indicators for these researchers, including the h-index, total number of publications, average number of citations per publication, the number (and fraction) of uncited publications, and their interrelations. The most remarkable result is a positive correlation between the h-index and the number of uncited articles. We also present a Lotkaian model, which partially explains the empirically found regularities.
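    A hedged simulation of why the correlation can be positive: both the h-index and the number of uncited papers grow with a researcher's total output, so varying output across simulated researchers induces a positive correlation (all parameters are illustrative):

    ```python
    import numpy as np

    def h_index(citations):
        c = np.sort(citations)[::-1]
        return int(np.sum(c >= np.arange(1, len(c) + 1)))

    rng = np.random.default_rng(4)
    h_vals, uncited = [], []
    for _ in range(75):                       # 75 simulated researchers, as in the study
        T = int(rng.integers(50, 500))        # total publications varies per researcher
        cites = np.floor(rng.pareto(1.5, T))  # heavy-tailed; a sizeable share is 0 (uncited)
        h_vals.append(h_index(cites))
        uncited.append(int(np.sum(cites == 0)))

    print("corr(h, #uncited) =", round(float(np.corrcoef(h_vals, uncited)[0, 1]), 2))
    ```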
  17. Egghe, L.; Rousseau, R.: Averaging and globalising quotients of informetric and scientometric data (1996) 0.00
    0.004837384 = product of:
      0.019349536 = sum of:
        0.019349536 = product of:
          0.03869907 = sum of:
            0.03869907 = weight(_text_:22 in 7659) [ClassicSimilarity], result of:
              0.03869907 = score(doc=7659,freq=2.0), product of:
                0.16670525 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047605187 = queryNorm
                0.23214069 = fieldWeight in 7659, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=7659)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    Journal of information science. 22(1996) no.3, S.165-170
  18. Egghe, L.: A universal method of information retrieval evaluation : the "missing" link M and the universal IR surface (2004) 0.00
    0.004837384 = product of:
      0.019349536 = sum of:
        0.019349536 = product of:
          0.03869907 = sum of:
            0.03869907 = weight(_text_:22 in 2558) [ClassicSimilarity], result of:
              0.03869907 = score(doc=2558,freq=2.0), product of:
                0.16670525 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047605187 = queryNorm
                0.23214069 = fieldWeight in 2558, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2558)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    14. 8.2004 19:17:22
  19. Egghe, L.: Empirical and combinatorial study of country occurrences in multi-authored papers (2006) 0.00
    0.003888419 = product of:
      0.015553676 = sum of:
        0.015553676 = product of:
          0.031107351 = sum of:
            0.031107351 = weight(_text_:model in 81) [ClassicSimilarity], result of:
              0.031107351 = score(doc=81,freq=2.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.16993658 = fieldWeight in 81, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.03125 = fieldNorm(doc=81)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Papers written by several authors can be classified according to the countries of the author affiliations. The empirical part of this paper consists of two datasets. One dataset consists of 1,035 papers retrieved via the search "pedagog*" in the years 2004 and 2005 (up to October) in Academic Search Elite; this is a case where phi(m), the number of papers with m = 1, 2, 3, ... authors, is decreasing, hence most of the papers have a low number of authors. Here we find that #(j, m), the number of times a country occurs j times in an m-authored paper, j = 1, ..., m-1, is decreasing, and that #(m, m) is much higher than all the other #(j, m) values. The other dataset consists of 3,271 papers retrieved via the search "enzyme" in the year 2005 (up to October) in the same database; this is a case of a non-decreasing phi(m): most papers have 3 or 4 authors, and we even find many papers with a much higher number of authors. In this case we show again that #(m, m) is much higher than the other #(j, m) values, but #(j, m) is no longer decreasing in j = 1, ..., m-1, although #(1, m) is (apart from #(m, m)) the largest among the #(j, m). The combinatorial part gives a proof of the fact that #(j, m) decreases for j = 1, ..., m-1, supposing that all cases are equally possible. This shows that the first dataset conforms more closely to this model than the second. Explanations for these findings are given. From the data we also find the (we think: new) distribution of the number of papers with n = 1, 2, 3, ... countries (i.e., where n different countries are involved among the m (≥ n) authors of a paper): a fast decreasing function, e.g., a power law with a very large Lotka exponent.
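    The combinatorial claim can be checked by brute force: assuming all country assignments are equally possible, #(j, m) indeed decreases for j = 1, ..., m-1 (a small sketch with m = 4 authors and 3 countries):

    ```python
    from itertools import product
    from collections import Counter

    def occurrence_counts(m, n_countries):
        """#(j, m): over all equally likely country assignments of m authors,
        how often some country occurs exactly j times in a paper."""
        counts = Counter()
        for paper in product(range(n_countries), repeat=m):
            for j in Counter(paper).values():
                counts[j] += 1
        return [counts[j] for j in range(1, m + 1)]

    print(occurrence_counts(4, 3))  # [96, 72, 24, 3]: decreasing over j = 1, ..., m-1
    ```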