Search (13 results, page 1 of 1)

  • author_ss:"Egghe, L."
  1. Egghe, L.: Empirical and combinatorial study of country occurrences in multi-authored papers (2006) 0.04
    0.042776465 = product of:
      0.10694116 = sum of:
        0.050096523 = weight(_text_:m in 81) [ClassicSimilarity], result of:
          0.050096523 = score(doc=81,freq=42.0), product of:
            0.09940409 = queryWeight, product of:
              2.4884486 = idf(docFreq=9980, maxDocs=44218)
              0.03994621 = queryNorm
            0.5039684 = fieldWeight in 81, product of:
              6.4807405 = tf(freq=42.0), with freq of:
                42.0 = termFreq=42.0
              2.4884486 = idf(docFreq=9980, maxDocs=44218)
              0.03125 = fieldNorm(doc=81)
        0.05684464 = weight(_text_:n in 81) [ClassicSimilarity], result of:
          0.05684464 = score(doc=81,freq=6.0), product of:
            0.17223433 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.03994621 = queryNorm
            0.33004245 = fieldWeight in 81, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.03125 = fieldNorm(doc=81)
      0.4 = coord(2/5)
    
    Abstract
    Papers written by several authors can be classified according to the countries of the author affiliations. The empirical part of this paper consists of two datasets. One dataset consists of 1,035 papers retrieved via the search "pedagog*" in the years 2004 and 2005 (up to October) in Academic Search Elite; this is a case where phi(m), the number of papers with m = 1, 2, 3, ... authors, is decreasing, hence most of the papers have a low number of authors. Here we find that #(j,m), the number of times a country occurs j times in an m-authored paper, is decreasing in j = 1, ..., m-1, and that #(m,m) is much higher than all the other #(j,m) values. The other dataset consists of 3,271 papers retrieved via the search "enzyme" in the year 2005 (up to October) in the same database; this is a case of a non-decreasing phi(m): most papers have 3 or 4 authors, and we even find many papers with a much higher number of authors. In this case we show again that #(m,m) is much higher than the other #(j,m) values, but #(j,m) is no longer decreasing in j = 1, ..., m-1, although #(1,m) is (apart from #(m,m)) the largest number amongst the #(j,m). The combinatorial part gives a proof of the fact that #(j,m) decreases for j = 1, ..., m-1, supposing that all cases are equally possible. This shows that the first dataset conforms more closely to this model than the second dataset. Explanations for these findings are given. From the data we also find the (we think: new) distribution of the number of papers with n = 1, 2, 3, ... countries (i.e., where n different countries are involved amongst the m (m >= n) authors of a paper): a fast decreasing function, e.g. as a power law with a very large Lotka exponent.
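The combinatorial claim in this abstract can be checked with a small simulation. The sketch below is not from the paper; it assumes one simple reading of "all cases equally possible", namely that each author of an m-authored paper is independently assigned one of a fixed number of equally likely countries, and it estimates #(j,m) by Monte Carlo. Under this assumption the counts decrease over j = 1, ..., m-1; the dominance of #(m,m) reported in the abstract is a feature of the real data, not of this model. All parameter values are illustrative.

```python
# Hedged Monte Carlo sketch (not from the paper): each author of an m-authored
# paper is independently assigned one of n_countries equally likely countries,
# and we count how often a country occurs exactly j times within a paper,
# i.e. an estimate of #(j,m) under this equiprobable model.
import random
from collections import Counter

def country_occurrence_counts(m, n_countries=10, n_papers=100_000, seed=1):
    rng = random.Random(seed)
    counts = Counter()  # counts[j] estimates #(j,m)
    for _ in range(n_papers):
        paper = [rng.randrange(n_countries) for _ in range(m)]
        for occurrences in Counter(paper).values():
            counts[occurrences] += 1
    return [counts[j] for j in range(1, m + 1)]

if __name__ == "__main__":
    print(country_occurrence_counts(m=5))
    # the first m-1 entries decrease in j, as the combinatorial part predicts
```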
  2. Egghe, L.: Properties of the n-overlap vector and n-overlap similarity theory (2006) 0.03
    0.032819267 = product of:
      0.16409633 = sum of:
        0.16409633 = weight(_text_:n in 194) [ClassicSimilarity], result of:
          0.16409633 = score(doc=194,freq=32.0), product of:
            0.17223433 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.03994621 = queryNorm
            0.95275044 = fieldWeight in 194, product of:
              5.656854 = tf(freq=32.0), with freq of:
                32.0 = termFreq=32.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.0390625 = fieldNorm(doc=194)
      0.2 = coord(1/5)
    
    Abstract
    In the first part of this article the author defines the n-overlap vector, whose coordinates consist of the fraction of the objects (e.g., books, N-grams, etc.) that belong to 1, 2, ..., n sets (more generally: families) (e.g., libraries, databases, etc.). With the aid of the Lorenz concentration theory, a theory of n-overlap similarity is conceived together with corresponding measures, such as the generalized Jaccard index (generalizing the well-known Jaccard index for the case n = 2). Next, the distributional form of the n-overlap vector is determined, assuming certain distributions of the object and of the set (family) sizes. In this section the decreasing power law and the decreasing exponential distribution are explained for the n-overlap vector. Both item (token) n-overlap and source (type) n-overlap are studied. The n-overlap properties of objects indexed by a hierarchical system (e.g., books indexed by numbers from a UDC or Dewey system, or by N-grams) are presented in the final section. The author shows how the results of the previous section can be applied, as well as how the Lorenz order of the n-overlap vector is respected by an increase or a decrease of the level of refinement in the hierarchical system (e.g., the value N in N-grams).
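A minimal sketch of the n-overlap vector, assuming the coordinates are read as the fraction of objects in the union that belong to exactly k of the n sets (the paper's precise definition may differ). The last coordinate, the fraction of objects shared by all n sets, coincides with a natural n-set analogue of the Jaccard index. Function names and the toy data are illustrative only.

```python
# Minimal sketch (not the paper's code): coordinate k of the n-overlap vector is
# read here as the fraction of objects in the union that belong to exactly k of
# the n sets; the last coordinate equals |A1 ∩ ... ∩ An| / |A1 ∪ ... ∪ An|,
# a natural n-set analogue of the Jaccard index.
from collections import Counter

def n_overlap_vector(sets):
    n = len(sets)
    membership = Counter()          # object -> number of sets containing it
    for s in sets:
        for obj in s:
            membership[obj] += 1
    union_size = len(membership)
    return [sum(1 for c in membership.values() if c == k) / union_size
            for k in range(1, n + 1)]

if __name__ == "__main__":
    libraries = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "d", "e"}]
    vec = n_overlap_vector(libraries)
    print(vec)      # fractions for k = 1, 2, 3
    print(vec[-1])  # fraction shared by all 3 sets (Jaccard-like overlap)
```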
  3. Egghe, L.: Theory of the topical coverage of multiple databases (2013) 0.03
    0.025685137 = product of:
      0.12842569 = sum of:
        0.12842569 = weight(_text_:n in 526) [ClassicSimilarity], result of:
          0.12842569 = score(doc=526,freq=10.0), product of:
            0.17223433 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.03994621 = queryNorm
            0.74564517 = fieldWeight in 526, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.0546875 = fieldNorm(doc=526)
      0.2 = coord(1/5)
    
    Abstract
    We present a model that describes which fraction of the literature on a certain topic we will find when we use n (n = 1, 2, ...) databases. It is a generalization of the theory of discovering usability problems. We prove that, in all practical cases, this fraction is a concave function of n, the number of databases used, thereby explaining some graphs that exist in the literature. We also study limiting features of this fraction for very high n, and we characterize the case in which we find all the literature on a certain topic for n high enough.
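As a hedged illustration of the kind of model the abstract describes (not the paper's own, more general model): in the classic "discovering usability problems" setting, each additional source finds a fixed fraction p of what remains, so the covered fraction after n databases is 1 - (1 - p)^n, an increasing and concave function of n that tends to 1. The value of p below is invented for the example.

```python
# Hedged illustration: the classic single-probability form that the abstract says
# is generalized. covered_fraction(n) is increasing, concave in n, and tends to 1.
def covered_fraction(n, p=0.4):
    return 1 - (1 - p) ** n

if __name__ == "__main__":
    values = [covered_fraction(n) for n in range(1, 9)]
    print([round(v, 3) for v in values])
    # successive increments shrink, i.e. the function is concave in n
    increments = [b - a for a, b in zip(values, values[1:])]
    print([round(d, 3) for d in increments])
```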
  4. Egghe, L.; Ravichandra Rao, I.K.: ¬The influence of the broadness of a query of a topic on its h-index : models and examples of the h-index of n-grams (2008) 0.02
    0.020097617 = product of:
      0.10048808 = sum of:
        0.10048808 = weight(_text_:n in 2009) [ClassicSimilarity], result of:
          0.10048808 = score(doc=2009,freq=12.0), product of:
            0.17223433 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.03994621 = queryNorm
            0.58343816 = fieldWeight in 2009, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2009)
      0.2 = coord(1/5)
    
    Abstract
    The article studies the influence of the query formulation of a topic on its h-index. In order to generate pure random sets of documents, we used N-grams (N variable) to measure this influence: strings of zeros, truncated at the end. The databases used are WoS and Scopus. The formula h = T^(1/alpha), proved in Egghe and Rousseau (2006), where T is the number of retrieved documents and alpha is Lotka's exponent, is confirmed to be a concavely increasing function of T. We also give a formula for the relation between h and N, the length of the N-gram: h = D·10^(-N/alpha), where D is a constant; this is a convexly decreasing function of N, which is found in our experiments. Nonlinear regression on h = T^(1/alpha) gives an estimate of alpha, which can then be used to estimate the h-index of the entire database (Web of Science [WoS] and Scopus): h = S^(1/alpha), where S is the total number of documents in the database.
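A minimal numerical sketch of the abstract's first formula, h = T^(1/alpha). The least-squares fit on log-log scale below is an illustrative stand-in for the paper's nonlinear regression, and the document counts are invented for the example.

```python
# Hedged sketch, assuming h = T**(1/alpha) with T the number of retrieved
# documents and alpha Lotka's exponent. estimate_alpha fits log h = (1/alpha) log T
# through the origin; this is only an illustrative stand-in for the paper's
# nonlinear regression.
import math

def h_from_T(T, alpha):
    return T ** (1.0 / alpha)

def estimate_alpha(pairs):
    # pairs: iterable of (T, h); slope of log h vs. log T equals 1/alpha
    num = sum(math.log(T) * math.log(h) for T, h in pairs)
    den = sum(math.log(T) ** 2 for T, _ in pairs)
    return den / num

if __name__ == "__main__":
    alpha_true = 2.3
    data = [(T, h_from_T(T, alpha_true)) for T in (100, 1_000, 10_000, 100_000)]
    alpha_hat = estimate_alpha(data)
    print(round(alpha_hat, 3))                     # recovers 2.3 on this synthetic data
    print(round(h_from_T(50_000_000, alpha_hat)))  # extrapolated h of a whole database
```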
  5. Egghe, L.; Ravichandra Rao, I.K.: Duality revisited : construction of fractional frequency distributions based on two dual Lotka laws (2002) 0.02
    0.01969156 = product of:
      0.0984578 = sum of:
        0.0984578 = weight(_text_:n in 1006) [ClassicSimilarity], result of:
          0.0984578 = score(doc=1006,freq=8.0), product of:
            0.17223433 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.03994621 = queryNorm
            0.57165027 = fieldWeight in 1006, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.046875 = fieldNorm(doc=1006)
      0.2 = coord(1/5)
    
    Abstract
    Fractional frequency distributions of, for example, authors with a certain (fractional) number of papers are very irregular and, therefore, not easy to model or to explain. This article gives a first attempt at this by assuming two simple Lotka laws (with exponent 2): one for the number of authors with n papers (total count here) and one for the number of papers with n authors, n ∈ N. Based on an earlier convolution model of Egghe, interpreted and reworked now for discrete scores, we are able to produce theoretical fractional frequency distributions with only one parameter, which are in very close agreement with the practical ones as found in a large dataset produced earlier by Rao. The article also shows that (irregular) fractional frequency distributions are a consequence of Lotka's law, and are not examples of breakdowns of this famous historical law.
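A toy simulation in the spirit of the abstract's setup, not the paper's convolution model: both the number of papers per author and the number of authors per paper are drawn from a Lotka law with exponent 2, and each m-authored paper contributes a fractional credit 1/m, which produces the kind of irregular fractional scores the abstract refers to. All parameters are illustrative.

```python
# Toy simulation (not the paper's model): author paper counts and paper author
# counts both follow a Lotka law with exponent 2 (probability ~ 1/n**2); each
# paper gives its author a fractional credit 1/m when it has m authors.
import random
from collections import Counter

def lotka_sample(rng, n_max=20):
    weights = [1.0 / n ** 2 for n in range(1, n_max + 1)]
    return rng.choices(range(1, n_max + 1), weights=weights)[0]

def fractional_scores(n_authors=10_000, seed=7):
    rng = random.Random(seed)
    scores = Counter()
    for _ in range(n_authors):
        papers = lotka_sample(rng)                        # papers of this author
        credit = sum(1.0 / lotka_sample(rng) for _ in range(papers))
        scores[round(credit, 2)] += 1
    return scores

if __name__ == "__main__":
    print(fractional_scores().most_common(10))
    # many distinct, irregularly spaced fractional scores, as in the abstract
```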
  6. Egghe, L.: ¬A universal method of information retrieval evaluation : the "missing" link M and the universal IR surface (2004) 0.02
    0.019612942 = product of:
      0.049032353 = sum of:
        0.032795873 = weight(_text_:m in 2558) [ClassicSimilarity], result of:
          0.032795873 = score(doc=2558,freq=8.0), product of:
            0.09940409 = queryWeight, product of:
              2.4884486 = idf(docFreq=9980, maxDocs=44218)
              0.03994621 = queryNorm
            0.3299248 = fieldWeight in 2558, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.4884486 = idf(docFreq=9980, maxDocs=44218)
              0.046875 = fieldNorm(doc=2558)
        0.016236478 = product of:
          0.032472957 = sum of:
            0.032472957 = weight(_text_:22 in 2558) [ClassicSimilarity], result of:
              0.032472957 = score(doc=2558,freq=2.0), product of:
                0.13988481 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03994621 = queryNorm
                0.23214069 = fieldWeight in 2558, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2558)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    The paper shows that the present evaluation methods in information retrieval (basically recall R and precision P, and in some cases fallout F) lack universal comparability in the sense that their values depend on the generality of the IR problem. A solution is given by using all "parts" of the database, including the non-relevant documents and also the not-retrieved documents. It turns out that the solution is given by introducing the measure M, being the fraction of the not-retrieved documents that are relevant (hence the "miss" measure). We prove that, independent of the IR problem or of the IR action, the quadruple (P, R, F, M) belongs to a universal IR surface, being the same for all IR activities. This universality is then exploited by defining a new measure for evaluation in IR, allowing for unbiased comparisons of all IR results. We also show that using only one, two or even three measures from the set {P, R, F, M} necessarily leads to evaluation measures that are non-universal and hence not capable of comparing different IR situations.
    Date
    14. 8.2004 19:17:22
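A minimal sketch, assuming the standard 2x2 retrieval table (the labels a, b, c, d are not the paper's notation): with a = relevant and retrieved, b = non-relevant and retrieved, c = relevant and not retrieved, d = non-relevant and not retrieved, the four measures can be computed directly, and the product below is the universal IR surface relation P/(1-P) * (1-R)/R * F/(1-F) * (1-M)/M = 1, quoted explicitly in item 9 of this list. The counts are invented for the example.

```python
# Hedged sketch of the (P, R, F, M) quadruple from a 2x2 retrieval table and a
# numerical check of the universal IR surface relation. Algebraically the product
# equals (a/b)(c/a)(b/d)(d/c) = 1 for any positive a, b, c, d.
def prfm(a, b, c, d):
    P = a / (a + b)  # precision
    R = a / (a + c)  # recall
    F = b / (b + d)  # fallout
    M = c / (c + d)  # miss: fraction of not-retrieved documents that are relevant
    return P, R, F, M

def surface(P, R, F, M):
    return (P / (1 - P)) * ((1 - R) / R) * (F / (1 - F)) * ((1 - M) / M)

if __name__ == "__main__":
    P, R, F, M = prfm(a=40, b=10, c=20, d=930)
    print(round(surface(P, R, F, M), 9))  # 1.0, independent of the chosen counts
```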
  7. Egghe, L.: Relations between the continuous and the discrete Lotka power function (2005) 0.01
    0.013924035 = product of:
      0.06962018 = sum of:
        0.06962018 = weight(_text_:n in 3464) [ClassicSimilarity], result of:
          0.06962018 = score(doc=3464,freq=4.0), product of:
            0.17223433 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.03994621 = queryNorm
            0.40421778 = fieldWeight in 3464, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.046875 = fieldNorm(doc=3464)
      0.2 = coord(1/5)
    
    Abstract
    The discrete Lotka power function describes the number of sources (e.g., authors) with n = 1, 2, 3, ... items (e.g., publications). As in econometrics, informetrics theory requires functions of a continuous variable j, replacing the discrete variable n. Here j represents item densities instead of numbers of items. The continuous Lotka power function describes the density of sources with item density j. The discrete Lotka function is the one obtained empirically, from data; the continuous Lotka function is the one needed when one wants to apply Lotkaian informetrics, i.e., to determine properties that can be derived from the (continuous) model. It is, hence, important to know the relations between the two models. We show that the exponents of the discrete Lotka function (if not too high, i.e., within limits encountered in practice) and of the continuous Lotka function are approximately the same. This is important to know when applying theoretical results from the continuous model to practical data.
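An illustrative numerical check, not the paper's argument: build a discrete Lotka function by integrating a continuous Lotka density C/j^alpha over unit intervals [n, n+1] and fit the exponent of the resulting counts on a log-log scale. For moderate exponents the fitted value stays close to alpha, in line with the abstract's claim. The construction and the parameter values are assumptions made for the example.

```python
# Hedged check: discrete counts obtained by integrating a continuous Lotka
# density C * j**(-alpha) over [n, n+1], then a log-log least-squares fit of the
# exponent; the fitted exponent stays close to alpha for moderate alpha.
import math

def discrete_from_continuous(alpha, C=1000.0, n_max=200):
    # integral of C * j**(-alpha) over [n, n+1], for alpha > 1
    return [C / (alpha - 1) * (n ** (1 - alpha) - (n + 1) ** (1 - alpha))
            for n in range(1, n_max + 1)]

def fitted_exponent(fs):
    xs = [math.log(n) for n in range(1, len(fs) + 1)]
    ys = [math.log(f) for f in fs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope

if __name__ == "__main__":
    for alpha in (1.5, 2.0, 2.5, 3.0):
        print(alpha, round(fitted_exponent(discrete_from_continuous(alpha)), 2))
```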
  8. Egghe, L.: On the law of Zipf-Mandelbrot for multi-word phrases (1999) 0.01
    0.008745566 = product of:
      0.04372783 = sum of:
        0.04372783 = weight(_text_:m in 3058) [ClassicSimilarity], result of:
          0.04372783 = score(doc=3058,freq=8.0), product of:
            0.09940409 = queryWeight, product of:
              2.4884486 = idf(docFreq=9980, maxDocs=44218)
              0.03994621 = queryNorm
            0.4398997 = fieldWeight in 3058, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.4884486 = idf(docFreq=9980, maxDocs=44218)
              0.0625 = fieldNorm(doc=3058)
      0.2 = coord(1/5)
    
    Abstract
    This article studies the probabilities of the occurrence of multi-word (m-word) phrases (m = 2, 3, ...) in relation to the probabilities of occurrence of the single words. It is well known that, in the latter case, the law of Zipf is valid (i.e., a power law). We prove that in the case of m-word phrases (m >= 2) this is not the case. We present two independent proofs of this.
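An empirical illustration, not one of the paper's proofs: assume single-word probabilities follow Zipf's law (p_r proportional to 1/r) and, as a simplifying assumption, that a 2-word phrase is as probable as the product of its word probabilities. The local log-log slope of the resulting rank-frequency curve then drifts with rank instead of staying constant, i.e. the phrase distribution is not a single power law.

```python
# Hedged illustration: 2-word phrase probabilities taken as products of Zipfian
# word probabilities (independence assumed); the local log-log slope of the
# rank-frequency curve changes across rank ranges, unlike a pure power law.
import math

def local_slopes(n_words=1000):
    words = [1.0 / r for r in range(1, n_words + 1)]   # Zipf, up to a constant
    phrases = sorted((p * q for p in words for q in words), reverse=True)
    slopes = []
    for lo, hi in [(10, 100), (100, 1000), (1000, 10000)]:
        slope = ((math.log(phrases[hi]) - math.log(phrases[lo]))
                 / (math.log(hi) - math.log(lo)))
        slopes.append(round(slope, 2))
    return slopes

if __name__ == "__main__":
    print(local_slopes())  # the slope drifts with rank: no single power law
```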
  9. Egghe, L.: Existence theorem of the quadruple (P, R, F, M) : precision, recall, fallout and miss (2007) 0.01
    0.008033316 = product of:
      0.040166575 = sum of:
        0.040166575 = weight(_text_:m in 2011) [ClassicSimilarity], result of:
          0.040166575 = score(doc=2011,freq=12.0), product of:
            0.09940409 = queryWeight, product of:
              2.4884486 = idf(docFreq=9980, maxDocs=44218)
              0.03994621 = queryNorm
            0.40407366 = fieldWeight in 2011, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              2.4884486 = idf(docFreq=9980, maxDocs=44218)
              0.046875 = fieldNorm(doc=2011)
      0.2 = coord(1/5)
    
    Abstract
    In an earlier paper [Egghe, L. (2004). A universal method of information retrieval evaluation: the "missing" link M and the universal IR surface. Information Processing and Management, 40, 21-30] we showed that, given an IR system, and if P denotes precision, R recall, F fallout and M miss (re-introduced in the paper mentioned above), we have the following relationship between P, R, F and M: P/(1-P) * (1-R)/R * F/(1-F) * (1-M)/M = 1. In this paper we prove the (more difficult) converse: given any four rational numbers in the interval ]0, 1[ satisfying the above equation, there exists an IR system such that these four numbers (in any order) are the precision, recall, fallout and miss of this IR system. As a consequence we show that any three rational numbers in ]0, 1[ can be the values of any three measures taken from precision, recall, fallout and miss of a certain IR system. We also show that the same holds for two numbers instead of three.
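A hedged sketch of the flavour of the converse, not the paper's proof: pick P, R and M in ]0, 1[, build a retrieval table that realises them (the scaling a = 1 and the variable names are illustrative choices), and observe that the fallout F implied by that table is exactly the one that closes the surface relation, so the full quadruple is realised by an IR system.

```python
# Hedged construction (not the paper's proof): build a table (a, b, c, d) =
# (relevant retrieved, non-relevant retrieved, relevant not retrieved,
# non-relevant not retrieved) realising given P, R, M; the induced fallout F
# then satisfies P/(1-P) * (1-R)/R * F/(1-F) * (1-M)/M = 1 exactly.
def table_from_prm(P, R, M):
    a = 1.0
    b = a * (1 - P) / P  # forces precision P
    c = a * (1 - R) / R  # forces recall R
    d = c * (1 - M) / M  # forces miss M; fallout follows from the relation
    return a, b, c, d

if __name__ == "__main__":
    P, R, M = 0.8, 0.667, 0.02
    a, b, c, d = table_from_prm(P, R, M)
    F = b / (b + d)      # the fallout implied by the constructed table
    print(round((P / (1 - P)) * ((1 - R) / R) * (F / (1 - F)) * ((1 - M) / M), 9))  # 1.0
```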
  10. Egghe, L.; Guns, R.; Rousseau, R.; Leuven, K.U.: Erratum (2012) 0.01
    0.00541216 = product of:
      0.0270608 = sum of:
        0.0270608 = product of:
          0.0541216 = sum of:
            0.0541216 = weight(_text_:22 in 4992) [ClassicSimilarity], result of:
              0.0541216 = score(doc=4992,freq=2.0), product of:
                0.13988481 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03994621 = queryNorm
                0.38690117 = fieldWeight in 4992, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4992)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    14. 2.2012 12:53:22
  11. Egghe, L.; Rousseau, R.: Introduction to informetrics : quantitative methods in library, documentation and information science (1990) 0.00
    0.0038261854 = product of:
      0.019130927 = sum of:
        0.019130927 = weight(_text_:m in 1515) [ClassicSimilarity], result of:
          0.019130927 = score(doc=1515,freq=2.0), product of:
            0.09940409 = queryWeight, product of:
              2.4884486 = idf(docFreq=9980, maxDocs=44218)
              0.03994621 = queryNorm
            0.19245613 = fieldWeight in 1515, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.4884486 = idf(docFreq=9980, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1515)
      0.2 = coord(1/5)
    
    Type
    m
  12. Egghe, L.; Rousseau, R.: Averaging and globalising quotients of informetric and scientometric data (1996) 0.00
    0.0032472957 = product of:
      0.016236478 = sum of:
        0.016236478 = product of:
          0.032472957 = sum of:
            0.032472957 = weight(_text_:22 in 7659) [ClassicSimilarity], result of:
              0.032472957 = score(doc=7659,freq=2.0), product of:
                0.13988481 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03994621 = queryNorm
                0.23214069 = fieldWeight in 7659, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=7659)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Source
    Journal of information science. 22(1996) no.3, S.165-170
  13. Rousseau, R.; Egghe, L.; Guns, R.: Becoming metric-wise : a bibliometric guide for researchers (2018) 0.00
    0.0027329896 = product of:
      0.013664948 = sum of:
        0.013664948 = weight(_text_:m in 5226) [ClassicSimilarity], result of:
          0.013664948 = score(doc=5226,freq=2.0), product of:
            0.09940409 = queryWeight, product of:
              2.4884486 = idf(docFreq=9980, maxDocs=44218)
              0.03994621 = queryNorm
            0.13746867 = fieldWeight in 5226, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.4884486 = idf(docFreq=9980, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5226)
      0.2 = coord(1/5)
    
    Type
    m