Search (10 results, page 1 of 1)

  • author_ss:"Egghe, L."
  1. Egghe, L.: Properties of the n-overlap vector and n-overlap similarity theory (2006) 0.03
    0.03370899 = product of:
      0.06741798 = sum of:
        0.06741798 = product of:
          0.13483596 = sum of:
            0.13483596 = weight(_text_:e.g in 194) [ClassicSimilarity], result of:
              0.13483596 = score(doc=194,freq=8.0), product of:
                0.23393378 = queryWeight, product of:
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.044842023 = queryNorm
                0.57638514 = fieldWeight in 194, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=194)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
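
    The tree above is Lucene ClassicSimilarity (TF-IDF) explain output; the same structure repeats for every hit below. As a sanity check, the following sketch recomputes the displayed 0.03370899 purely from the constants shown in the tree (nothing outside the explanation is assumed):

```python
import math

# Constants copied verbatim from the explain tree for doc 194, term "e.g":
freq = 8.0                # termFreq
idf = 5.2168427           # idf(docFreq=651, maxDocs=44218)
query_norm = 0.044842023  # queryNorm
field_norm = 0.0390625    # fieldNorm(doc=194)

tf = math.sqrt(freq)                  # 2.828427 = tf(freq=8.0)
query_weight = idf * query_norm       # 0.23393378 = queryWeight
field_weight = tf * idf * field_norm  # 0.57638514 = fieldWeight
raw = query_weight * field_weight     # 0.13483596 = weight(_text_:e.g in 194)

score = raw * 0.5 * 0.5               # the two coord(1/2) factors
print(f"{score:.8f}")                 # ~0.03370899
```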
    
    Abstract
    In the first part of this article the author defines the n-overlap vector whose coordinates consist of the fraction of the objects (e.g., books, N-grams, etc.) that belong to 1, 2, ..., n sets (more generally: families) (e.g., libraries, databases, etc.). With the aid of the Lorenz concentration theory, a theory of n-overlap similarity is conceived together with corresponding measures, such as the generalized Jaccard index (generalizing the well-known Jaccard index for the case n = 2). Next, the distributional form of the n-overlap vector is determined, assuming certain distributions of the object and of the set (family) sizes. In this section the decreasing power law and the decreasing exponential distribution are explained for the n-overlap vector. Both item (token) n-overlap and source (type) n-overlap are studied. The n-overlap properties of objects indexed by a hierarchical system (e.g., books indexed by numbers from a UDC or Dewey system, or by N-grams) are presented in the final section. The author shows how the results of the previous sections can be applied, as well as how the Lorenz order of the n-overlap vector is respected by an increase or a decrease of the level of refinement in the hierarchical system (e.g., the value N in N-grams).
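
    To make the central definition concrete, here is a minimal sketch of the n-overlap vector, assuming the objects are hashable items and the "sets" are plain Python sets (the paper's families are more general). The classical n = 2 Jaccard index is included for comparison; the paper's generalized Jaccard index is not reproduced here.

```python
from collections import Counter

def n_overlap_vector(sets):
    """Coordinate k (k = 1..n) = fraction of objects in exactly k of the n sets."""
    membership = Counter()
    for s in sets:
        for obj in s:
            membership[obj] += 1
    total = len(membership)                # distinct objects, i.e. the union
    counts = Counter(membership.values())
    return [counts.get(k, 0) / total for k in range(1, len(sets) + 1)]

def jaccard(a, b):
    """Classical Jaccard index, the n = 2 case the paper generalizes."""
    return len(a & b) / len(a | b)

libraries = [{"b1", "b2", "b3"}, {"b2", "b3", "b4"}, {"b3", "b5"}]
print(n_overlap_vector(libraries))          # [0.6, 0.2, 0.2]
print(jaccard(libraries[0], libraries[1]))  # 0.5
```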
  2. Egghe, L.: Type/Token-Taken informetrics (2003) 0.03
    0.029192839 = product of:
      0.058385678 = sum of:
        0.058385678 = product of:
          0.116771355 = sum of:
            0.116771355 = weight(_text_:e.g in 1608) [ClassicSimilarity], result of:
              0.116771355 = score(doc=1608,freq=6.0), product of:
                0.23393378 = queryWeight, product of:
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.044842023 = queryNorm
                0.49916416 = fieldWeight in 1608, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1608)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Type/Token-Taken informetrics is a new part of informetrics that studies the use of items rather than the items themselves. Here, items are the objects that are produced by the sources (e.g., journals producing articles, authors producing papers, etc.). In linguistics a source is also called a type (e.g., a word), and an item a token (e.g., the use of words in texts). In informetrics, types that occur often (for example, in a database) will also be requested often (for example, in information retrieval). The relative use of these occurrences will be higher than their relative occurrences themselves; hence the name Type/Token-Taken informetrics. This article studies the frequency distribution of Type/Token-Taken informetrics, starting from that of Type/Token informetrics (i.e., source-item relationships). We also study the average number μ* of item uses in Type/Token-Taken informetrics and compare this with the classical average number μ in Type/Token informetrics. We show that μ* ≥ μ always, and that μ* is an increasing function of μ. A method is presented to actually calculate μ* from μ and a given α, the exponent in Lotka's frequency distribution of Type/Token informetrics. We leave open the problem of developing non-Lotkaian Type/Token-Taken informetrics.
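
    The abstract references, but does not give, the method for computing μ* from μ and α. Purely as an illustration, under the loudly flagged assumption that "use" behaves like size-biased sampling of sources, so that μ* is modeled as the size-biased mean E[n²]/E[n] of a truncated Lotka law f(n) ∝ n^(-α), this sketch exhibits μ* ≥ μ numerically:

```python
# ASSUMPTION: mu* is modeled here as the size-biased mean E[n^2]/E[n]; the
# paper's actual calculation method is only referenced in the abstract.
def mu_and_mu_star(alpha, n_max=10_000):
    f = [n ** -alpha for n in range(1, n_max + 1)]  # truncated Lotka law
    total = sum(f)
    e_n = sum(n * fn for n, fn in enumerate(f, 1)) / total
    e_n2 = sum(n * n * fn for n, fn in enumerate(f, 1)) / total
    return e_n, e_n2 / e_n                          # mu, mu* (size-biased)

for alpha in (2.5, 3.0, 3.5):
    mu, mu_star = mu_and_mu_star(alpha)
    print(f"alpha={alpha}: mu={mu:.3f}  mu*={mu_star:.3f}")  # mu* >= mu
```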
  3. Egghe, L.: Relations between the continuous and the discrete Lotka power function (2005) 0.03
    0.028603025 = product of:
      0.05720605 = sum of:
        0.05720605 = product of:
          0.1144121 = sum of:
            0.1144121 = weight(_text_:e.g in 3464) [ClassicSimilarity], result of:
              0.1144121 = score(doc=3464,freq=4.0), product of:
                0.23393378 = queryWeight, product of:
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.044842023 = queryNorm
                0.489079 = fieldWeight in 3464, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3464)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The discrete Lotka power function describes the number of sources (e.g., authors) with n = 1, 2, 3, ... items (e.g., publications). As in econometrics, informetrics theory requires functions of a continuous variable j, replacing the discrete variable n. Here j represents item densities instead of numbers of items, and the continuous Lotka power function describes the density of sources with item density j. The discrete Lotka function is the one obtained from empirical data; the continuous Lotka function is the one needed when one wants to apply Lotkaian informetrics, i.e., to determine properties that can be derived from the (continuous) model. It is hence important to know the relations between the two models. We show that the exponents of the discrete Lotka function (if not too high, i.e., within limits encountered in practice) and of the continuous Lotka function are approximately the same. This is important to know when applying theoretical results from the continuous model to exponents derived from practical data.
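
    One simple numerical way to connect the two models (an assumption of this sketch, not the paper's analytic correspondence) is to obtain discrete counts by integrating the continuous density over unit intervals [n, n+1] and then read off the log-log slope of those counts; the recovered exponent comes out close to the continuous one:

```python
import math

def discrete_from_continuous(b, n_min=10, n_max=1000):
    """Discrete counts from integrating the continuous density j**-b
    over unit intervals [n, n+1] (normalizing constant set to 1)."""
    ns = list(range(n_min, n_max + 1))
    f = [(n ** (1 - b) - (n + 1) ** (1 - b)) / (b - 1) for n in ns]
    return ns, f

def loglog_slope(xs, ys):
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    return num / sum((x - mx) ** 2 for x in lx)

for b in (1.5, 2.0, 2.5):
    ns, f = discrete_from_continuous(b)
    print(f"continuous b={b}: discrete log-log slope = {loglog_slope(ns, f):.3f}")
```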
  4. Egghe, L.: On the relation between the association strength and other similarity measures (2010) 0.03
    0.026967188 = product of:
      0.053934377 = sum of:
        0.053934377 = product of:
          0.10786875 = sum of:
            0.10786875 = weight(_text_:e.g in 3598) [ClassicSimilarity], result of:
              0.10786875 = score(doc=3598,freq=2.0), product of:
                0.23393378 = queryWeight, product of:
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.044842023 = queryNorm
                0.4611081 = fieldWeight in 3598, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3598)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    A graph in van Eck and Waltman [JASIST, 60(8), 2009, p. 1644], representing the relation between the association strength and the cosine, is partially explained as a sheaf of parabolas, each parabola being the functional relation between these similarity measures on the trajectories x·y = a, where a is a constant. Based on earlier obtained relations between the cosine and other similarity measures (e.g., the Jaccard index), we can prove new relations between the association strength and these other measures.
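
    For readers without the two papers at hand: with occurrence totals x, y and co-occurrence count c, van Eck and Waltman's association strength is proportional to c/(x·y), and the cosine is c/√(x·y); the constant of proportionality does not affect the shape of the relation. From these definitions alone, along a trajectory x·y = a the two measures stand in the fixed relation AS = cos/√a. This sketch verifies that numerically (the sample counts are invented for illustration):

```python
import math

def association_strength(c, x, y):
    # van Eck & Waltman's association strength, up to a constant factor
    return c / (x * y)

def cosine(c, x, y):
    return c / math.sqrt(x * y)

# Three invented pairs lying on one trajectory x*y = 400:
for c, x, y in [(5, 20, 20), (8, 10, 40), (12, 16, 25)]:
    asv, cos = association_strength(c, x, y), cosine(c, x, y)
    print(f"c={c:2d}: AS={asv:.4f}  cos={cos:.4f}  cos/sqrt(400)={cos / 20:.4f}")
```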
  5. Egghe, L.: Untangling Herdan's law and Heaps' law : mathematical and informetric arguments (2007) 0.02
    0.023835853 = product of:
      0.047671705 = sum of:
        0.047671705 = product of:
          0.09534341 = sum of:
            0.09534341 = weight(_text_:e.g in 271) [ClassicSimilarity], result of:
              0.09534341 = score(doc=271,freq=4.0), product of:
                0.23393378 = queryWeight, product of:
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.044842023 = queryNorm
                0.40756583 = fieldWeight in 271, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=271)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Herdan's law in linguistics and Heaps' law in information retrieval are different formulations of the same phenomenon. Stated briefly and in linguistic terms, they say that vocabulary sizes are concave increasing power laws of text sizes. This study investigates these laws from a purely mathematical and informetric point of view. A general informetric argument shows that the problem of proving these laws is, in fact, ill-posed. Using the more general terminology of sources and items, the author shows, by presenting exact formulas from Lotkaian informetrics, that the total number T of sources is not only a function of the total number A of items but also a function of several parameters (e.g., the parameters occurring in Lotka's law). Consequently, it is shown that a fixed T (or A) value can lead to different possible A (respectively, T) values. Limiting the T(A)-variability to increasing samples (e.g., in a text, as done in linguistics), the author then shows, in a purely mathematical way, that for large sample sizes T ≈ A^φ, where φ is a constant, φ < 1 but close to 1; hence, roughly, Heaps' or Herdan's law can be proved without using any linguistic or informetric argument. The author also shows that for smaller samples φ is not a constant but essentially decreases, as confirmed by practical examples. Finally, an exact informetric argument on random sampling in the items shows that, in most cases, T = T(A) is a concavely increasing function, in accordance with practical examples.
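
    As a practical companion, this sketch estimates the exponent φ in T ≈ A^φ by log-log least squares on a growing token stream. It is an illustration of the T(A) relation, not the article's proof; the input file `corpus.txt` is a hypothetical stand-in.

```python
import math

def heaps_exponent(tokens, step=100):
    """Fit phi in T ~ A**phi from (tokens seen, types seen) samples."""
    xs, ys, seen = [], [], set()
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0:              # sample the vocabulary growth curve
            xs.append(math.log(i))
            ys.append(math.log(len(seen)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

# Usage (hypothetical corpus file); phi typically comes out below 1 and,
# for large samples, close to it, as the article proves:
# tokens = open("corpus.txt").read().split()
# print(heaps_exponent(tokens))
```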
  6. Egghe, L.: New relations between similarity measures for vectors based on vector norms (2009) 0.02
    0.020225393 = product of:
      0.040450785 = sum of:
        0.040450785 = product of:
          0.08090157 = sum of:
            0.08090157 = weight(_text_:e.g in 2708) [ClassicSimilarity], result of:
              0.08090157 = score(doc=2708,freq=2.0), product of:
                0.23393378 = queryWeight, product of:
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.044842023 = queryNorm
                0.34583107 = fieldWeight in 2708, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2708)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The well-known similarity measures Jaccard, Salton's cosine, Dice, and several related overlap measures for vectors are compared. While general relations cannot be proved, we study these measures on trajectories of the form ||X|| = a||Y||, where a > 0 is a constant and ||·|| denotes the Euclidean norm of a vector. In this case, direct functional relations between these measures are proved. For Jaccard, we prove that it is a convexly increasing function of Salton's cosine measure, but always smaller than or equal to the latter, thereby explaining a curve experimentally found by Leydesdorff. All the other measures have a linear relation with Salton's cosine, reducing even to equality in the case a = 1. Hence, for equally normed vectors (e.g., normalized vectors) we essentially only have Jaccard's measure and Salton's cosine measure, since all the other measures are equal to the latter.
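
    The claimed relations are easy to check numerically. The sketch below uses the vector (Tanimoto) form of Jaccard, J = X·Y / (||X||^2 + ||Y||^2 - X·Y), together with cosine and Dice; on an equal-norm pair (a = 1) Dice coincides with the cosine, while Jaccard equals cos/(2 - cos) ≤ cos, the convex curve described above. The example vectors are invented:

```python
import math

def dot(u, v): return sum(x * y for x, y in zip(u, v))
def norm(u): return math.sqrt(dot(u, u))

def measures(u, v):
    d, nu, nv = dot(u, v), norm(u), norm(v)
    cos = d / (nu * nv)                    # Salton's cosine
    dice = 2 * d / (nu ** 2 + nv ** 2)     # Dice
    jac = d / (nu ** 2 + nv ** 2 - d)      # vector (Tanimoto) Jaccard
    return cos, dice, jac

u, v = [3.0, 4.0], [4.0, 3.0]              # equal norms, i.e. a = 1
cos, dice, jac = measures(u, v)
print(cos, dice, jac)   # dice == cos; jac == cos/(2 - cos) <= cos
```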
  7. Egghe, L.; Guns, R.; Rousseau, R.; Leuven, K.U.: Erratum (2012) 0.02
    0.0151886875 = product of:
      0.030377375 = sum of:
        0.030377375 = product of:
          0.06075475 = sum of:
            0.06075475 = weight(_text_:22 in 4992) [ClassicSimilarity], result of:
              0.06075475 = score(doc=4992,freq=2.0), product of:
                0.15702912 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044842023 = queryNorm
                0.38690117 = fieldWeight in 4992, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4992)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    14. 2.2012 12:53:22
  8. Egghe, L.: Empirical and combinatorial study of country occurrences in multi-authored papers (2006) 0.01
    0.013483594 = product of:
      0.026967188 = sum of:
        0.026967188 = product of:
          0.053934377 = sum of:
            0.053934377 = weight(_text_:e.g in 81) [ClassicSimilarity], result of:
              0.053934377 = score(doc=81,freq=2.0), product of:
                0.23393378 = queryWeight, product of:
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.044842023 = queryNorm
                0.23055404 = fieldWeight in 81, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.03125 = fieldNorm(doc=81)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Papers written by several authors can be classified according to the countries of the author affiliations. The empirical part of this paper consists of two datasets. One dataset consists of 1,035 papers retrieved via the search "pedagog*" in the years 2004 and 2005 (up to October) in Academic Search Elite; this is a case where φ(m), the number of papers with m = 1, 2, 3, ... authors, is decreasing, hence most of the papers have a low number of authors. Here we find that #(j, m), the number of times a country occurs j times in an m-authored paper, is decreasing in j = 1, ..., m-1, and that #(m, m) is much higher than all the other #(j, m) values. The other dataset consists of 3,271 papers retrieved via the search "enzyme" in the year 2005 (up to October) in the same database; this is a case of a non-decreasing φ(m): most papers have 3 or 4 authors, and we even find many papers with a much higher number of authors. In this case we show again that #(m, m) is much higher than the other #(j, m) values, but that #(j, m) is no longer decreasing in j = 1, ..., m-1, although #(1, m) is (apart from #(m, m)) the largest number amongst the #(j, m). The combinatorial part gives a proof of the fact that #(j, m) decreases for j = 1, ..., m-1, supposing that all cases are equally possible. This shows that the first dataset conforms more closely to this model than the second dataset. Explanations for these findings are given. From the data we also find the (we think: new) distribution of the number of papers with n = 1, 2, 3, ... countries (i.e., where there are n different countries involved amongst the m (≥ n) authors of a paper): a fast-decreasing function, e.g., a power law with a very large Lotka exponent.
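
    A simulation sketch of the combinatorial model in which "all cases are equally possible", read here as each of the m authors drawing a country uniformly at random; the country-pool size and paper count are assumptions of the sketch. It illustrates the proved decrease of #(j, m) over j = 1, ..., m-1 (the empirical dominance of #(m, m) stems from real same-country collaboration and is not reproduced by this uniform model):

```python
import random
from collections import Counter

def simulate(m, n_countries=20, papers=50_000, seed=7):
    """Tally #(j, m): (paper, country) pairs where a country occurs
    exactly j times among the m authors, under uniform assignment."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(papers):
        occ = Counter(rng.randrange(n_countries) for _ in range(m))
        for j in occ.values():
            tally[j] += 1
    return [tally.get(j, 0) for j in range(1, m + 1)]

print(simulate(m=5))  # counts for j = 1..5; decreasing in j under this model
```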
  9. Egghe, L.; Rousseau, R.: Averaging and globalising quotients of informetric and scientometric data (1996) 0.01
    0.009113212 = product of:
      0.018226424 = sum of:
        0.018226424 = product of:
          0.03645285 = sum of:
            0.03645285 = weight(_text_:22 in 7659) [ClassicSimilarity], result of:
              0.03645285 = score(doc=7659,freq=2.0), product of:
                0.15702912 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044842023 = queryNorm
                0.23214069 = fieldWeight in 7659, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=7659)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Journal of information science. 22(1996) no.3, S.165-170
  10. Egghe, L.: A universal method of information retrieval evaluation : the "missing" link M and the universal IR surface (2004) 0.01
    0.009113212 = product of:
      0.018226424 = sum of:
        0.018226424 = product of:
          0.03645285 = sum of:
            0.03645285 = weight(_text_:22 in 2558) [ClassicSimilarity], result of:
              0.03645285 = score(doc=2558,freq=2.0), product of:
                0.15702912 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044842023 = queryNorm
                0.23214069 = fieldWeight in 2558, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2558)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    14. 8.2004 19:17:22