Search (8 results, page 1 of 1)

  • author_ss:"Egghe, L."
  1. Egghe, L.: A universal method of information retrieval evaluation : the "missing" link M and the universal IR surface (2004) 0.02
    0.018418994 = sum of:
      0.0053165695 = product of:
        0.047849126 = sum of:
          0.047849126 = weight(_text_:p in 2558) [ClassicSimilarity], result of:
            0.047849126 = score(doc=2558,freq=6.0), product of:
              0.115903415 = queryWeight, product of:
                3.5955126 = idf(docFreq=3298, maxDocs=44218)
                0.032235574 = queryNorm
              0.4128362 = fieldWeight in 2558, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.5955126 = idf(docFreq=3298, maxDocs=44218)
                0.046875 = fieldNorm(doc=2558)
        0.11111111 = coord(1/9)
      0.013102425 = product of:
        0.02620485 = sum of:
          0.02620485 = weight(_text_:22 in 2558) [ClassicSimilarity], result of:
            0.02620485 = score(doc=2558,freq=2.0), product of:
              0.112883486 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.032235574 = queryNorm
              0.23214069 = fieldWeight in 2558, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2558)
        0.5 = coord(1/2)
    
    Abstract
    The paper shows that the present evaluation methods in information retrieval (basically recall R and precision P, and in some cases fallout F) lack universal comparability, in the sense that their values depend on the generality of the IR problem. A solution is given by using all "parts" of the database, including the non-relevant documents and also the not-retrieved documents. It turns out that the solution consists in introducing the measure M, the fraction of the not-retrieved documents that are relevant (hence the "miss" measure). We prove that, independent of the IR problem or of the IR action, the quadruple (P, R, F, M) belongs to a universal IR surface, which is the same for all IR activities. This universality is then exploited by defining a new measure for evaluation in IR that allows unbiased comparisons of all IR results. We also show that using only one, two, or even three measures from the set {P, R, F, M} necessarily leads to evaluation measures that are non-universal and hence not capable of comparing different IR situations.
    Date
    14. 8.2004 19:17:22
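    The surface relation is quoted in entry 7 of this listing as P/(1-P)*(1-R)/R*F/(1-F)*(1-M)/M = 1. A minimal sketch, assuming the usual 2x2 retrieval table with hypothetical cell names a (relevant retrieved), b (non-relevant retrieved), c (relevant not retrieved) and d (non-relevant not retrieved): then P/(1-P) = a/b, (1-R)/R = c/a, F/(1-F) = b/d and (1-M)/M = d/c, so the product telescopes to 1 whatever the counts are. The helpers `prfm` and `surface_product` are illustrative names, not from the paper; exact fractions avoid rounding noise.

```python
from fractions import Fraction


def prfm(a: int, b: int, c: int, d: int) -> tuple:
    """Return exact (P, R, F, M) for a 2x2 retrieval table.

    a: relevant and retrieved       b: non-relevant and retrieved
    c: relevant, not retrieved      d: non-relevant, not retrieved
    All four cells must be positive so that every measure lies in ]0, 1[.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    precision = Fraction(a, a + b)  # relevant share of the retrieved documents
    recall = Fraction(a, a + c)     # retrieved share of the relevant documents
    fallout = Fraction(b, b + d)    # retrieved share of the non-relevant documents
    miss = Fraction(c, c + d)       # relevant share of the not-retrieved documents
    return precision, recall, fallout, miss


def surface_product(p, r, f, m) -> Fraction:
    """P/(1-P) * (1-R)/R * F/(1-F) * (1-M)/M, computed exactly."""
    return (p / (1 - p)) * ((1 - r) / r) * (f / (1 - f)) * ((1 - m) / m)


# Two hypothetical IR situations of very different generality.
for cells in [(3, 7, 2, 88), (40, 10, 400, 9550)]:
    p, r, f, m = prfm(*cells)
    print(cells, p, r, f, m, surface_product(p, r, f, m))
```

    The individual measures change with the size and generality of each table, while the last printed value stays at 1, which is the invariance the abstract exploits.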
  2. Egghe, L.: Untangling Herdan's law and Heaps' law : mathematical and informetric arguments (2007) 0.02
    0.018279383 = product of:
      0.036558766 = sum of:
        0.036558766 = product of:
          0.07311753 = sum of:
            0.07311753 = weight(_text_:t in 271) [ClassicSimilarity], result of:
              0.07311753 = score(doc=271,freq=14.0), product of:
                0.1269891 = queryWeight, product of:
                  3.9394085 = idf(docFreq=2338, maxDocs=44218)
                  0.032235574 = queryNorm
                0.575778 = fieldWeight in 271, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  3.9394085 = idf(docFreq=2338, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=271)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Herdan's law in linguistics and Heaps' law in information retrieval are different formulations of the same phenomenon. Stated briefly in linguistic terms, they say that vocabulary size is a concave increasing power law of text size. This study investigates these laws from a purely mathematical and informetric point of view. A general informetric argument shows that the problem of proving these laws is, in fact, ill-posed. Using the more general terminology of sources and items, the author shows, by presenting exact formulas from Lotkaian informetrics, that the total number T of sources is not only a function of the total number A of items but also of several parameters (e.g., the parameters occurring in Lotka's law). Consequently, a fixed T (or A) value can lead to different possible A (respectively, T) values. Limiting the T(A)-variability to increasing samples (e.g., in a text, as is done in linguistics), the author then shows, in a purely mathematical way, that for large sample sizes $T \sim A^{\varphi}$, where $\varphi$ is a constant, $\varphi < 1$ but close to 1; hence, roughly, Heaps' or Herdan's law can be proved without using any linguistic or informetric argument. The author also shows that for smaller samples, $\varphi$ is not a constant but essentially decreases, as confirmed by practical examples. Finally, an exact informetric argument on random sampling in the items shows that, in most cases, T = T(A) is a concavely increasing function, in accordance with practical examples.
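    The ill-posedness claim (one T value, several possible A values) can be illustrated numerically. This is a sketch under an assumption not stated in the abstract: the continuous Lotka model f(j) = C / j**alpha on 1 <= j <= rho_m, for which the standard integrals give T = C/(alpha-1) * (1 - rho_m**(1-alpha)) and A = C/(2-alpha) * (rho_m**(2-alpha) - 1), with logarithmic limits at alpha = 1 and alpha = 2. The function names and parameter values are hypothetical, and this is not the paper's own derivation.

```python
import math


def lotka_totals(C: float, alpha: float, rho_m: float) -> tuple:
    """Total sources T and total items A for f(j) = C / j**alpha, 1 <= j <= rho_m."""
    if alpha == 1:
        T = C * math.log(rho_m)  # limit of the general formula at alpha = 1
    else:
        T = C / (alpha - 1) * (1 - rho_m ** (1 - alpha))
    if alpha == 2:
        A = C * math.log(rho_m)  # limit of the general formula at alpha = 2
    else:
        A = C / (2 - alpha) * (rho_m ** (2 - alpha) - 1)
    return T, A


def C_for_target_T(T_target: float, alpha: float, rho_m: float) -> float:
    """T is proportional to C, so rescale the totals obtained with C = 1."""
    T_unit, _ = lotka_totals(1.0, alpha, rho_m)
    return T_target / T_unit


# Same number of sources T, different Lotka exponents (hypothetical values).
T_target, rho_m = 1000.0, 100.0
for alpha in (1.5, 2.0, 2.5):
    C = C_for_target_T(T_target, alpha, rho_m)
    T, A = lotka_totals(C, alpha, rho_m)
    print(f"alpha={alpha}: C={C:.1f}, T={T:.1f}, A={A:.1f}")
```

    All three runs share T = 1000, yet A comes out near 10000, 4652 and 2703, so T alone does not determine A once the Lotka parameters are free.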
  3. Egghe, L.; Ravichandra Rao, I.K.: The influence of the broadness of a query of a topic on its h-index : models and examples of the h-index of n-grams (2008) 0.01
    0.013817915 = product of:
      0.02763583 = sum of:
        0.02763583 = product of:
          0.05527166 = sum of:
            0.05527166 = weight(_text_:t in 2009) [ClassicSimilarity], result of:
              0.05527166 = score(doc=2009,freq=8.0), product of:
                0.1269891 = queryWeight, product of:
                  3.9394085 = idf(docFreq=2338, maxDocs=44218)
                  0.032235574 = queryNorm
                0.43524727 = fieldWeight in 2009, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.9394085 = idf(docFreq=2338, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2009)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The article studies the influence of the query formulation of a topic on its h-index. To generate purely random sets of documents, we used N-grams (N variable), that is, strings of zeros truncated at the end, to measure this influence. The databases used are WoS and Scopus. The formula $h = T^{1/\alpha}$, proved in Egghe and Rousseau (2006), where T is the number of retrieved documents and $\alpha$ is Lotka's exponent, is confirmed: h is a concavely increasing function of T. We also give a formula for the relation between h and N, the length of the N-gram: $h = D \cdot 10^{-N/\alpha}$, where D is a constant; this convexly decreasing function is found in our experiments. Nonlinear regression on $h = T^{1/\alpha}$ gives an estimate of $\alpha$, which can then be used to estimate the h-index of the entire database (Web of Science [WoS] and Scopus): $h = S^{1/\alpha}$, where S is the total number of documents in the database.
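    The estimation step can be sketched as follows. The abstract uses nonlinear regression; as a simpler stand-in, this sketch fits log h = (1/alpha) * log T by least squares through the origin and then plugs the estimate into h = S**(1/alpha). The function names and the data points are hypothetical: the points are generated exactly from alpha = 2 and are not the paper's WoS or Scopus measurements.

```python
import math


def fit_lotka_exponent(T_values, h_values) -> float:
    """Estimate alpha from h = T**(1/alpha) via a log-log line through the origin.

    Requires at least one T > 1, otherwise every log T is 0 and the fit is undefined.
    """
    xs = [math.log(t) for t in T_values]
    ys = [math.log(h) for h in h_values]
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)  # slope = 1/alpha
    return 1.0 / slope


def h_index_estimate(S: float, alpha: float) -> float:
    """Estimated h-index h = S**(1/alpha) for a set of S documents."""
    return S ** (1.0 / alpha)


# Synthetic, noise-free (T, h) pairs satisfying h = T**(1/2).
T_obs = [100, 400, 2500, 10000]
h_obs = [10, 20, 50, 100]
alpha_hat = fit_lotka_exponent(T_obs, h_obs)
print(round(alpha_hat, 6), round(h_index_estimate(1_000_000, alpha_hat)))
```

    With real, noisy counts a nonlinear fit on the original scale, as in the abstract, weights the observations differently from this log-scale fit, so the two estimates of alpha need not agree exactly.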
  4. Egghe, L.; Rousseau, R.: The Hirsch index of a shifted Lotka function and its relation with the impact factor (2012) 0.01
    0.013679037 = product of:
      0.027358074 = sum of:
        0.027358074 = product of:
          0.054716147 = sum of:
            0.054716147 = weight(_text_:t in 243) [ClassicSimilarity], result of:
              0.054716147 = score(doc=243,freq=4.0), product of:
                0.1269891 = queryWeight, product of:
                  3.9394085 = idf(docFreq=2338, maxDocs=44218)
                  0.032235574 = queryNorm
                0.4308728 = fieldWeight in 243, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9394085 = idf(docFreq=2338, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=243)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Based on earlier results about the shifted Lotka function, we prove an implicit functional relation between the Hirsch index (h-index) and the total number of sources (T). It is shown that the corresponding function, h(T), is concavely increasing. Next, we construct an implicit relation between the h-index and the impact factor IF (the average number of items per source). The corresponding function h(IF) is increasing, and we show that if the parameter C in the numerator of the shifted Lotka function is high, the relation between the h-index and the impact factor is almost linear.
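    An implicit relation of this kind can be solved numerically. A sketch under stated assumptions: take the continuous shifted Lotka function f(j) = C / (1 + j)**alpha for j >= 0 and alpha > 1. Then T = C / (alpha - 1), the number of sources with at least h items is T * (1 + h)**(1 - alpha), and the h-index is the fixed point h = T * (1 + h)**(1 - alpha). Whether this matches the paper's exact setup is an assumption, and the function name is hypothetical. Because the right-hand side minus h is strictly decreasing and changes sign on [0, T], bisection finds the unique root.

```python
def h_index_shifted_lotka(T: float, alpha: float, tol: float = 1e-12) -> float:
    """Solve h = T * (1 + h)**(1 - alpha) for h by bisection.

    Assumes the continuous shifted Lotka model f(j) = C / (1 + j)**alpha,
    j >= 0, alpha > 1 (hypothetical setup used for illustration).
    """
    if T <= 0 or alpha <= 1:
        raise ValueError("need T > 0 and alpha > 1")
    # g(h) = T*(1+h)**(1-alpha) - h is decreasing, g(0) = T > 0 and g(T) < 0.
    lo, hi = 0.0, T
    while hi - lo > tol * max(1.0, hi):
        mid = (lo + hi) / 2
        if T * (1 + mid) ** (1 - alpha) > mid:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return (lo + hi) / 2


for T in (10, 20, 30):
    print(T, round(h_index_shifted_lotka(T, 2.0), 4))
```

    For alpha = 2 the equation reduces to h**2 + h = T, which gives a closed-form check: h(6) = 2, h(20) = 4 and h(30) = 5. The increments h(20) - h(10), about 1.30, and h(30) - h(20) = 1 shrink, consistent with h(T) being concavely increasing.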
  5. Egghe, L.; Guns, R.; Rousseau, R.; Leuven, K.U.: Erratum (2012) 0.01
    0.010918688 = product of:
      0.021837376 = sum of:
        0.021837376 = product of:
          0.043674752 = sum of:
            0.043674752 = weight(_text_:22 in 4992) [ClassicSimilarity], result of:
              0.043674752 = score(doc=4992,freq=2.0), product of:
                0.112883486 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.032235574 = queryNorm
                0.38690117 = fieldWeight in 4992, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4992)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    14. 2.2012 12:53:22
  6. Egghe, L.; Rousseau, R.: Averaging and globalising quotients of informetric and scientometric data (1996) 0.01
    0.0065512126 = product of:
      0.013102425 = sum of:
        0.013102425 = product of:
          0.02620485 = sum of:
            0.02620485 = weight(_text_:22 in 7659) [ClassicSimilarity], result of:
              0.02620485 = score(doc=7659,freq=2.0), product of:
                0.112883486 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.032235574 = queryNorm
                0.23214069 = fieldWeight in 7659, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=7659)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Journal of information science. 22(1996) no.3, pp.165-170
  7. Egghe, L.: Existence theorem of the quadruple (P, R, F, M) : precision, recall, fallout and miss (2007) 0.00
    0.0034318303 = product of:
      0.0068636606 = sum of:
        0.0068636606 = product of:
          0.061772946 = sum of:
            0.061772946 = weight(_text_:p in 2011) [ClassicSimilarity], result of:
              0.061772946 = score(doc=2011,freq=10.0), product of:
                0.115903415 = queryWeight, product of:
                  3.5955126 = idf(docFreq=3298, maxDocs=44218)
                  0.032235574 = queryNorm
                0.5329692 = fieldWeight in 2011, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.5955126 = idf(docFreq=3298, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2011)
          0.11111111 = coord(1/9)
      0.5 = coord(1/2)
    
    Abstract
    In an earlier paper [Egghe, L. (2004). A universal method of information retrieval evaluation: the "missing" link M and the universal IR surface. Information Processing and Management, 40, 21-30] we showed that, given an IR system, and if P denotes precision, R recall, F fallout and M miss (re-introduced in the paper mentioned above), the following relationship holds between P, R, F and M: $\frac{P}{1-P} \cdot \frac{1-R}{R} \cdot \frac{F}{1-F} \cdot \frac{1-M}{M} = 1$. In this paper we prove the (more difficult) converse: given any four rational numbers in the interval ]0, 1[ satisfying the above equation, there exists an IR system such that these four numbers (in any order) are the precision, recall, fallout and miss of this IR system. As a consequence, we show that any three rational numbers in ]0, 1[ represent any three measures taken from precision, recall, fallout and miss of a certain IR system. We also show that this result holds for two numbers instead of three.
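    One direction of the converse can be made concrete with a simple construction. This is not the paper's proof, which covers the general statement including arbitrary order; it is a sketch for the case where the three given rationals are taken as P, R and F. Using hypothetical cell names a, b, c, d for the 2x2 retrieval table, set a = 1, b = (1-P)/P, c = (1-R)/R and d = b(1-F)/F, then multiply by the least common multiple of the denominators to obtain positive integers. The relation then forces M = c/(c+d). `build_ir_table` is an illustrative name, and `math.lcm` with several arguments requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm


def build_ir_table(P, R, F) -> tuple:
    """Return positive integer cells (a, b, c, d) with precision P, recall R, fallout F.

    a: relevant retrieved, b: non-relevant retrieved,
    c: relevant not retrieved, d: non-relevant not retrieved.
    Miss is then fixed by the surface relation as c / (c + d).
    """
    P, R, F = (Fraction(x) for x in (P, R, F))  # accepts Fractions, ints or "p/q" strings
    for x in (P, R, F):
        if not 0 < x < 1:
            raise ValueError("P, R, F must lie strictly between 0 and 1")
    # Rational solution with a = 1: a/b = P/(1-P), c/a = (1-R)/R, b/d = F/(1-F).
    a = Fraction(1)
    b = (1 - P) / P
    c = (1 - R) / R
    d = b * (1 - F) / F
    # Clear denominators so that every cell is a whole number of documents.
    scale = lcm(*(x.denominator for x in (a, b, c, d)))
    return tuple(int(x * scale) for x in (a, b, c, d))


print(build_ir_table(Fraction(3, 10), Fraction(3, 5), Fraction(7, 95)))  # (3, 7, 2, 88)
```

    In the usage line, the forced miss value is 2/(2 + 88) = 1/45, and substituting the four values into the relation above returns exactly 1.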
  8. Egghe, L.: On the relation between the association strength and other similarity measures (2010) 0.00
    0.0020463483 = product of:
      0.0040926966 = sum of:
        0.0040926966 = product of:
          0.03683427 = sum of:
            0.03683427 = weight(_text_:p in 3598) [ClassicSimilarity], result of:
              0.03683427 = score(doc=3598,freq=2.0), product of:
                0.115903415 = queryWeight, product of:
                  3.5955126 = idf(docFreq=3298, maxDocs=44218)
                  0.032235574 = queryNorm
                0.31780142 = fieldWeight in 3598, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5955126 = idf(docFreq=3298, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3598)
          0.11111111 = coord(1/9)
      0.5 = coord(1/2)
    
    Abstract
    A graph in van Eck and Waltman [JASIST, 60(8), 2009, p. 1644], representing the relation between the association strength and the cosine, is partially explained as a sheaf of parabolas, each parabola being the functional relation between these similarity measures on a trajectory $x \cdot y = a$, with a constant. Based on previously obtained relations between the cosine and other similarity measures (e.g., the Jaccard index), we prove new relations between the association strength and these other measures.