Search (6 results, page 1 of 1)

  • Active filter: author_ss:"Egghe, L."
  1. Egghe, L.: A universal method of information retrieval evaluation : the "missing" link M and the universal IR surface (2004) 0.03
    0.03197872 = product of:
      0.09593615 = sum of:
        0.09593615 = sum of:
          0.054807637 = weight(_text_:database in 2558) [ClassicSimilarity], result of:
            0.054807637 = score(doc=2558,freq=2.0), product of:
              0.20452234 = queryWeight, product of:
                4.042444 = idf(docFreq=2109, maxDocs=44218)
                0.050593734 = queryNorm
              0.26797873 = fieldWeight in 2558, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.042444 = idf(docFreq=2109, maxDocs=44218)
                0.046875 = fieldNorm(doc=2558)
          0.041128512 = weight(_text_:22 in 2558) [ClassicSimilarity], result of:
            0.041128512 = score(doc=2558,freq=2.0), product of:
              0.17717063 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050593734 = queryNorm
              0.23214069 = fieldWeight in 2558, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2558)
      0.33333334 = coord(1/3)
    
    Abstract
    The paper shows that the present evaluation methods in information retrieval (basically recall R, precision P, and in some cases fallout F) lack universal comparability in the sense that their values depend on the generality of the IR problem. A solution is given by using all "parts" of the database, including the non-relevant documents and also the not-retrieved documents. It turns out that the solution is given by introducing the measure M, the fraction of the not-retrieved documents that are relevant (hence the "miss" measure). We prove that, independent of the IR problem or of the IR action, the quadruple (P,R,F,M) belongs to a universal IR surface, the same for all IR activities. This universality is then exploited by defining a new measure for evaluation in IR, allowing for unbiased comparisons of all IR results. We also show that using only one, two or even three measures from the set {P,R,F,M} necessarily leads to evaluation measures that are non-universal and hence not capable of comparing different IR situations.
    Date
    14.8.2004 19:17:22
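    A quick numeric check of the universal IR surface from the abstract above, in Python. The contingency-table definitions of P, R, fallout F and miss M are the standard ones; the product form of the surface equation is derived here from those definitions for illustration and is not quoted from the paper.

      # Contingency table: a = relevant & retrieved, b = non-relevant & retrieved,
      # c = relevant & not retrieved, d = non-relevant & not retrieved.
      def retrieval_measures(a, b, c, d):
          P = a / (a + b)   # precision
          R = a / (a + c)   # recall
          F = b / (b + d)   # fallout
          M = c / (c + d)   # miss: relevant fraction of the non-retrieved documents
          return P, R, F, M

      # One algebraic form of the surface, P*F*(1-R)*(1-M) = R*M*(1-P)*(1-F),
      # holds for any table, i.e. independently of the generality (a+c)/(a+b+c+d):
      for table in [(8, 2, 4, 16), (30, 10, 5, 200), (1, 9, 3, 100)]:
          P, R, F, M = retrieval_measures(*table)
          assert abs(P*F*(1-R)*(1-M) - R*M*(1-P)*(1-F)) < 1e-12
      print("surface identity holds for all test tables")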
  2. Egghe, L.; Guns, R.; Rousseau, R.; Leuven, K.U.: Erratum (2012) 0.01
    0.011424588 = product of:
      0.034273762 = sum of:
        0.034273762 = product of:
          0.068547525 = sum of:
            0.068547525 = weight(_text_:22 in 4992) [ClassicSimilarity], result of:
              0.068547525 = score(doc=4992,freq=2.0), product of:
                0.17717063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050593734 = queryNorm
                0.38690117 = fieldWeight in 4992, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4992)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    14.2.2012 12:53:22
  3. Egghe, L.; Ravichandra Rao, I.K.: The influence of the broadness of a query of a topic on its h-index : models and examples of the h-index of n-grams (2008) 0.01
    0.010765236 = product of:
      0.032295708 = sum of:
        0.032295708 = product of:
          0.064591415 = sum of:
            0.064591415 = weight(_text_:database in 2009) [ClassicSimilarity], result of:
              0.064591415 = score(doc=2009,freq=4.0), product of:
                0.20452234 = queryWeight, product of:
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.050593734 = queryNorm
                0.31581596 = fieldWeight in 2009, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2009)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The article studies the influence of the query formulation of a topic on its h-index. In order to generate pure random sets of documents, we used N-grams (N variable) to measure this influence: strings of zeros, truncated at the end. The databases used are WoS and Scopus. The formula h = T^(1/α), proved in Egghe and Rousseau (2006), where T is the number of retrieved documents and α is Lotka's exponent, is confirmed to be a concavely increasing function of T. We also give a formula for the relation between h and the length N of the N-gram: h = D·10^(-N/α), where D is a constant; this convexly decreasing function is found in our experiments. Nonlinear regression on h = T^(1/α) gives an estimate of α, which can then be used to estimate the h-index of the entire database (Web of Science [WoS] and Scopus): h = S^(1/α), where S is the total number of documents in the database.
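    A minimal sketch of the regression step described in the abstract above: fit α in h = T^(1/α) from (T, h) pairs, then extrapolate to the full database via h = S^(1/α). The data points and the database size S below are invented for illustration; they are not values from the paper.

      import numpy as np
      from scipy.optimize import curve_fit

      # Hypothetical (T, h) observations; not data from the paper.
      T = np.array([50, 200, 1000, 5000, 20000], dtype=float)  # retrieved documents
      h = np.array([7, 14, 27, 50, 90], dtype=float)           # observed h-indices

      def model(T, alpha):
          return T ** (1.0 / alpha)            # h = T^(1/alpha)

      (alpha,), _ = curve_fit(model, T, h, p0=[2.0])
      S = 50_000_000                           # assumed total database size
      print(f"alpha = {alpha:.2f}, extrapolated h(S) = {model(S, alpha):.0f}")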
  4. Egghe, L.: Type/Token-Taken informetrics (2003) 0.01
    0.0076121716 = product of:
      0.022836514 = sum of:
        0.022836514 = product of:
          0.045673028 = sum of:
            0.045673028 = weight(_text_:database in 1608) [ClassicSimilarity], result of:
              0.045673028 = score(doc=1608,freq=2.0), product of:
                0.20452234 = queryWeight, product of:
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.050593734 = queryNorm
                0.2233156 = fieldWeight in 1608, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1608)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Type/Token-Taken informetrics is a new part of informetrics that studies the use of items rather than the items themselves. Here, items are the objects that are produced by the sources (e.g., journals producing articles, authors producing papers, etc.). In linguistics a source is also called a type (e.g., a word), and an item a token (e.g., the use of a word in texts). In informetrics, types that occur often, for example in a database, will also be requested often, for example in information retrieval. The relative use of these occurrences will be higher than the relative occurrences themselves; hence the name Type/Token-Taken informetrics. This article studies the frequency distribution of Type/Token-Taken informetrics, starting from that of Type/Token informetrics (i.e., source-item relationships). We also study the average number μ* of item uses in Type/Token-Taken informetrics and compare it with the classical average number μ in Type/Token informetrics. We show that μ* >= μ always, and that μ* is an increasing function of μ. A method is presented to actually calculate μ* from μ and a given α, the exponent in Lotka's frequency distribution of Type/Token informetrics. We leave open the problem of developing non-Lotkaian Type/Token-Taken informetrics.
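    A small numeric sketch of the μ versus μ* comparison from the abstract above, under a truncated Lotka law f(j) = C/j^α. The size-biased reading of "use" (a source with j items is used in proportion to j, so μ* = E[J²]/E[J]) is an interpretation chosen for illustration, not a formula quoted from the paper.

      # Truncated Lotka law f(j) = C / j**alpha for j = 1..jmax; alpha and jmax
      # are illustrative choices, not values from the paper.
      alpha, jmax = 2.5, 10_000
      f = [1.0 / j**alpha for j in range(1, jmax + 1)]

      total  = sum(f)
      first  = sum(j * fj for j, fj in enumerate(f, 1))
      second = sum(j * j * fj for j, fj in enumerate(f, 1))

      mu      = first / total    # classical Type/Token average number of items
      mu_star = second / first   # size-biased Type/Token-Taken average
      assert mu_star >= mu       # mu* >= mu, as the abstract states
      print(f"mu = {mu:.3f}, mu* = {mu_star:.3f}")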
  5. Egghe, L.; Rousseau, R.: Averaging and globalising quotients of informetric and scientometric data (1996) 0.01
    0.006854752 = product of:
      0.020564256 = sum of:
        0.020564256 = product of:
          0.041128512 = sum of:
            0.041128512 = weight(_text_:22 in 7659) [ClassicSimilarity], result of:
              0.041128512 = score(doc=7659,freq=2.0), product of:
                0.17717063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050593734 = queryNorm
                0.23214069 = fieldWeight in 7659, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=7659)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Journal of information science. 22(1996) no.3, pp.165-170
  6. Egghe, L.: Empirical and combinatorial study of country occurrences in multi-authored papers (2006) 0.01
    0.006089737 = product of:
      0.018269211 = sum of:
        0.018269211 = product of:
          0.036538422 = sum of:
            0.036538422 = weight(_text_:database in 81) [ClassicSimilarity], result of:
              0.036538422 = score(doc=81,freq=2.0), product of:
                0.20452234 = queryWeight, product of:
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.050593734 = queryNorm
                0.17865248 = fieldWeight in 81, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.03125 = fieldNorm(doc=81)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Papers written by several authors can be classified according to the countries of the author affiliations. The empirical part of this paper consists of two datasets. One dataset consists of 1,035 papers retrieved via the search "pedagog*" in the years 2004 and 2005 (up to October) in Academic Search Elite; this is a case where phi(m) = the number of papers with m = 1, 2, 3, ... authors is decreasing, hence most of the papers have a low number of authors. Here we find that #(j,m) = the number of times a country occurs j times in an m-authored paper is decreasing in j = 1, ..., m-1, and that #(m,m) is much higher than all the other #(j,m) values. The other dataset consists of 3,271 papers retrieved via the search "enzyme" in the year 2005 (up to October) in the same database; this is a case of a non-decreasing phi(m): most papers have 3 or 4 authors, and we even find many papers with a much higher number of authors. In this case we show again that #(m,m) is much higher than the other #(j,m) values, but #(j,m) is no longer decreasing in j = 1, ..., m-1, although #(1,m) is (apart from #(m,m)) the largest among the #(j,m). The combinatorial part gives a proof of the fact that #(j,m) decreases for j = 1, ..., m-1, supposing that all cases are equally possible. This shows that the first dataset conforms more closely to this model than the second. Explanations for these findings are given. From the data we also find the (we think: new) distribution of the number of papers with n = 1, 2, 3, ... countries (i.e., where n different countries are involved among the m (>= n) authors of a paper): a fast decreasing function, e.g. a power law with a very large Lotka exponent.
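    To make the combinatorial part concrete, here is a Monte Carlo sketch of one natural reading of "all cases are equally possible": each of the m authors of a paper is independently assigned one of k countries uniformly at random, and #(j,m) counts countries occurring exactly j times. The values of m and k and the uniform model itself are illustrative assumptions, not the paper's exact setup.

      import random
      from collections import Counter

      # Equiprobable model (an assumption for illustration): each of m authors
      # draws one of k countries uniformly; tally how often a country occurs
      # exactly j times across many simulated papers.
      random.seed(0)
      m, k, papers = 6, 40, 100_000

      occurs_j_times = Counter()
      for _ in range(papers):
          countries = [random.randrange(k) for _ in range(m)]
          for count in Counter(countries).values():
              occurs_j_times[count] += 1

      for j in range(1, m + 1):
          print(f"#({j},{m}) ~ {occurs_j_times[j]}")
      # Under this model #(j,m) decreases in j = 1, ..., m-1, in line with the
      # combinatorial proof mentioned in the abstract.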