Search (1 results, page 1 of 1)

Did you mean:
themes%3a%22Semantic web%22 1

Kantor, P.; Kim, M.H.; Ibraev, U.; Atasoy, K.: Estimating the number of relevant documents in enormous collections (1999) 0.01
```
0.007663213 = product of:
  0.015326426 = sum of:
    0.015326426 = product of:
      0.030652853 = sum of:
        0.030652853 = weight(_text_:web in 6690) [ClassicSimilarity], result of:
          0.030652853 = score(doc=6690,freq=2.0), product of:
            0.17002425 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.052098576 = queryNorm
            0.18028519 = fieldWeight in 6690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6690)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

In assessing information retrieval systems, it is important to know not only the precision of the retrieved set, but also to compare the number of retrieved relevant items to the total number of relevant items. For large collections, such as the TREC test collections, or the World Wide Web, it is not possible to enumerate the entire set of relevant documents. If the retrieved documents are evaluated, a variant of the statistical "capture-recapture" method can be used to estimate the total number of relevant documents, providing the several retrieval systems used are sufficiently independent. We show that the underlying signal detection model supporting such an analysis can be extended in two ways. First, assuming that there are two distinct performance characteristics (corresponding to the chance of retrieving a relevant, and retrieving a given non-relevant document), we show that if there are three or more independent systems available it is possible to estimate the number of relevant documents without actually having to decide whether each individual document is relevant. We report applications of this 3-system method to the TREC data, leading to the conclusion that the independence assumptions are not satisfied. We then extend the model to a multi-system, multi-problem model, and show that it is possible to include statistical dependencies of all orders in the model, and determine the number of relevant documents for each of the problems in the set. Application to the TREC setting will be presented