Search (3 results, page 1 of 1)

  • Active filter: author_ss:"Kim, M.H."
  1. Kantor, P.; Kim, M.H.; Ibraev, U.; Atasoy, K.: Estimating the number of relevant documents in enormous collections (1999) 0.01
    0.009721047 = product of:
      0.03888419 = sum of:
        0.03888419 = product of:
          0.07776838 = sum of:
            0.07776838 = weight(_text_:model in 6690) [ClassicSimilarity], result of:
              0.07776838 = score(doc=6690,freq=8.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.42484146 = fieldWeight in 6690, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6690)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
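
    The tree above is Lucene's ClassicSimilarity "explain" output for the query term "model"; results 2 and 3 below carry the same kind of tree. As a minimal sketch, the final 0.01 score can be reproduced in Python from the numbers in the tree, using ClassicSimilarity's standard formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the coord factors are copied from the explain output:

        import math

        # Values taken straight from the explain tree for doc 6690.
        doc_freq, max_docs = 2569, 44218
        freq, field_norm, query_norm = 8.0, 0.0390625, 0.047605187

        idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # ~3.845226
        tf = math.sqrt(freq)                             # ~2.828427

        query_weight = idf * query_norm                  # ~0.1830527
        field_weight = tf * idf * field_norm             # ~0.42484146
        raw_score = query_weight * field_weight          # ~0.07776838

        # coord(1/2) and coord(1/4) scale down for query clauses that
        # did not match this document.
        score = raw_score * 0.5 * 0.25
        print(f"{score:.9f}")                            # ~0.009721047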
    
    Abstract
    In assessing information retrieval systems, it is important not only to know the precision of the retrieved set, but also to compare the number of retrieved relevant items to the total number of relevant items. For large collections, such as the TREC test collections or the World Wide Web, it is not possible to enumerate the entire set of relevant documents. If the retrieved documents are evaluated, a variant of the statistical "capture-recapture" method can be used to estimate the total number of relevant documents, provided that the several retrieval systems used are sufficiently independent. We show that the underlying signal detection model supporting such an analysis can be extended in two ways. First, assuming that there are two distinct performance characteristics (corresponding to the chance of retrieving a given relevant document and of retrieving a given non-relevant document), we show that if three or more independent systems are available, it is possible to estimate the number of relevant documents without actually having to decide whether each individual document is relevant. We report applications of this 3-system method to the TREC data, leading to the conclusion that the independence assumptions are not satisfied. We then extend the model to a multi-system, multi-problem model, and show that it is possible to include statistical dependencies of all orders in the model and to determine the number of relevant documents for each of the problems in the set. Application to the TREC setting will be presented.
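
    The capture-recapture idea the abstract builds on can be illustrated with the classical two-system Lincoln-Petersen estimator. This is only the two-system baseline with hypothetical counts, not the authors' 3-system or multi-system extension:

        def lincoln_petersen(n1: int, n2: int, overlap: int) -> float:
            """Estimate the total number of relevant documents R.

            n1, n2  -- relevant documents retrieved by systems 1 and 2
            overlap -- relevant documents retrieved by both

            Under independence, overlap / n2 ~ n1 / R, so R ~ n1 * n2 / overlap.
            """
            if overlap == 0:
                raise ValueError("no overlap: the estimate is unbounded")
            return n1 * n2 / overlap

        # Hypothetical counts, for illustration only:
        print(lincoln_petersen(n1=120, n2=150, overlap=60))  # -> 300.0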
  2. Lee, J.H.; Kim, M.H.; Lee, Y.J.: Information retrieval based on conceptual distance in is-a hierarchies (1993) 0.01
    0.009623346 = product of:
      0.038493384 = sum of:
        0.038493384 = product of:
          0.07698677 = sum of:
            0.07698677 = weight(_text_:model in 6729) [ClassicSimilarity], result of:
              0.07698677 = score(doc=6729,freq=4.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.4205716 = fieldWeight in 6729, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=6729)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Several document ranking methods have been proposed to calculate the conceptual distance or closeness between a Boolean query and a document. Though they provide good retrieval effectiveness in many cases, they do not support effective weighting schemes for queries and documents, and they suffer from several problems resulting from the inappropriate evaluation of Boolean operators. We propose a new method, the Knowledge-Based Extended Boolean Model (KB-EBM), which incorporates Salton's extended Boolean model. KB-EBM evaluates weighted queries and documents effectively and avoids the problems of the previous methods. It provides high-quality document rankings by using term dependence information from is-a hierarchies. Performance experiments show that the proposed method closely simulates human behaviour.
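
    Salton's extended Boolean (p-norm) model, which KB-EBM incorporates, scores OR and AND clauses as generalized means of the term weights. A minimal sketch of the standard unweighted-query p-norm formulas (this is the underlying model only, not the KB-EBM extension itself):

        def pnorm_or(weights: list[float], p: float = 2.0) -> float:
            """sim(d, t1 OR ... OR tn) = (mean(w_i ** p)) ** (1 / p)"""
            return (sum(w ** p for w in weights) / len(weights)) ** (1 / p)

        def pnorm_and(weights: list[float], p: float = 2.0) -> float:
            """sim(d, t1 AND ... AND tn) = 1 - (mean((1 - w_i) ** p)) ** (1 / p)"""
            return 1 - (sum((1 - w) ** p for w in weights) / len(weights)) ** (1 / p)

        # p = 1 reduces to simple averaging (vector-like behaviour);
        # p -> infinity recovers strict Boolean max/min behaviour.
        print(pnorm_or([0.8, 0.2]))   # ~0.583
        print(pnorm_and([0.8, 0.2]))  # ~0.417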
  3. Lee, J.H.; Kim, M.H.: Ranking documents in thesaurus-based Boolean retrieval systems (1994) 0.01
    0.007776838 = product of:
      0.031107351 = sum of:
        0.031107351 = product of:
          0.062214702 = sum of:
            0.062214702 = weight(_text_:model in 6732) [ClassicSimilarity], result of:
              0.062214702 = score(doc=6732,freq=2.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.33987316 = fieldWeight in 6732, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6732)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Investigates document ranking methods in thesaurus-based Boolean retrieval systems and proposes a new thesaurus-based ranking algorithm, the Extended Relevance (E-Relevance) algorithm. The E-Relevance algorithm integrates the extended Boolean model and the thesaurus-based relevance algorithm. It has all the desirable properties of previous thesaurus-based ranking algorithms, and it also ranks documents effectively by using term dependence information from the thesaurus. A performance comparison shows that the proposed algorithm achieves higher retrieval effectiveness than those proposed earlier.
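
    The abstract does not spell out how term dependence is derived from the thesaurus. A common baseline for such closeness weights is shortest-path conceptual distance in the is-a hierarchy, sketched below with a toy thesaurus; this is an illustrative stand-in, not necessarily the E-Relevance weighting:

        from collections import deque

        IS_A = {  # child -> parents (toy thesaurus, hypothetical)
            "poodle": ["dog"],
            "dog": ["mammal"],
            "cat": ["mammal"],
            "mammal": ["animal"],
        }

        def neighbours(term):
            """Undirected is-a neighbours: parents plus children."""
            return list(IS_A.get(term, [])) + [c for c, ps in IS_A.items() if term in ps]

        def conceptual_distance(a, b):
            """Shortest path length between two terms (BFS), or None."""
            seen, frontier = {a}, deque([(a, 0)])
            while frontier:
                term, d = frontier.popleft()
                if term == b:
                    return d
                for n in neighbours(term):
                    if n not in seen:
                        seen.add(n)
                        frontier.append((n, d + 1))
            return None

        def closeness(a, b):
            d = conceptual_distance(a, b)
            return 0.0 if d is None else 1.0 / (1.0 + d)

        print(closeness("poodle", "cat"))  # distance 3 -> 0.25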