
Chowdhury, A.; McCabe, M.C.: Improving information retrieval systems using part of speech tagging (1993) 0.09

0.09150024 = product of:
  0.18300048 = sum of:
    0.119364664 = weight(_text_:storage in 1061) [ClassicSimilarity], result of:
      0.119364664 = score(doc=1061,freq=4.0), product of:
        0.23366846 = queryWeight, product of:
          5.4488444 = idf(docFreq=516, maxDocs=44218)
          0.04288404 = queryNorm
        0.51082915 = fieldWeight in 1061, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          5.4488444 = idf(docFreq=516, maxDocs=44218)
          0.046875 = fieldNorm(doc=1061)
    0.036786914 = weight(_text_:retrieval in 1061) [ClassicSimilarity], result of:
      0.036786914 = score(doc=1061,freq=4.0), product of:
        0.12972058 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04288404 = queryNorm
        0.2835858 = fieldWeight in 1061, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=1061)
    0.026848892 = weight(_text_:systems in 1061) [ClassicSimilarity], result of:
      0.026848892 = score(doc=1061,freq=2.0), product of:
        0.13179013 = queryWeight, product of:
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.04288404 = queryNorm
        0.2037246 = fieldWeight in 1061, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0731742 = idf(docFreq=5561, maxDocs=44218)
          0.046875 = fieldNorm(doc=1061)
  0.5 = coord(3/6)
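
The explanation tree above can be checked by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The queryNorm and fieldNorm values are copied from the listing rather than derived, since they depend on the full query and on the field length, neither of which is shown here. The function and constant names are illustrative and not part of any library.

```python
import math

# Values copied verbatim from the explanation listing (not derived here).
MAX_DOCS = 44218
QUERY_NORM = 0.04288404
FIELD_NORM = 0.046875

# Matched query terms: term -> (frequency in doc 1061, document frequency).
MATCHED_TERMS = {
    "storage": (4.0, 516),
    "retrieval": (4.0, 5836),
    "systems": (2.0, 5561),
}


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as defined by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq):
    """Score of one term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq)
    query_weight = term_idf * QUERY_NORM
    field_weight = tf * term_idf * FIELD_NORM
    return query_weight * field_weight


def document_score(terms=MATCHED_TERMS, matched=3, query_terms=6):
    """Sum of the term scores, scaled by the coord factor (matched / total)."""
    coord = matched / query_terms
    return coord * sum(term_score(f, df) for f, df in terms.values())


if __name__ == "__main__":
    for term, (freq, doc_freq) in MATCHED_TERMS.items():
        print(f"{term}: {term_score(freq, doc_freq):.9f}")
    print(f"total: {document_score():.8f}")
```

Because Lucene computes these values in 32-bit floats, a double-precision recomputation should be expected to agree with the listing only to about six significant digits.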

Abstract: The objective of information retrieval is to retrieve all documents relevant to a user query, and only those documents. Much research has focused on achieving this objective with little regard for storage overhead or performance. In this paper we evaluate the use of part-of-speech tagging to improve index storage overhead and the general speed of the system with only a minimal reduction in precision/recall measurements. We tagged 500 MB of the Los Angeles Times 1990 and 1989 document collection provided by TREC for parts of speech. We then experimented to find the most relevant part of speech to index. We show that 90% of precision/recall is achieved with 40% of the document collection's terms. We also show that this is an improvement in overhead with only a 1% reduction in precision/recall.