Search (5 results, page 1 of 1)

  • author_ss:"Sparck Jones, K."
  • theme_ss:"Retrievalalgorithmen"
  1. Sparck Jones, K.: A statistical interpretation of term specificity and its application in retrieval (1972) 0.00
    0.0031332558 = product of:
      0.0062665115 = sum of:
        0.0062665115 = product of:
          0.012533023 = sum of:
            0.012533023 = weight(_text_:a in 5187) [ClassicSimilarity], result of:
              0.012533023 = score(doc=5187,freq=4.0), product of:
                0.043477926 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.037706986 = queryNorm
                0.28826174 = fieldWeight in 5187, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.125 = fieldNorm(doc=5187)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
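
  The indented breakdown under each result is Lucene ClassicSimilarity "explain" output. As a sanity check, the sketch below reproduces the arithmetic for result 1 (doc 5187) from the values shown; the same formula accounts for the other four breakdowns, which differ only in freq, fieldNorm, and the resulting weights. Variable names mirror the explain labels; this is an illustrative reconstruction, not code from the search system itself.

    import math

    # Values read off the explain output for result 1 (doc 5187).
    max_docs = 44218          # maxDocs: documents in the index
    doc_freq = 37942          # docFreq: documents containing the term "a"
    freq = 4.0                # termFreq: occurrences of "a" in the field
    field_norm = 0.125        # fieldNorm: field-length normalization
    query_norm = 0.037706986  # queryNorm

    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 1.153047
    tf = math.sqrt(freq)                             # 2.0
    query_weight = idf * query_norm                  # 0.043477926
    field_weight = tf * idf * field_norm             # 0.28826174

    # weight(_text_:a) = queryWeight * fieldWeight; the two coord(1/2)
    # factors mean only one of two query clauses matched the document.
    score = query_weight * field_weight * 0.5 * 0.5
    print(round(score, 10))  # ~0.0031332558, the displayed score
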
  2. Sparck Jones, K.: IDF term weighting and IR research lessons (2004) 0.00
    0.0019582848 = product of:
      0.0039165695 = sum of:
        0.0039165695 = product of:
          0.007833139 = sum of:
            0.007833139 = weight(_text_:a in 4422) [ClassicSimilarity], result of:
              0.007833139 = score(doc=4422,freq=4.0), product of:
                0.043477926 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.037706986 = queryNorm
                0.18016359 = fieldWeight in 4422, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4422)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Robertson comments on the theoretical status of IDF term weighting. Its history illustrates how ideas develop in a specific research context, in theory/experiment interaction, and in operational practice.
    Type
    a
  3. Sparck Jones, K.: A statistical interpretation of term specificity and its application in retrieval (2004) 0.00
    0.0016788795 = product of:
      0.003357759 = sum of:
        0.003357759 = product of:
          0.006715518 = sum of:
            0.006715518 = weight(_text_:a in 4420) [ClassicSimilarity], result of:
              0.006715518 = score(doc=4420,freq=6.0), product of:
                0.043477926 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.037706986 = queryNorm
                0.1544581 = fieldWeight in 4420, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4420)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The exhaustivity of document descriptions and the specificity of index terms are usually regarded as independent. It is suggested that specificity should be interpreted statistically, as a function of term use rather than of term meaning. The effects on retrieval of variations in term specificity are examined, experiments with three test collections showing, in particular, that frequently-occurring terms are required for good overall performance. It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms. Results for the test collections show that considerable improvements in performance are obtained with this very simple procedure.
    Type
    a
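
  This abstract is the original statement of what became inverse document frequency (IDF) weighting. In its now-standard form, a term t occurring in n_t of the N documents in a collection receives the weight

    \[ \mathrm{idf}(t) = \log \frac{N}{n_t} \]

  so matches on rare, more specific terms contribute more to a score than matches on frequent ones. The idf(docFreq=37942, maxDocs=44218) factor in the breakdowns above is a smoothed variant of this same quantity.
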
  4. Sparck Jones, K.: Search term relevance weighting given little relevance information (1979) 0.00
    0.0016616598 = product of:
      0.0033233196 = sum of:
        0.0033233196 = product of:
          0.006646639 = sum of:
            0.006646639 = weight(_text_:a in 1939) [ClassicSimilarity], result of:
              0.006646639 = score(doc=1939,freq=2.0), product of:
                0.043477926 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.037706986 = queryNorm
                0.15287387 = fieldWeight in 1939, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1939)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
  5. Robertson, S.E.; Sparck Jones, K.: Simple, proven approaches to text retrieval (1997) 0.00
    0.0015481601 = product of:
      0.0030963202 = sum of:
        0.0030963202 = product of:
          0.0061926404 = sum of:
            0.0061926404 = weight(_text_:a in 4532) [ClassicSimilarity], result of:
              0.0061926404 = score(doc=4532,freq=10.0), product of:
                0.043477926 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.037706986 = queryNorm
                0.14243183 = fieldWeight in 4532, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4532)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This technical note describes straightforward techniques for document indexing and retrieval that have been solidly established through extensive testing and are easy to apply. They are useful for many different types of text material, are viable for very large files, and have the advantage that they do not require special skills or training for searching, but are easy for end users. The document and text retrieval methods described here have a sound theoretical basis, are well established by extensive testing, and the ideas involved are now implemented in some commercial retrieval systems. Testing in the last few years has, in particular, shown that the methods presented here work very well with full texts, not only titles and abstracts, and with large files of texts containing three quarters of a million documents. These tests, the TREC tests (see Harman 1993-1997; IP&M 1995), have been rigorous comparative evaluations involving many different approaches to information retrieval. These techniques depend on the use of simple terms for indexing both request and document texts; on term weighting exploiting statistical information about term occurrences; on scoring for request-document matching, using these weights, to obtain a ranked search output; and on relevance feedback to modify request weights or term sets in iterative searching. The normal implementation is via an inverted file organisation using a term list with linked document identifiers, plus counting data, and pointers to the actual texts. The user's request can be a word list, phrases, sentences or extended text.
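
  As a concrete illustration of the pipeline this note outlines (simple term indexing into an inverted file, collection-frequency weighting, and ranked request-document matching), here is a minimal in-memory sketch. The toy documents and function names are invented for the example; a production system would add the counting data, pointers to texts, and relevance feedback the note describes.

    import math
    from collections import Counter, defaultdict

    # Toy collection; a real system indexes large files of full texts.
    docs = {
        1: "term weighting for document retrieval",
        2: "statistical interpretation of term specificity",
        3: "relevance feedback in iterative searching",
    }

    # Inverted file: term -> {doc_id: term frequency}.
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(text.split()).items():
            index[term][doc_id] = freq

    N = len(docs)

    def idf(term):
        """Collection-frequency weight: rarer terms weigh more."""
        n = len(index.get(term, {}))
        return math.log(N / n) if n else 0.0

    def search(request):
        """Rank documents by summed tf * idf over the request terms."""
        scores = Counter()
        for term in request.split():
            w = idf(term)
            for doc_id, freq in index.get(term, {}).items():
                scores[doc_id] += freq * w
        return scores.most_common()

    print(search("term weighting"))  # doc 1 ranks first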
