Search (2 results, page 1 of 1)

  • author_ss:"Robertson, S."
  • theme_ss:"Retrievalalgorithmen"
  1. Robertson, S.: Understanding inverse document frequency : on theoretical arguments for IDF (2004) 0.00
    0.0020506454 = product of:
      0.004101291 = sum of:
        0.004101291 = product of:
          0.008202582 = sum of:
            0.008202582 = weight(_text_:a in 4421) [ClassicSimilarity], result of:
              0.008202582 = score(doc=4421,freq=6.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1544581 = fieldWeight in 4421, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4421)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
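
    The breakdown above is Lucene's ClassicSimilarity (TF-IDF) explain output for the term "a" in the _text_ field of doc 4421. The following is a minimal sketch of the same arithmetic, assuming ClassicSimilarity's standard formulas (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq)); all constants are taken from the breakdown itself, and the helper names are only illustrative:

      import math

      def idf(doc_freq, max_docs):
          # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      def tf(freq):
          # ClassicSimilarity tf: square root of the term frequency
          return math.sqrt(freq)

      query_norm = 0.046056706                        # queryNorm from the breakdown
      field_norm = 0.0546875                          # fieldNorm(doc=4421)
      term_idf = idf(37942, 44218)                    # ~1.153047

      query_weight = term_idf * query_norm            # ~0.053105544
      field_weight = tf(6.0) * term_idf * field_norm  # ~0.1544581

      # Two coord(1/2) factors, one at each clause level of the breakdown.
      score = query_weight * field_weight * 0.5 * 0.5
      print(round(score, 10))                         # ~0.0020506454 (to float precision)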
    
    Abstract
    The term-weighting function known as IDF was proposed in 1972, and has since been extremely widely used, usually as part of a TF*IDF function. It is often described as a heuristic, and many papers have been written (some based on Shannon's Information Theory) seeking to establish some theoretical basis for it. Some of these attempts are reviewed, and it is shown that the Information Theory approaches are problematic, but that there are good theoretical justifications of both IDF and TF*IDF in the traditional probabilistic model of information retrieval.
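    For reference, a standard textbook formulation of the quantity the abstract discusses (the paper's exact variant may differ): for a collection of N documents in which term t occurs in n_t of them,

        idf(t) = log(N / n_t)    and    tf-idf(t, d) = tf(t, d) * idf(t),

    so a term's weight rises as it becomes rarer in the collection.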
    Type
    a
  2. Bodoff, D.; Robertson, S.: A new unified probabilistic model (2004) 0.00
    0.0020296127 = product of:
      0.0040592253 = sum of:
        0.0040592253 = product of:
          0.008118451 = sum of:
            0.008118451 = weight(_text_:a in 2129) [ClassicSimilarity], result of:
              0.008118451 = score(doc=2129,freq=8.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.15287387 = fieldWeight in 2129, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2129)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
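
    The same ClassicSimilarity arithmetic as in the sketch under hit 1 applies here, with freq = 8.0 (tf = sqrt(8) ≈ 2.828427), fieldNorm(doc=2129) = 0.046875, and the same idf and queryNorm; after the two coord(1/2) factors this yields the 0.0020296127 shown above.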
    
    Abstract
    This paper proposes a new unified probabilistic model. Two previous models, Robertson et al.'s "Model 0" and "Model 3," each have strengths and weaknesses. The strength of Model 0, not found in Model 3, is that it does not require relevance data about the particular document or query and, related to that, its probability estimates are straightforward. The strength of Model 3, not found in Model 0, is that it can utilize feedback information about the particular document and query in question. In this paper we introduce a new unified probabilistic model that combines these strengths: the expression of its probabilities is straightforward, it does not require that data be available for the particular document or query in question, but it can utilize such specific data if it is available. The model is one way to resolve the difficulty of combining two marginal views in probabilistic retrieval.
    Type
    a