Search (1 result, page 1 of 1)

  • author_ss:"Harter, S.P."
  • theme_ss:"Retrievalstudien"
  1. Harter, S.P.: Variations in relevance assessments and the measurement of retrieval effectiveness (1996) 0.01
    0.01137503 = product of:
      0.02275006 = sum of:
        0.02275006 = product of:
          0.04550012 = sum of:
            0.04550012 = weight(_text_:i in 3004) [ClassicSimilarity], result of:
              0.04550012 = score(doc=3004,freq=4.0), product of:
                0.15441231 = queryWeight, product of:
                  3.7717297 = idf(docFreq=2765, maxDocs=44218)
                  0.04093939 = queryNorm
                0.29466638 = fieldWeight in 3004, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.7717297 = idf(docFreq=2765, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3004)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
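
     The tree above is Lucene's ClassicSimilarity explain output for the query term "i" in document 3004. As a minimal sketch (my own reconstruction, not part of the record), the following Python re-derives the listed numbers from the freq, docFreq, maxDocs, queryNorm, and fieldNorm values shown:

       # Reconstruction (assumed, not part of the record) of the
       # ClassicSimilarity arithmetic behind the explain tree above.
       import math

       freq, doc_freq, max_docs = 4.0, 2765, 44218
       query_norm, field_norm = 0.04093939, 0.0390625

       tf = math.sqrt(freq)                              # 2.0 = tf(freq=4.0)
       idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # 3.7717297
       query_weight = idf * query_norm                   # 0.15441231 = queryWeight
       field_weight = tf * idf * field_norm              # 0.29466638 = fieldWeight
       weight = query_weight * field_weight              # 0.04550012 = weight(_text_:i)
       score = weight * 0.5 * 0.5                        # two coord(1/2) factors
       print(round(score, 8))                            # 0.01137503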
    
    Abstract
     The purpose of this article is to bring attention to the problem of variations in relevance assessments and the effects that these may have on measures of retrieval effectiveness. Through an analytical review of the literature, I show that despite known wide variations in relevance assessments in experimental test collections, their effects on the measurement of retrieval performance are almost completely unstudied. I will further argue that what we know about the many variables that have been found to affect relevance assessments under experimental conditions, as well as our new understanding of psychological, situational, user-based relevance, points to a single conclusion. We can no longer rest the evaluation of information retrieval systems on the assumption that such variations do not significantly affect the measurement of information retrieval performance. A series of thorough, rigorous, and extensive tests is needed, of precisely how, and under what conditions, variations in relevance assessments do, and do not, affect measures of retrieval performance. We need to develop approaches to evaluation that are sensitive to these variations and to human factors and individual differences more generally. Our approaches to evaluation must reflect the real world of real users.
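
     As a purely illustrative sketch of the abstract's central point (hypothetical documents and judgments, not data from the article), the same ranked list yields different effectiveness scores under two assessors' relevance judgments:

       # Hypothetical example: precision@5 for one ranking under two assessors.
       retrieved = ["d1", "d2", "d3", "d4", "d5"]       # top-5 from some system
       relevant_a = {"d1", "d2", "d4"}                  # assessor A's relevant set
       relevant_b = {"d1", "d4"}                        # assessor B's relevant set

       def precision_at_k(ranking, relevant, k):
           # fraction of the top-k retrieved documents judged relevant
           return sum(1 for d in ranking[:k] if d in relevant) / k

       print(precision_at_k(retrieved, relevant_a, 5))  # 0.6
       print(precision_at_k(retrieved, relevant_b, 5))  # 0.4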