Search (83 results, page 1 of 5)

  • language_ss:"e"
  • theme_ss:"Retrievalstudien"
  • year_i:[2000 TO 2010}
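  A note on the year filter: the mixed brackets are standard Solr range syntax rather than a typo. A square bracket marks an inclusive bound and a curly brace an exclusive one, which is consistent with the 2000-2009 publication years listed below (a sketch, assuming standard Solr/Lucene range-query semantics):

      year_i:[2000 TO 2010]   inclusive upper bound: 2000 <= year <= 2010
      year_i:[2000 TO 2010}   exclusive upper bound: 2000 <= year <  2010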
  1. Voorhees, E.M.; Harman, D.: Overview of the Sixth Text REtrieval Conference (TREC-6) (2000) 0.01
    0.0127447145 = product of:
      0.0446065 = sum of:
        0.010253297 = product of:
          0.051266484 = sum of:
            0.051266484 = weight(_text_:retrieval in 6438) [ClassicSimilarity], result of:
              0.051266484 = score(doc=6438,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.46789268 = fieldWeight in 6438, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6438)
          0.2 = coord(1/5)
        0.0343532 = product of:
          0.0687064 = sum of:
            0.0687064 = weight(_text_:22 in 6438) [ClassicSimilarity], result of:
              0.0687064 = score(doc=6438,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.5416616 = fieldWeight in 6438, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6438)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Date
    11. 8.2001 16:22:19
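    Each "weight(...)" node in the score breakdowns on this page follows Lucene's ClassicSimilarity (TF-IDF): tf is the square root of the term frequency, idf is 1 + ln(maxDocs/(docFreq+1)), queryWeight is idf * queryNorm, fieldWeight is tf * idf * fieldNorm, and the partial sums are combined by the coord factors. A minimal Python sketch (assuming Lucene's documented ClassicSimilarity definitions; the function name is illustrative) reproduces the "retrieval" node of result 1:

        import math

        def term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
            # One weight(_text_:term) node of a ClassicSimilarity explain tree.
            tf = math.sqrt(freq)                             # 1.4142135 for freq=2.0
            idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 3.024915 for docFreq=5836
            query_weight = idf * query_norm                  # 0.109568894
            field_weight = tf * idf * field_norm             # 0.46789268
            return query_weight * field_weight

        # Values read off the "retrieval" node of doc 6438 above:
        print(term_weight(freq=2.0, doc_freq=5836, max_docs=44218,
                          query_norm=0.03622214, field_norm=0.109375))
        # ~0.051266484, matching the explain output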
  2. Petrelli, D.: On the role of user-centred evaluation in the advancement of interactive information retrieval (2008) 0.01
    0.012560169 = product of:
      0.04396059 = sum of:
        0.03169159 = product of:
          0.07922897 = sum of:
            0.04484883 = weight(_text_:retrieval in 2026) [ClassicSimilarity], result of:
              0.04484883 = score(doc=2026,freq=12.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.40932083 = fieldWeight in 2026, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2026)
            0.034380134 = weight(_text_:system in 2026) [ClassicSimilarity], result of:
              0.034380134 = score(doc=2026,freq=6.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.30135927 = fieldWeight in 2026, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2026)
          0.4 = coord(2/5)
        0.0122690005 = product of:
          0.024538001 = sum of:
            0.024538001 = weight(_text_:22 in 2026) [ClassicSimilarity], result of:
              0.024538001 = score(doc=2026,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.19345059 = fieldWeight in 2026, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2026)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    This paper discusses the role of user-centred evaluations as an essential method for researching interactive information retrieval. It draws mainly on the work carried out during the Clarity Project where different user-centred evaluations were run during the lifecycle of a cross-language information retrieval system. The iterative testing was not only instrumental to the development of a usable system, but it enhanced our knowledge of the potential, impact, and actual use of cross-language information retrieval technology. Indeed the role of the user evaluation was dual: by testing a specific prototype it was possible to gain a micro-view and assess the effectiveness of each component of the complex system; by cumulating the result of all the evaluations (in total 43 people were involved) it was possible to build a macro-view of how cross-language retrieval would impact on users and their tasks. By showing the richness of results that can be acquired, this paper aims at stimulating researchers into considering user-centred evaluations as a flexible, adaptable and comprehensive technique for investigating non-traditional information access systems.
    Footnote
    Contribution to a thematic section: Evaluation of Interactive Information Retrieval Systems
    Source
    Information processing and management. 44(2008) no.1, S.22-38
  3. King, D.W.: Blazing new trails : in celebration of an audacious career (2000) 0.01
    0.010898593 = product of:
      0.038145073 = sum of:
        0.025876073 = product of:
          0.06469018 = sum of:
            0.03661892 = weight(_text_:retrieval in 1184) [ClassicSimilarity], result of:
              0.03661892 = score(doc=1184,freq=8.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.33420905 = fieldWeight in 1184, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1184)
            0.028071264 = weight(_text_:system in 1184) [ClassicSimilarity], result of:
              0.028071264 = score(doc=1184,freq=4.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.24605882 = fieldWeight in 1184, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1184)
          0.4 = coord(2/5)
        0.0122690005 = product of:
          0.024538001 = sum of:
            0.024538001 = weight(_text_:22 in 1184) [ClassicSimilarity], result of:
              0.024538001 = score(doc=1184,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.19345059 = fieldWeight in 1184, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1184)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    I had the distinct pleasure of working with Pauline Atherton (Cochrane) during the 1960s, a period that can be considered the heyday of automated information system design and evaluation in the United States. I first met Pauline at the 1962 American Documentation Institute annual meeting in North Hollywood, Florida. My company, Westat Research Analysts, had recently been awarded a contract by the U.S. Patent Office to provide statistical support for the design of experiments with automated information retrieval systems. I was asked to attend the meeting to learn more about information retrieval systems and to begin informing others of U.S. Patent Office activities in this area. At one session, Pauline and I questioned a speaker about the research that he presented. Pauline's questions concerned the logic of their approach and mine, the statistical aspects. After the session, she came over to talk to me and we began a professional and personal friendship that continues to this day. During the 1960s, Pauline was involved in several important information-retrieval projects including a series of studies for the American Institute of Physics, a dissertation examining the relevance of retrieved documents, and development and evaluation of an online information-retrieval system. I had the opportunity to work with Pauline and her colleagues on four of those projects and will briefly describe her work in the 1960s.
    Date
    22. 9.1997 19:16:05
  4. Buckley, C.; Voorhees, E.M.: Retrieval system evaluation (2005) 0.01
    0.0073188585 = product of:
      0.051232006 = sum of:
        0.051232006 = product of:
          0.12808001 = sum of:
            0.07250175 = weight(_text_:retrieval in 648) [ClassicSimilarity], result of:
              0.07250175 = score(doc=648,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.6617001 = fieldWeight in 648, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.109375 = fieldNorm(doc=648)
            0.055578265 = weight(_text_:system in 648) [ClassicSimilarity], result of:
              0.055578265 = score(doc=648,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.4871716 = fieldWeight in 648, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.109375 = fieldNorm(doc=648)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Source
    TREC: experiment and evaluation in information retrieval. Ed.: E.M. Voorhees and D.K. Harman
  5. Voorhees, E.M.: On test collections for adaptive information retrieval (2008) 0.01
    0.0052466425 = product of:
      0.036726497 = sum of:
        0.036726497 = product of:
          0.09181624 = sum of:
            0.058130726 = weight(_text_:retrieval in 2444) [ClassicSimilarity], result of:
              0.058130726 = score(doc=2444,freq=14.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.5305404 = fieldWeight in 2444, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2444)
            0.033685513 = weight(_text_:system in 2444) [ClassicSimilarity], result of:
              0.033685513 = score(doc=2444,freq=4.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.29527056 = fieldWeight in 2444, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2444)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Traditional Cranfield test collections represent an abstraction of a retrieval task that Sparck Jones calls the "core competency" of retrieval: a task that is necessary, but not sufficient, for user retrieval tasks. The abstraction facilitates research by controlling for (some) sources of variability, thus increasing the power of experiments that compare system effectiveness while reducing their cost. However, even within the highly-abstracted case of the Cranfield paradigm, meta-analysis demonstrates that the user/topic effect is greater than the system effect, so experiments must include a relatively large number of topics to distinguish systems' effectiveness. The evidence further suggests that changing the abstraction slightly to include just a bit more characterization of the user will result in a dramatic loss of power or increase in cost of retrieval experiments. Defining a new, feasible abstraction for supporting adaptive IR research will require winnowing the list of all possible factors that can affect retrieval behavior to a minimum number of essential factors.
    Footnote
    Contribution to a special issue "Adaptive information retrieval"
  6. Larsen, B.; Ingwersen, P.; Lund, B.: Data fusion according to the principle of polyrepresentation (2009) 0.00
    0.0046759406 = product of:
      0.01636579 = sum of:
        0.006550591 = product of:
          0.032752953 = sum of:
            0.032752953 = weight(_text_:retrieval in 2752) [ClassicSimilarity], result of:
              0.032752953 = score(doc=2752,freq=10.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.29892567 = fieldWeight in 2752, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2752)
          0.2 = coord(1/5)
        0.0098152 = product of:
          0.0196304 = sum of:
            0.0196304 = weight(_text_:22 in 2752) [ClassicSimilarity], result of:
              0.0196304 = score(doc=2752,freq=2.0), product of:
                0.12684377 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03622214 = queryNorm
                0.15476047 = fieldWeight in 2752, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2752)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    We report data fusion experiments carried out on the four best-performing retrieval models from TREC 5. Three were conceptually/algorithmically very different from one another; one was algorithmically similar to one of the former. The objective of the test was to observe the performance of the 11 logical data fusion combinations compared to the performance of the four individual models and their intermediate fusions when following the principle of polyrepresentation. This principle is based on the cognitive IR perspective (Ingwersen & Järvelin, 2005) and implies that each retrieval model is regarded as a representation of a unique interpretation of information retrieval (IR). It predicts that only fusions of very different, but equally good, IR models may outperform each constituent as well as their intermediate fusions. Two kinds of experiments were carried out. One tested restricted fusions, which entails that only the inner disjoint overlap documents between fused models are ranked. The second set of experiments was based on traditional data fusion methods. The experiments involved the 30 TREC 5 topics that contain more than 44 relevant documents. In all tests, the Borda and CombSUM scoring methods were used. Performance was measured by precision and recall, with document cutoff values (DCVs) at 100 and 15 documents, respectively. Results show that restricted fusions made of two, three, or four cognitively/algorithmically very different retrieval models perform significantly better than do the individual models at DCV100. At DCV15, however, the results of polyrepresentative fusion were less predictable. The traditional fusion method based on polyrepresentation principles demonstrates a clear picture of performance at both DCV levels and verifies the polyrepresentation predictions for data fusion in IR. Data fusion improves retrieval performance over the constituent IR models only if the models are all quite conceptually/algorithmically dissimilar and equally well performing, in that order of importance.
    Date
    22. 3.2009 18:48:28
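    The Borda and CombSUM scoring methods named in the abstract can be sketched as follows (a minimal sketch using the common textbook definitions; the paper's exact normalization and its restricted-fusion variant are not reproduced here, and the function names are illustrative):

        def comb_sum(runs):
            # CombSUM: a document's fused score is the sum of its scores across runs.
            fused = {}
            for scores in runs:                  # scores: dict doc_id -> score
                for doc, s in scores.items():
                    fused[doc] = fused.get(doc, 0.0) + s
            return sorted(fused, key=fused.get, reverse=True)

        def borda(runs):
            # Borda count: rank r (0-based) in a run of n documents earns n - r points.
            points = {}
            for ranked in runs:                  # ranked: list of doc_ids, best first
                n = len(ranked)
                for r, doc in enumerate(ranked):
                    points[doc] = points.get(doc, 0) + (n - r)
            return sorted(points, key=points.get, reverse=True)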
  7. Keenan, S.; Smeaton, A.F.; Keogh, G.: The effect of pool depth on system evaluation in TREC (2001) 0.00
    0.0043610106 = product of:
      0.030527074 = sum of:
        0.030527074 = product of:
          0.07631768 = sum of:
            0.03661892 = weight(_text_:retrieval in 5908) [ClassicSimilarity], result of:
              0.03661892 = score(doc=5908,freq=8.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.33420905 = fieldWeight in 5908, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5908)
            0.03969876 = weight(_text_:system in 5908) [ClassicSimilarity], result of:
              0.03969876 = score(doc=5908,freq=8.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.3479797 = fieldWeight in 5908, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5908)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    The TREC benchmarking exercise for information retrieval (IR) experiments has provided a forum and an opportunity for IR researchers to evaluate the performance of their approaches to the IR task and has resulted in improvements in IR effectiveness. Typically, retrieval performance has been measured in terms of precision and recall, and comparisons between different IR approaches have been based on these measures. These measures are in turn dependent on the so-called "pool depth" used to discover relevant documents. Whereas there is evidence to suggest that the pool depth size used for TREC evaluations adequately identifies the relevant documents in the entire test data collection, we consider how it affects the evaluations of individual systems. The data used comes from the Sixth TREC conference, TREC-6. By fitting appropriate regression models we explore whether different pool depths confer advantages or disadvantages on different retrieval systems when they are compared. As a consequence of this model fitting, a pair of measures for each retrieval run, which are related to precision and recall, emerges. For each system, these give an extrapolation for the number of relevant documents the system would have been deemed to have retrieved if an indefinitely large pool size had been used, and also a measure of the sensitivity of each system to pool size. We concur that even on the basis of analyses of individual systems, the pool depth of 100 used by TREC is adequate.
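    The pooling mechanism the paper analyzes can be sketched in a few lines (a minimal sketch of TREC-style pooling under the usual convention that unjudged documents count as nonrelevant; the function name is illustrative):

        def build_pool(runs, depth=100):
            # TREC-style pooling: union of the top-`depth` documents of every run.
            pool = set()
            for ranked in runs:          # ranked: list of doc_ids, best first
                pool.update(ranked[:depth])
            return pool                  # only these documents receive relevance judgments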
  8. Amitay, E.; Carmel, D.; Lempel, R.; Soffer, A.: Scaling IR-system evaluation using Term Relevance Sets (2004) 0.00
    0.0043610106 = product of:
      0.030527074 = sum of:
        0.030527074 = product of:
          0.07631768 = sum of:
            0.03661892 = weight(_text_:retrieval in 4118) [ClassicSimilarity], result of:
              0.03661892 = score(doc=4118,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.33420905 = fieldWeight in 4118, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4118)
            0.03969876 = weight(_text_:system in 4118) [ClassicSimilarity], result of:
              0.03969876 = score(doc=4118,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.3479797 = fieldWeight in 4118, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4118)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Source
    SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin, et al.
  9. Airio, E.: Who benefits from CLIR in web retrieval? (2008) 0.00
    0.0038721121 = product of:
      0.027104784 = sum of:
        0.027104784 = product of:
          0.06776196 = sum of:
            0.0439427 = weight(_text_:retrieval in 2342) [ClassicSimilarity], result of:
              0.0439427 = score(doc=2342,freq=8.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.40105087 = fieldWeight in 2342, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2342)
            0.023819257 = weight(_text_:system in 2342) [ClassicSimilarity], result of:
              0.023819257 = score(doc=2342,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.20878783 = fieldWeight in 2342, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2342)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Purpose - The aim of the current paper is to test whether query translation is beneficial in web retrieval. Design/methodology/approach - The language pairs were Finnish-Swedish, English-German and Finnish-French. A total of 12-18 participants were recruited for each language pair. Each participant performed four retrieval tasks. The author's aim was to compare the performance of the translated queries with that of the target language queries. Thus, the author asked participants to formulate a source language query and a target language query for each task. The source language queries were translated into the target language utilizing a dictionary-based system. For English-German, machine translation was also utilized. The author used Google as the search engine. Findings - The results differed depending on the language pair. The author concluded that the dictionary coverage had an effect on the results. On average, the results of query translation were better than in the traditional laboratory tests. Originality/value - This research shows that query translation on the web is beneficial, especially for users with moderate and non-active language skills. This is valuable information for developers of cross-language information retrieval systems.
  10. Baillie, M.; Azzopardi, L.; Ruthven, I.: Evaluating epistemic uncertainty under incomplete assessments (2008) 0.00
    0.0037004396 = product of:
      0.025903076 = sum of:
        0.025903076 = product of:
          0.06475769 = sum of:
            0.03107218 = weight(_text_:retrieval in 2065) [ClassicSimilarity], result of:
              0.03107218 = score(doc=2065,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.2835858 = fieldWeight in 2065, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2065)
            0.033685513 = weight(_text_:system in 2065) [ClassicSimilarity], result of:
              0.033685513 = score(doc=2065,freq=4.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.29527056 = fieldWeight in 2065, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2065)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    The thesis of this study is to propose an extended methodology for laboratory-based Information Retrieval evaluation under incomplete relevance assessments. This new methodology aims to identify potential uncertainty during system comparison that may result from incompleteness. The adoption of this methodology is advantageous, because the detection of epistemic uncertainty - the amount of knowledge (or ignorance) we have about the estimate of a system's performance - during the evaluation process can guide and direct researchers when evaluating new systems over existing and future test collections. Across a series of experiments we demonstrate how this methodology can lead towards a finer-grained analysis of systems. In particular, we show through experimentation how the current practice in Information Retrieval evaluation of using a measurement depth larger than the pooling depth increases uncertainty during system comparison.
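    One simple proxy for the uncertainty the paper studies is the fraction of unjudged documents a system retrieves within the measurement depth (an illustrative sketch only, not the authors' methodology; the function name is hypothetical):

        def unjudged_fraction(ranked, judged, measurement_depth):
            # Share of the top-`measurement_depth` documents that were never judged;
            # grows when the measurement depth exceeds the pooling depth.
            top = ranked[:measurement_depth]
            return sum(1 for doc in top if doc not in judged) / len(top) if top else 0.0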
  11. TREC: experiment and evaluation in information retrieval (2005) 0.00
    0.0036619888 = product of:
      0.02563392 = sum of:
        0.02563392 = product of:
          0.0640848 = sum of:
            0.054160107 = weight(_text_:retrieval in 636) [ClassicSimilarity], result of:
              0.054160107 = score(doc=636,freq=70.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.49430186 = fieldWeight in 636, product of:
                  8.3666 = tf(freq=70.0), with freq of:
                    70.0 = termFreq=70.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.01953125 = fieldNorm(doc=636)
            0.00992469 = weight(_text_:system in 636) [ClassicSimilarity], result of:
              0.00992469 = score(doc=636,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.08699492 = fieldWeight in 636, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.01953125 = fieldNorm(doc=636)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    The Text REtrieval Conference (TREC), a yearly workshop hosted by the US government's National Institute of Standards and Technology, provides the infrastructure necessary for large-scale evaluation of text retrieval methodologies. With the goal of accelerating research in this area, TREC created the first large test collections of full-text documents and standardized retrieval evaluation. The impact has been significant; since TREC's beginning in 1992, retrieval effectiveness has approximately doubled. TREC has built a variety of large test collections, including collections for such specialized retrieval tasks as cross-language retrieval and retrieval of speech. Moreover, TREC has accelerated the transfer of research ideas into commercial systems, as demonstrated in the number of retrieval techniques developed in TREC that are now used in Web search engines. This book provides a comprehensive review of TREC research, summarizing the variety of TREC results, documenting the best practices in experimental information retrieval, and suggesting areas for further research. The first part of the book describes TREC's history, test collections, and retrieval methodology. Next, the book provides "track" reports -- describing the evaluations of specific tasks, including routing and filtering, interactive retrieval, and retrieving noisy text. The final part of the book offers perspectives on TREC from such participants as Microsoft Research, University of Massachusetts, Cornell University, University of Waterloo, City University of New York, and IBM. The book will be of interest to researchers in information retrieval and related technologies, including natural language processing.
    Content
    Contains the contributions: 1. The Text REtrieval Conference - Ellen M. Voorhees and Donna K. Harman 2. The TREC Test Collections - Donna K. Harman 3. Retrieval System Evaluation - Chris Buckley and Ellen M. Voorhees 4. The TREC Ad Hoc Experiments - Donna K. Harman 5. Routing and Filtering - Stephen Robertson and Jamie Callan 6. The TREC Interactive Tracks: Putting the User into Search - Susan T. Dumais and Nicholas J. Belkin 7. Beyond English - Donna K. Harman 8. Retrieving Noisy Text - Ellen M. Voorhees and John S. Garofolo 9. The Very Large Collection and Web Tracks - David Hawking and Nick Craswell 10. Question Answering in TREC - Ellen M. Voorhees 11. The University of Massachusetts and a Dozen TRECs - James Allan, W. Bruce Croft and Jamie Callan 12. How Okapi Came to TREC - Stephen Robertson 13. The SMART Project at TREC - Chris Buckley 14. Ten Years of Ad Hoc Retrieval at TREC Using PIRCS - Kui-Lam Kwok 15. MultiText Experiments for TREC - Gordon V. Cormack, Charles L. A. Clarke, Christopher R. Palmer and Thomas R. Lynam 16. A Language-Modeling Approach to TREC - Djoerd Hiemstra and Wessel Kraaij 17. IBM Research Activities at TREC - Eric W. Brown, David Carmel, Martin Franz, Abraham Ittycheriah, Tapas Kanungo, Yoelle Maarek, J. Scott McCarley, Robert L. Mack, John M. Prager, John R. Smith, Aya Soffer, Jason Y. Zien and Alan D. Marwick Epilogue: Metareflections on TREC - Karen Sparck Jones
    Footnote
    Review in: JASIST 58(2007) no.6, S.910-911 (J.L. Vicedo and J. Gomez): "The Text REtrieval Conference (TREC) is a yearly workshop hosted by the U.S. government's National Institute of Standards and Technology (NIST) that fosters and supports research in information retrieval as well as speeding the transfer of technology between research labs and industry. Since 1992, TREC has provided the infrastructure necessary for large-scale evaluations of different text retrieval methodologies. TREC impact has been very important and its success has been mainly supported by its continuous adaptation to the emerging information retrieval needs. Not in vain, TREC has built evaluation benchmarks for more than 20 different retrieval problems such as Web retrieval, speech retrieval, or question-answering. The large and intense trajectory of annual TREC conferences has resulted in an immense bulk of documents reflecting the different evaluation and research efforts developed. This situation makes it difficult sometimes to observe clearly how research in information retrieval (IR) has evolved over the course of TREC. TREC: Experiment and Evaluation in Information Retrieval succeeds in organizing and condensing all this research into a manageable volume that describes TREC history and summarizes the main lessons learned. The book is organized into three parts. The first part is devoted to the description of TREC's origin and history, the test collections, and the evaluation methodology developed. The second part describes a selection of the major evaluation exercises (tracks), and the third part contains contributions from research groups that had a large and remarkable participation in TREC. Finally, Karen Sparck Jones, one of the main promoters of research in IR, closes the book with an epilogue that analyzes the impact of TREC on this research field.
    ... TREC: Experiment and Evaluation in Information Retrieval is a reliable and comprehensive review of the TREC program and has been adopted by NIST as the official history of TREC (see http://trec.nist.gov). We were favorably surprised by the book. Well structured and written, chapters are self-contained and the existence of references to specialized and more detailed publications is continuous, which makes it easier to expand into the different aspects analyzed in the text. This book succeeds in compiling TREC evolution from its inception in 1992 to 2003 in an adequate and manageable volume. Thanks to the impressive effort performed by the authors and their experience in the field, it can satiate the interests of a great variety of readers. While expert researchers in the IR field and IR-related industrial companies can use it as a reference manual, it seems especially useful for students and non-expert readers willing to approach this research area. Like NIST, we would recommend this reading to anyone who may be interested in textual information retrieval."
    LCSH
    Information storage and retrieval systems / Congresses
    Text REtrieval Conference
    RSWK
    Information Retrieval / Textverarbeitung / Aufsatzsammlung (BVB)
    Kongress / Information Retrieval / Kongress (GBV)
    Subject
    Information Retrieval / Textverarbeitung / Aufsatzsammlung (BVB)
    Kongress / Information Retrieval / Kongress (GBV)
    Information storage and retrieval systems / Congresses
    Text REtrieval Conference
  12. Morse, E.; Lewis, M.; Olsen, K.A.: Testing visual information retrieval methodologies case study : comparative analysis of textual, icon, graphical, and "spring" displays (2002) 0.00
    0.0036594293 = product of:
      0.025616003 = sum of:
        0.025616003 = product of:
          0.064040005 = sum of:
            0.036250874 = weight(_text_:retrieval in 191) [ClassicSimilarity], result of:
              0.036250874 = score(doc=191,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.33085006 = fieldWeight in 191, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=191)
            0.027789133 = weight(_text_:system in 191) [ClassicSimilarity], result of:
              0.027789133 = score(doc=191,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.2435858 = fieldWeight in 191, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=191)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Although many different visual information retrieval systems have been proposed, few have been tested, and where testing has been performed, results were often inconclusive. Further, there is very little evidence of benchmarking systems against a common standard. An approach for testing novel interfaces is proposed that uses bottom-up, stepwise testing to allow evaluation of the visualization itself, rather than restricting evaluation to the system instantiating it. This approach not only makes it easier to control variables, but the tests are also easier to perform. The methodology will be presented through a case study, where a new visualization technique is compared to more traditional ways of presenting data.
  13. Ahlgren, P.; Grönqvist, L.: Evaluation of retrieval effectiveness with incomplete relevance data : theoretical and experimental comparison of three measures (2008) 0.00
    0.0036594293 = product of:
      0.025616003 = sum of:
        0.025616003 = product of:
          0.064040005 = sum of:
            0.036250874 = weight(_text_:retrieval in 2032) [ClassicSimilarity], result of:
              0.036250874 = score(doc=2032,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.33085006 = fieldWeight in 2032, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2032)
            0.027789133 = weight(_text_:system in 2032) [ClassicSimilarity], result of:
              0.027789133 = score(doc=2032,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.2435858 = fieldWeight in 2032, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2032)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    This paper investigates two relatively new measures of retrieval effectiveness in relation to the problem of incomplete relevance data. The measures, Bpref and RankEff, which do not take into account documents that have not been relevance judged, are compared theoretically and experimentally. The experimental comparisons involve a third measure, the well-known mean uninterpolated average precision. The results indicate that RankEff is the most stable of the three measures when the amount of relevance data is reduced, with respect to system ranking and absolute values. In addition, RankEff has the lowest error-rate.
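    Bpref, one of the two measures compared, can be sketched as follows (one commonly cited formulation, after Buckley and Voorhees; trec_eval versions differ in edge-case details, and RankEff is not reproduced here). It credits each retrieved relevant document by how few judged-nonrelevant documents rank above it, ignoring unjudged documents entirely:

        def bpref(ranked, relevant, nonrelevant):
            # ranked: list of doc_ids, best first; relevant/nonrelevant: judged sets.
            R, N = len(relevant), len(nonrelevant)
            if R == 0 or N == 0:
                return 0.0
            nonrel_above = 0                      # judged-nonrelevant docs seen so far
            total = 0.0
            for doc in ranked:
                if doc in nonrelevant:
                    nonrel_above += 1
                elif doc in relevant:             # unjudged documents are skipped
                    total += 1.0 - min(nonrel_above, R) / min(R, N)
            return total / R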
  14. Blandford, A.; Adams, A.; Attfield, S.; Buchanan, G.; Gow, J.; Makri, S.; Rimmer, J.; Warwick, C.: The PRET A Rapporter framework : evaluating digital libraries from the perspective of information work (2008) 0.00
    0.003613001 = product of:
      0.025291005 = sum of:
        0.025291005 = product of:
          0.06322751 = sum of:
            0.02197135 = weight(_text_:retrieval in 2021) [ClassicSimilarity], result of:
              0.02197135 = score(doc=2021,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.20052543 = fieldWeight in 2021, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2021)
            0.041256163 = weight(_text_:system in 2021) [ClassicSimilarity], result of:
              0.041256163 = score(doc=2021,freq=6.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.36163113 = fieldWeight in 2021, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2021)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    The strongest tradition of IR systems evaluation has focused on system effectiveness; more recently, there has been a growing interest in evaluation of Interactive IR systems, balancing system and user-oriented evaluation criteria. In this paper we shift the focus to considering how IR systems, and particularly digital libraries, can be evaluated to assess (and improve) their fit with users' broader work activities. Taking this focus, we answer a different set of evaluation questions that reveal more about the design of interfaces, user-system interactions and how systems may be deployed in the information working context. The planning and conduct of such evaluation studies share some features with the established methods for conducting IR evaluation studies, but come with a shift in emphasis; for example, a greater range of ethical considerations may be pertinent. We present the PRET A Rapporter framework for structuring user-centred evaluation studies and illustrate its application to three evaluation studies of digital library systems.
    Footnote
    Contribution to a thematic section: Evaluation of Interactive Information Retrieval Systems
  15. López-Ostenero, F.; Peinado, V.; Gonzalo, J.; Verdejo, F.: Interactive question answering : Is Cross-Language harder than monolingual searching? (2008) 0.00
    0.0035357005 = product of:
      0.024749903 = sum of:
        0.024749903 = product of:
          0.061874755 = sum of:
            0.0380555 = weight(_text_:retrieval in 2023) [ClassicSimilarity], result of:
              0.0380555 = score(doc=2023,freq=6.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.34732026 = fieldWeight in 2023, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2023)
            0.023819257 = weight(_text_:system in 2023) [ClassicSimilarity], result of:
              0.023819257 = score(doc=2023,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.20878783 = fieldWeight in 2023, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2023)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Is Cross-Language answer finding harder than Monolingual answer finding for users? In this paper we provide initial quantitative and qualitative evidence to answer this question. In our study, which involves 16 users searching questions under four different system conditions, we find that interactive cross-language answer finding is not substantially harder (in terms of accuracy) than its monolingual counterpart, using general purpose Machine Translation systems and standard Information Retrieval machinery, although it takes more time. We have also seen that users need more context to provide accurate answers (full documents) than what is usually considered by systems (paragraphs or passages). Finally, we also discuss the limitations of standard evaluation methodologies for interactive Information Retrieval experiments in the case of cross-language question answering.
    Footnote
    Contribution to a thematic section: Evaluation of Interactive Information Retrieval Systems
  16. Lioma, C.; Ounis, I.: A syntactically-based query reformulation technique for information retrieval (2008) 0.00
    0.0034184116 = product of:
      0.02392888 = sum of:
        0.02392888 = product of:
          0.0598222 = sum of:
            0.043942697 = weight(_text_:retrieval in 2031) [ClassicSimilarity], result of:
              0.043942697 = score(doc=2031,freq=18.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.40105084 = fieldWeight in 2031, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2031)
            0.015879504 = weight(_text_:system in 2031) [ClassicSimilarity], result of:
              0.015879504 = score(doc=2031,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.13919188 = fieldWeight in 2031, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2031)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Whereas in language words of high frequency are generally associated with low content [Bookstein, A., & Swanson, D. (1974). Probabilistic models for automatic indexing. Journal of the American Society for Information Science, 25(5), 312-318; Damerau, F. J. (1965). An experiment in automatic indexing. American Documentation, 16, 283-289; Harter, S. P. (1974). A probabilistic approach to automatic keyword indexing. PhD thesis, University of Chicago; Sparck-Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28, 11-21; Yu, C., & Salton, G. (1976). Precision weighting - an effective automatic indexing method. Journal of the Association for Computing Machinery (ACM), 23(1), 76-88], shallow syntactic fragments of high frequency generally correspond to lexical fragments of high content [Lioma, C., & Ounis, I. (2006). Examining the content load of part of speech blocks for information retrieval. In Proceedings of the international committee on computational linguistics and the association for computational linguistics (COLING/ACL 2006), Sydney, Australia]. We apply this finding to Information Retrieval as follows. We present a novel automatic query reformulation technique, which is based on shallow syntactic evidence induced from various language samples, and used to enhance the performance of an Information Retrieval system. Firstly, we draw shallow syntactic evidence from language samples of varying size, and compare the effect of language sample size upon retrieval performance when using our syntactically-based query reformulation (SQR) technique. Secondly, we compare SQR to a state-of-the-art probabilistic pseudo-relevance feedback technique. Additionally, we combine both techniques and evaluate their compatibility. We evaluate our proposed technique across two standard Text REtrieval Conference (TREC) English test collections, and three statistically different weighting models. Experimental results suggest that SQR markedly enhances retrieval performance, and is at least comparable to pseudo-relevance feedback. Notably, the combination of SQR and pseudo-relevance feedback further enhances retrieval performance considerably. These collective experimental results confirm the tenet that high frequency shallow syntactic fragments correspond to content-bearing lexical fragments.
  17. Robins, D.: Shifts of focus on various aspects of user information problems during interactive information retrieval (2000) 0.00
    0.003136654 = product of:
      0.021956576 = sum of:
        0.021956576 = product of:
          0.054891437 = sum of:
            0.03107218 = weight(_text_:retrieval in 4995) [ClassicSimilarity], result of:
              0.03107218 = score(doc=4995,freq=4.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.2835858 = fieldWeight in 4995, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4995)
            0.023819257 = weight(_text_:system in 4995) [ClassicSimilarity], result of:
              0.023819257 = score(doc=4995,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.20878783 = fieldWeight in 4995, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4995)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    The author presents the results of additional analyses of shifts of focus in IR interaction. Results indicate that users and search intermediaries work toward search goals in nonlinear fashion. Twenty interactions between 20 different users and one of four different search intermediaries were examined. Analysis of discourse between the two parties during interactive information retrieval (IR) shows that changes in topic occur, on average, every seven utterances. These twenty interactions included some 9,858 utterances and 1,439 foci. Utterances are defined as any uninterrupted sound, statement, gesture, etc., made by a participant in the discourse dyad. These utterances are segmented by the researcher according to their intentional focus, i.e., the topic on which the conversation between the user and search intermediary focuses until the focus changes (i.e., shifts of focus). In all but two of the 20 interactions, the search intermediary initiated a majority of shifts of focus. Six focus categories were observed. These were foci dealing with: documents; evaluation of search results; search strategies; IR system; topic of the search; and information about the user.
  18. Jansen, B.J.; McNeese, M.D.: Evaluating the Effectiveness of and Patterns of Interactions With Automated Searching Assistance (2005) 0.00
    0.002946417 = product of:
      0.020624919 = sum of:
        0.020624919 = product of:
          0.051562294 = sum of:
            0.031712912 = weight(_text_:retrieval in 4815) [ClassicSimilarity], result of:
              0.031712912 = score(doc=4815,freq=6.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.28943354 = fieldWeight in 4815, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4815)
            0.01984938 = weight(_text_:system in 4815) [ClassicSimilarity], result of:
              0.01984938 = score(doc=4815,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.17398985 = fieldWeight in 4815, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4815)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    We report quantitative and qualitative results of an empirical evaluation to determine whether automated assistance improves searching performance and when searchers desire system intervention in the search process. Forty participants interacted with two fully functional information retrieval systems in a counterbalanced, within-participant study. The systems were identical in all respects except that one offered automated assistance and the other did not. The study used a client-side automated assistance application, an approximately 500,000-document Text REtrieval Conference content collection, and six topics. Results indicate that automated assistance can improve searching performance. However, the improvement is less dramatic than one might expect, with an approximately 20% performance increase, as measured by the number of user-selected relevant documents. Concerning patterns of interaction, we identified 1,879 occurrences of searcher-system interactions and classified them into 9 major categories and 27 subcategories or states. Results indicate that there are predictable patterns of times when searchers desire and implement searching assistance. The most common three-state pattern is Execute Query-View Results: With Scrolling-View Assistance. Searchers appear receptive to automated assistance; there is a 71% implementation rate. There does not seem to be a correlation between the use of assistance and previous searching performance. We discuss the implications for the design of information retrieval systems and future research directions.
  19. Borlund, P.: Evaluation of interactive information retrieval systems (2000) 0.00
    0.0025110114 = product of:
      0.01757708 = sum of:
        0.01757708 = product of:
          0.087885395 = sum of:
            0.087885395 = weight(_text_:retrieval in 2556) [ClassicSimilarity], result of:
              0.087885395 = score(doc=2556,freq=18.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.8021017 = fieldWeight in 2556, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2556)
          0.2 = coord(1/5)
      0.14285715 = coord(1/7)
    
    LCSH
    Information storage and retrieval systems / Evaluation
    RSWK
    Information Retrieval / Datenbankverwaltung / Hochschulschrift (GBV)
    Information Retrieval / Dialogsystem (SWB)
    Information Retrieval / Dialogsystem / Leistungsbewertung (BVB)
    Subject
    Information Retrieval / Datenbankverwaltung / Hochschulschrift (GBV)
    Information Retrieval / Dialogsystem (SWB)
    Information Retrieval / Dialogsystem / Leistungsbewertung (BVB)
    Information storage and retrieval systems / Evaluation
  20. Kekäläinen, J.; Järvelin, K.: Using graded relevance assessments in IR evaluation (2002) 0.00
    0.0021805053 = product of:
      0.015263537 = sum of:
        0.015263537 = product of:
          0.03815884 = sum of:
            0.01830946 = weight(_text_:retrieval in 5225) [ClassicSimilarity], result of:
              0.01830946 = score(doc=5225,freq=2.0), product of:
                0.109568894 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03622214 = queryNorm
                0.16710453 = fieldWeight in 5225, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5225)
            0.01984938 = weight(_text_:system in 5225) [ClassicSimilarity], result of:
              0.01984938 = score(doc=5225,freq=2.0), product of:
                0.11408355 = queryWeight, product of:
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.03622214 = queryNorm
                0.17398985 = fieldWeight in 5225, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1495528 = idf(docFreq=5152, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5225)
          0.4 = coord(2/5)
      0.14285715 = coord(1/7)
    
    Abstract
    Kekäläinen and Järvelin use what they term generalized, nonbinary recall and precision measures, where recall is the sum of the relevance scores of the retrieved documents divided by the sum of the relevance scores of all documents in the database, and precision is the sum of the relevance scores of the retrieved documents divided by the number of documents retrieved; the relevance scores are real numbers between zero and one. Using the InQuery system and a text database of 53,893 newspaper articles, with 30 queries selected from those for which four-category relevance assessments were available to provide recall measures, search results were evaluated by four judges. Searches were done by average key term weight, by Boolean expression, and by average term weight where the terms are grouped by a synonym operator, and for each case with and without expansion of the original terms. Use of higher standards of relevance appears to increase the superiority of the best method. Some methods do a better job of getting the highly relevant documents but do not increase retrieval of marginal ones. There is evidence that generalized precision provides more equitable results, while binary precision provides undeserved merit to some methods. Generally, graded relevance measures seem to provide additional insight into IR evaluation.
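    Following the definitions quoted above, the generalized measures can be sketched directly (a minimal sketch; `relevance` maps every database document to a graded score between zero and one, and the function names are illustrative):

        def generalized_recall(retrieved, relevance):
            # Graded relevance mass of the retrieved docs over the database total.
            total = sum(relevance.values())
            return sum(relevance.get(d, 0.0) for d in retrieved) / total if total else 0.0

        def generalized_precision(retrieved, relevance):
            # Graded relevance mass of the retrieved docs over the number retrieved.
            if not retrieved:
                return 0.0
            return sum(relevance.get(d, 0.0) for d in retrieved) / len(retrieved)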

Types

  • a 81
  • m 2
  • el 1
  • s 1