Search (33 results, page 1 of 2)

  • year_i:[2000 TO 2010}
  • theme_ss:"Retrievalstudien"
  1. King, D.W.: Blazing new trails : in celebration of an audacious career (2000) 0.04
    0.036927395 = product of:
      0.07385479 = sum of:
        0.07385479 = sum of:
          0.038503684 = weight(_text_:systems in 1184) [ClassicSimilarity], result of:
            0.038503684 = score(doc=1184,freq=4.0), product of:
              0.16037072 = queryWeight, product of:
                3.0731742 = idf(docFreq=5561, maxDocs=44218)
                0.052184064 = queryNorm
              0.24009174 = fieldWeight in 1184, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.0731742 = idf(docFreq=5561, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1184)
          0.0353511 = weight(_text_:22 in 1184) [ClassicSimilarity], result of:
            0.0353511 = score(doc=1184,freq=2.0), product of:
              0.1827397 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.052184064 = queryNorm
              0.19345059 = fieldWeight in 1184, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1184)
      0.5 = coord(1/2)
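The score breakdown above is Lucene's ClassicSimilarity (TF-IDF) explain output. A minimal sketch reproducing the numbers for result 1, assuming the usual ClassicSimilarity conventions (tf = sqrt(termFreq), idf = 1 + ln(maxDocs/(docFreq+1)); Lucene's internal document counts can differ slightly from the reported maxDocs, so values match to roughly four decimal places):

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # queryWeight = idf * queryNorm
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq)
    i = idf(doc_freq, max_docs)
    query_weight = i * query_norm
    field_weight = math.sqrt(freq) * i * field_norm
    return query_weight * field_weight

QUERY_NORM, MAX_DOCS, FIELD_NORM = 0.052184064, 44218, 0.0390625

# the two matching clauses of result 1 (doc 1184)
systems = term_score(4.0, 5561, MAX_DOCS, QUERY_NORM, FIELD_NORM)
t22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(1/2) = 0.5: only one of the two top-level query clauses matched
total = 0.5 * (systems + t22)
print(round(total, 5))  # ≈ 0.03693 (explain output: 0.036927395)
```

The same two idf values (3.0731742 for "systems", 3.5018296 for "22") recur in every entry below because docFreq and maxDocs are collection-wide constants; only termFreq and fieldNorm vary per document.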
    
    Abstract
    I had the distinct pleasure of working with Pauline Atherton (Cochrane) during the 1960s, a period that can be considered the heyday of automated information system design and evaluation in the United States. I first met Pauline at the 1962 American Documentation Institute annual meeting in North Hollywood, Florida. My company, Westat Research Analysts, had recently been awarded a contract by the U.S. Patent Office to provide statistical support for the design of experiments with automated information retrieval systems. I was asked to attend the meeting to learn more about information retrieval systems and to begin informing others of U.S. Patent Office activities in this area. At one session, Pauline and I questioned a speaker about the research that he presented. Pauline's questions concerned the logic of their approach, and mine, the statistical aspects. After the session, she came over to talk to me and we began a professional and personal friendship that continues to this day. During the 1960s, Pauline was involved in several important information-retrieval projects, including a series of studies for the American Institute of Physics, a dissertation examining the relevance of retrieved documents, and the development and evaluation of an online information-retrieval system. I had the opportunity to work with Pauline and her colleagues on four of those projects and will briefly describe her work in the 1960s.
    Date
    22. 9.1997 19:16:05
  2. Petrelli, D.: On the role of user-centred evaluation in the advancement of interactive information retrieval (2008) 0.04
    0.036927395 = product of:
      0.07385479 = sum of:
        0.07385479 = sum of:
          0.038503684 = weight(_text_:systems in 2026) [ClassicSimilarity], result of:
            0.038503684 = score(doc=2026,freq=4.0), product of:
              0.16037072 = queryWeight, product of:
                3.0731742 = idf(docFreq=5561, maxDocs=44218)
                0.052184064 = queryNorm
              0.24009174 = fieldWeight in 2026, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.0731742 = idf(docFreq=5561, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2026)
          0.0353511 = weight(_text_:22 in 2026) [ClassicSimilarity], result of:
            0.0353511 = score(doc=2026,freq=2.0), product of:
              0.1827397 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.052184064 = queryNorm
              0.19345059 = fieldWeight in 2026, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2026)
      0.5 = coord(1/2)
    
    Abstract
    This paper discusses the role of user-centred evaluations as an essential method for researching interactive information retrieval. It draws mainly on the work carried out during the Clarity Project, where different user-centred evaluations were run during the lifecycle of a cross-language information retrieval system. The iterative testing was not only instrumental to the development of a usable system, but it enhanced our knowledge of the potential, impact, and actual use of cross-language information retrieval technology. Indeed, the role of the user evaluation was dual: by testing a specific prototype it was possible to gain a micro-view and assess the effectiveness of each component of the complex system; by cumulating the results of all the evaluations (in total 43 people were involved) it was possible to build a macro-view of how cross-language retrieval would impact on users and their tasks. By showing the richness of results that can be acquired, this paper aims to stimulate researchers to consider user-centred evaluations as a flexible, adaptable and comprehensive technique for investigating non-traditional information access systems.
    Footnote
    Contribution to a thematic section: Evaluation of Interactive Information Retrieval Systems
    Source
    Information processing and management. 44(2008) no.1, S.22-38
  3. Voorhees, E.M.; Harman, D.: Overview of the Sixth Text REtrieval Conference (TREC-6) (2000) 0.02
    0.024745772 = product of:
      0.049491543 = sum of:
        0.049491543 = product of:
          0.09898309 = sum of:
            0.09898309 = weight(_text_:22 in 6438) [ClassicSimilarity], result of:
              0.09898309 = score(doc=6438,freq=2.0), product of:
                0.1827397 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.052184064 = queryNorm
                0.5416616 = fieldWeight in 6438, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6438)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    11. 8.2001 16:22:19
  4. Borlund, P.: Evaluation of interactive information retrieval systems (2000) 0.02
    0.024351869 = product of:
      0.048703738 = sum of:
        0.048703738 = product of:
          0.097407475 = sum of:
            0.097407475 = weight(_text_:systems in 2556) [ClassicSimilarity], result of:
              0.097407475 = score(doc=2556,freq=10.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.6073894 = fieldWeight in 2556, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2556)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    LCSH
    Information storage and retrieval systems / Evaluation
    Interactive computer systems / Evaluation
    Subject
    Information storage and retrieval systems / Evaluation
    Interactive computer systems / Evaluation
  5. Blandford, A.; Adams, A.; Attfield, S.; Buchanan, G.; Gow, J.; Makri, S.; Rimmer, J.; Warwick, C.: ¬The PRET A Rapporter framework : evaluating digital libraries from the perspective of information work (2008) 0.02
    0.020007102 = product of:
      0.040014204 = sum of:
        0.040014204 = product of:
          0.08002841 = sum of:
            0.08002841 = weight(_text_:systems in 2021) [ClassicSimilarity], result of:
              0.08002841 = score(doc=2021,freq=12.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.4990213 = fieldWeight in 2021, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2021)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The strongest tradition of IR systems evaluation has focused on system effectiveness; more recently, there has been a growing interest in the evaluation of interactive IR systems, balancing system- and user-oriented evaluation criteria. In this paper we shift the focus to considering how IR systems, and particularly digital libraries, can be evaluated to assess (and improve) their fit with users' broader work activities. Taking this focus, we answer a different set of evaluation questions that reveal more about the design of interfaces, user-system interactions and how systems may be deployed in the information working context. The planning and conduct of such evaluation studies share some features with the established methods for conducting IR evaluation studies, but come with a shift in emphasis; for example, a greater range of ethical considerations may be pertinent. We present the PRET A Rapporter framework for structuring user-centred evaluation studies and illustrate its application to three evaluation studies of digital library systems.
    Footnote
    Contribution to a thematic section: Evaluation of Interactive Information Retrieval Systems
  6. Newby, G.B.: Cognitive space and information space (2001) 0.02
    0.01633573 = product of:
      0.03267146 = sum of:
        0.03267146 = product of:
          0.06534292 = sum of:
            0.06534292 = weight(_text_:systems in 6977) [ClassicSimilarity], result of:
              0.06534292 = score(doc=6977,freq=8.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.4074492 = fieldWeight in 6977, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6977)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This article works towards the realization of exosomatic memory for information systems. In exosomatic memory systems, the information spaces of systems will be consistent with the cognitive spaces of their human users. A method for measuring concept relations in human cognitive space is presented: the paired-comparison survey with Principal Components Analysis. A study to measure the cognitive spaces of 16 research participants is presented. Items measured include relations among seven TREC topic statements as well as 17 concepts from the topic statements. A method for automatically generating information spaces from document collections is presented that uses term co-occurrence, eigensystems analysis, and Principal Components Analysis. The extent of similarity between the cognitive spaces and the information spaces, which were derived independently from each other, is measured. A strong similarity between the information spaces and the cognitive spaces is found, indicating that the methods described may have good utility for working towards information systems that operate as exosomatic memories.
  7. Borlund, P.: ¬The IIR evaluation model : a framework for evaluation of interactive information retrieval systems (2003) 0.02
    0.01633573 = product of:
      0.03267146 = sum of:
        0.03267146 = product of:
          0.06534292 = sum of:
            0.06534292 = weight(_text_:systems in 922) [ClassicSimilarity], result of:
              0.06534292 = score(doc=922,freq=2.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.4074492 = fieldWeight in 922, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.09375 = fieldNorm(doc=922)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  8. Borlund, P.: Experimental components for the evaluation of interactive information retrieval systems (2000) 0.02
    0.015219918 = product of:
      0.030439837 = sum of:
        0.030439837 = product of:
          0.060879674 = sum of:
            0.060879674 = weight(_text_:systems in 4549) [ClassicSimilarity], result of:
              0.060879674 = score(doc=4549,freq=10.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.37961838 = fieldWeight in 4549, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4549)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper presents a set of basic components which constitute the experimental setting intended for the evaluation of interactive information retrieval (IIR) systems, the aim of which is to facilitate evaluation of IIR systems in a way that is as close as possible to realistic IR processes. The experimental setting consists of three components: (1) the involvement of potential users as test persons; (2) the application of dynamic and individual information needs; and (3) the use of multidimensional and dynamic relevance judgements. Hidden under the information need component is the essential central sub-component, the simulated work task situation, the tool that triggers the (simulated) dynamic information need. This paper also reports on the empirical findings of the meta-evaluation of the application of this sub-component, the purpose of which is to discover whether the application of simulated work task situations to future evaluation of IIR systems can be recommended. Investigations are carried out to determine whether any search behavioural differences exist between test persons' treatment of their own real information needs versus simulated information needs. The hypothesis is that if no difference exists, one can correctly substitute real information needs with simulated information needs through the application of simulated work task situations. The empirical results of the meta-evaluation provide positive evidence for the application of simulated work task situations to the evaluation of IIR systems. The results also indicate that tailoring work task situations to the group of test persons is important in motivating them. Furthermore, the results of the evaluation show that different versions of semantic openness of the simulated situations make no difference to the test persons' search treatment.
  9. López-Ostenero, F.; Peinado, V.; Gonzalo, J.; Verdejo, F.: Interactive question answering : Is Cross-Language harder than monolingual searching? (2008) 0.01
    0.014147157 = product of:
      0.028294314 = sum of:
        0.028294314 = product of:
          0.056588627 = sum of:
            0.056588627 = weight(_text_:systems in 2023) [ClassicSimilarity], result of:
              0.056588627 = score(doc=2023,freq=6.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.35286134 = fieldWeight in 2023, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2023)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Is Cross-Language answer finding harder than Monolingual answer finding for users? In this paper we provide initial quantitative and qualitative evidence to answer this question. In our study, which involves 16 users searching questions under four different system conditions, we find that interactive cross-language answer finding is not substantially harder (in terms of accuracy) than its monolingual counterpart, using general purpose Machine Translation systems and standard Information Retrieval machinery, although it takes more time. We have also seen that users need more context to provide accurate answers (full documents) than what is usually considered by systems (paragraphs or passages). Finally, we also discuss the limitations of standard evaluation methodologies for interactive Information Retrieval experiments in the case of cross-language question answering.
    Footnote
    Contribution to a thematic section: Evaluation of Interactive Information Retrieval Systems
  10. Dresel, R.; Hörnig, D.; Kaluza, H.; Peter, A.; Roßmann, A.; Sieber, W.: Evaluation deutscher Web-Suchwerkzeuge : Ein vergleichender Retrievaltest (2001) 0.01
    0.014140441 = product of:
      0.028280882 = sum of:
        0.028280882 = product of:
          0.056561764 = sum of:
            0.056561764 = weight(_text_:22 in 261) [ClassicSimilarity], result of:
              0.056561764 = score(doc=261,freq=2.0), product of:
                0.1827397 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.052184064 = queryNorm
                0.30952093 = fieldWeight in 261, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=261)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The German search engines Abacho, Acoon, Fireball and Lycos, as well as the Web directories Web.de and Yahoo!, are subjected to a quality test based on relative recall, precision and availability. The retrieval test methods are presented. On average, at a cut-off value of 25, a recall of roughly 22%, a precision of just under 19% and an availability of 24% are achieved.
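The three measures used in this test can be stated as simple ratios. A minimal sketch, with function names and counts that are illustrative, not taken from the study:

```python
def precision(relevant_retrieved, retrieved):
    # share of retrieved documents (up to the cut-off) that are relevant
    return relevant_retrieved / retrieved if retrieved else 0.0

def relative_recall(relevant_retrieved, pooled_relevant):
    # recall relative to all relevant documents found by any of the
    # tested tools, since the Web's true relevant set is unknowable
    return relevant_retrieved / pooled_relevant if pooled_relevant else 0.0

def availability(resolvable, retrieved):
    # share of result links that actually lead to a reachable page
    return resolvable / retrieved if retrieved else 0.0

# hypothetical engine at cut-off 25: 5 relevant hits, 20 relevant
# documents known from the pooled results, 6 links unreachable
print(precision(5, 25), relative_recall(5, 20), availability(19, 25))
```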
  11. ¬The Eleventh Text Retrieval Conference, TREC 2002 (2003) 0.01
    0.014140441 = product of:
      0.028280882 = sum of:
        0.028280882 = product of:
          0.056561764 = sum of:
            0.056561764 = weight(_text_:22 in 4049) [ClassicSimilarity], result of:
              0.056561764 = score(doc=4049,freq=2.0), product of:
                0.1827397 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.052184064 = queryNorm
                0.30952093 = fieldWeight in 4049, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4049)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Proceedings of the 11th TREC conference, held in Gaithersburg, Maryland (USA), November 19-22, 2002. The aim of the conference was the discussion of retrieval and related information-seeking tasks on large test collections. 93 research groups used different techniques for information retrieval from the same large database, a procedure that makes it possible to compare the results. The tasks are: cross-language searching, filtering, interactive searching, searching for novelty, question answering, searching for video shots, and Web searching.
  12. Debole, F.; Sebastiani, F.: ¬An analysis of the relative hardness of Reuters-21578 subsets (2005) 0.01
    0.013613109 = product of:
      0.027226217 = sum of:
        0.027226217 = product of:
          0.054452434 = sum of:
            0.054452434 = weight(_text_:systems in 3456) [ClassicSimilarity], result of:
              0.054452434 = score(doc=3456,freq=8.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.339541 = fieldWeight in 3456, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3456)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The existence, public availability, and widespread acceptance of a standard benchmark for a given information retrieval (IR) task are beneficial to research on this task, because they allow different researchers to experimentally compare their own systems by comparing the results they have obtained on this benchmark. The Reuters-21578 test collection, together with its earlier variants, has been such a standard benchmark for the text categorization (TC) task throughout the last 10 years. However, the benefits that this has brought about have somehow been limited by the fact that different researchers have "carved" different subsets out of this collection and tested their systems on one of these subsets only; systems that have been tested on different Reuters-21578 subsets are thus not readily comparable. In this article, we present a systematic, comparative experimental study of the three subsets of Reuters-21578 that have been most popular among TC researchers. The results we obtain allow us to determine the relative hardness of these subsets, thus establishing an indirect means for comparing TC systems that have been, or will be, tested on these different subsets.
  13. Saracevic, T.: Effects of inconsistent relevance judgments on information retrieval test results : a historical perspective (2008) 0.01
    0.013613109 = product of:
      0.027226217 = sum of:
        0.027226217 = product of:
          0.054452434 = sum of:
            0.054452434 = weight(_text_:systems in 5585) [ClassicSimilarity], result of:
              0.054452434 = score(doc=5585,freq=8.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.339541 = fieldWeight in 5585, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5585)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The main objective of information retrieval (IR) systems is to retrieve information or information objects relevant to user requests and possible needs. In IR tests, retrieval effectiveness is established by comparing IR systems' retrievals (system relevance) with users' or user surrogates' assessments (user relevance), where user relevance is treated as the gold standard for performance evaluation. Relevance is a human notion, and establishing relevance by humans is fraught with a number of problems, inconsistency in judgment being one of them. The aim of this critical review is to explore the relationship between relevance on the one hand and testing of IR systems and procedures on the other. Critics of IR tests raised the issue of the validity of the IR tests because they were based on relevance judgments that are inconsistent. This review traces and synthesizes experimental studies dealing with (1) inconsistency of relevance judgments by people, (2) effects of such inconsistency on results of IR tests, and (3) reasons for retrieval failures. A historical context for these studies and for IR testing is provided, including an assessment of Lancaster's (1969) evaluation of MEDLARS and its unique place in the history of IR evaluation.
  14. Morse, E.; Lewis, M.; Olsen, K.A.: Testing visual information retrieval methodologies case study : comparative analysis of textual, icon, graphical, and "spring" displays (2002) 0.01
    0.013476291 = product of:
      0.026952581 = sum of:
        0.026952581 = product of:
          0.053905163 = sum of:
            0.053905163 = weight(_text_:systems in 191) [ClassicSimilarity], result of:
              0.053905163 = score(doc=191,freq=4.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.33612844 = fieldWeight in 191, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=191)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Although many different visual information retrieval systems have been proposed, few have been tested, and where testing has been performed, results were often inconclusive. Further, there is very little evidence of benchmarking systems against a common standard. An approach for testing novel interfaces is proposed that uses bottom-up, stepwise testing to allow evaluation of a visualization itself, rather than restricting evaluation to the system instantiating it. This approach not only makes it easier to control variables, but the tests are also easier to perform. The methodology is presented through a case study in which a new visualization technique is compared to more traditional ways of presenting data.
  15. Della Mea, V.; Mizzaro, S.: Measuring retrieval effectiveness : a new proposal and a first experimental validation (2004) 0.01
    0.013476291 = product of:
      0.026952581 = sum of:
        0.026952581 = product of:
          0.053905163 = sum of:
            0.053905163 = weight(_text_:systems in 2263) [ClassicSimilarity], result of:
              0.053905163 = score(doc=2263,freq=4.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.33612844 = fieldWeight in 2263, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2263)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Most common effectiveness measures for information retrieval systems are based on the assumptions of binary relevance (either a document is relevant to a given query or it is not) and binary retrieval (either a document is retrieved or it is not). In this article, these assumptions are questioned, and a new measure named ADM (average distance measure) is proposed, discussed from a conceptual point of view, and experimentally validated on Text REtrieval Conference (TREC) data. Both conceptual analysis and experimental evidence demonstrate ADM's adequacy in measuring the effectiveness of information retrieval systems. Some potential problems concerning precision and recall are also highlighted and discussed.
  16. Keenan, S.; Smeaton, A.F.; Keogh, G.: ¬The effect of pool depth on system evaluation in TREC (2001) 0.01
    0.011789299 = product of:
      0.023578597 = sum of:
        0.023578597 = product of:
          0.047157194 = sum of:
            0.047157194 = weight(_text_:systems in 5908) [ClassicSimilarity], result of:
              0.047157194 = score(doc=5908,freq=6.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.29405114 = fieldWeight in 5908, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5908)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The TREC benchmarking exercise for information retrieval (IR) experiments has provided a forum and an opportunity for IR researchers to evaluate the performance of their approaches to the IR task and has resulted in improvements in IR effectiveness. Typically, retrieval performance has been measured in terms of precision and recall, and comparisons between different IR approaches have been based on these measures. These measures are in turn dependent on the so-called "pool depth" used to discover relevant documents. Whereas there is evidence to suggest that the pool depth size used for TREC evaluations adequately identifies the relevant documents in the entire test data collection, we consider how it affects the evaluations of individual systems. The data used come from the Sixth TREC conference, TREC-6. By fitting appropriate regression models, we explore whether different pool depths confer advantages or disadvantages on different retrieval systems when they are compared. As a consequence of this model fitting, a pair of measures for each retrieval run, which are related to precision and recall, emerges. For each system, these give an extrapolation for the number of relevant documents the system would have been deemed to have retrieved if an indefinitely large pool size had been used, and also a measure of the sensitivity of each system to pool size. We concur that even on the basis of analyses of individual systems, the pool depth of 100 used by TREC is adequate.
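The pooling procedure at issue here can be sketched in a few lines. A minimal illustration (`build_pool` is an assumed name, not code from TREC):

```python
def build_pool(runs, depth):
    """TREC-style pooling: the union of the top-`depth` documents
    from every submitted run; only pooled documents are judged,
    and documents outside the pool are assumed non-relevant."""
    pool = set()
    for ranked_docs in runs:
        pool.update(ranked_docs[:depth])
    return pool

runs = [
    ["d1", "d2", "d3", "d4"],  # system A's ranking
    ["d3", "d5", "d1", "d6"],  # system B's ranking
]
print(sorted(build_pool(runs, depth=2)))  # → ['d1', 'd2', 'd3', 'd5']
```

Varying `depth` changes which documents are ever judged, which is exactly why the paper asks whether a given pool depth advantages some systems over others.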
  17. Jansen, B.J.; McNeese, M.D.: Evaluating the Effectiveness of and Patterns of Interactions With Automated Searching Assistance (2005) 0.01
    0.011789299 = product of:
      0.023578597 = sum of:
        0.023578597 = product of:
          0.047157194 = sum of:
            0.047157194 = weight(_text_:systems in 4815) [ClassicSimilarity], result of:
              0.047157194 = score(doc=4815,freq=6.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.29405114 = fieldWeight in 4815, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4815)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We report quantitative and qualitative results of an empirical evaluation to determine whether automated assistance improves searching performance and when searchers desire system intervention in the search process. Forty participants interacted with two fully functional information retrieval systems in a counterbalanced, within-participant study. The systems were identical in all respects except that one offered automated assistance and the other did not. The study used a client-side automated assistance application, an approximately 500,000-document Text REtrieval Conference content collection, and six topics. Results indicate that automated assistance can improve searching performance. However, the improvement is less dramatic than one might expect, with an approximately 20% performance increase, as measured by the number of user-selected relevant documents. Concerning patterns of interaction, we identified 1,879 occurrences of searcher-system interactions and classified them into 9 major categories and 27 subcategories or states. Results indicate that there are predictable patterns of times when searchers desire and implement searching assistance. The most common three-state pattern is Execute Query-View Results: With Scrolling-View Assistance. Searchers appear receptive to automated assistance; there is a 71% implementation rate. There does not seem to be a correlation between the use of assistance and previous searching performance. We discuss the implications for the design of information retrieval systems and future research directions.
  18. Baillie, M.; Azzopardi, L.; Ruthven, I.: Evaluating epistemic uncertainty under incomplete assessments (2008) 0.01
    
    Abstract
    The thesis of this study is to propose an extended methodology for laboratory-based Information Retrieval evaluation under incomplete relevance assessments. This new methodology aims to identify potential uncertainty during system comparison that may result from incompleteness. The adoption of this methodology is advantageous, because the detection of epistemic uncertainty - the amount of knowledge (or ignorance) we have about the estimate of a system's performance - during the evaluation process can guide and direct researchers when evaluating new systems over existing and future test collections. Across a series of experiments we demonstrate how this methodology can lead towards a finer-grained analysis of systems. In particular, we show through experimentation how the current practice in Information Retrieval evaluation of using a measurement depth larger than the pooling depth increases uncertainty during system comparison.
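    The pooling issue the abstract describes can be illustrated with a minimal sketch (not the authors' code; run names and document IDs are made up): when the measurement depth exceeds the pooling depth, some retrieved documents were never assessed, which is the source of the epistemic uncertainty discussed above.

    ```python
    def pooled_qrels(runs, pool_depth):
        """Union of the top-`pool_depth` documents from each submitted run."""
        pool = set()
        for run in runs:
            pool.update(run[:pool_depth])
        return pool

    def unjudged_at(run, pool, measure_depth):
        """Documents in the top `measure_depth` of a run that were never assessed."""
        return [d for d in run[:measure_depth] if d not in pool]

    # Two toy ranked runs; judgments are pooled to depth 2 but the run is
    # evaluated to depth 5, so ranks 3-5 of run 0 fall outside the pool.
    runs = [["d1", "d2", "d3", "d4", "d5"],
            ["d2", "d6", "d1", "d7", "d8"]]
    pool = pooled_qrels(runs, pool_depth=2)              # {"d1", "d2", "d6"}
    print(unjudged_at(runs[0], pool, measure_depth=5))   # ["d3", "d4", "d5"]
    ```

    Every document in the unjudged list is conventionally treated as non-relevant, which is exactly the assumption whose uncertainty the study quantifies.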
  19. Carterette, B.: Test collections (2009) 0.01
    
    Abstract
    Research and development of search engines and other information retrieval (IR) systems proceeds by a cycle of design, implementation, and experimentation, with the results of each experiment influencing design decisions in the next iteration of the cycle. Batch experiments on test collections help ensure that this process goes as smoothly and as quickly as possible. A test collection comprises a collection of documents, a set of information needs, and judgments of the relevance of documents to those needs.
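    The three components Carterette names - documents, information needs (topics), and relevance judgments (qrels) - can be sketched as plain data structures, with a simple effectiveness measure computed over them. This is an illustrative toy, not any particular test collection; all names and judgments are invented.

    ```python
    # A test collection: documents, topics, and relevance judgments (qrels).
    documents = {"d1": "...", "d2": "...", "d3": "..."}
    topics = {"t1": "information retrieval evaluation"}
    qrels = {("t1", "d1"): 1, ("t1", "d2"): 0, ("t1", "d3"): 1}

    def precision(topic, ranking, qrels):
        """Fraction of the retrieved documents judged relevant for the topic."""
        relevant = sum(qrels.get((topic, doc), 0) for doc in ranking)
        return relevant / len(ranking)

    # A system returning d1, d2, d3 retrieves 2 relevant documents out of 3.
    print(precision("t1", ["d1", "d2", "d3"], qrels))  # 2/3
    ```

    Batch experiments then amount to re-running such measures over fixed qrels for each new system variant, which is what makes the design-implement-experiment cycle repeatable.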
  20. Leininger, K.: Interindexer consistency in PsychINFO (2000) 0.01
    
    Date
    9. 2.1997 18:44:22

Languages

  • e 29
  • d 2
  • m 1

Types

  • a 28
  • m 4
  • s 3
  • el 1
  • r 1