Search (14 results, page 1 of 1)

  • × theme_ss:"Retrievalstudien"
  • × type_ss:"a"
  • × year_i:[2010 TO 2020}
  1. Ravana, S.D.; Taheri, M.S.; Rajagopal, P.: Document-based approach to improve the accuracy of pairwise comparison in evaluating information retrieval systems (2015) 0.02
    
    Abstract
    Purpose: The purpose of this paper is to propose a method that yields more accurate results when comparing the performance of paired information retrieval (IR) systems than the current method, which is based on the systems' mean effectiveness scores across a set of identified topics/queries.
    Design/methodology/approach: In the proposed approach, document-level scores rather than the classic set of topic scores are used as the evaluation unit. These document scores are the defined document weights, which take over the role of the systems' mean average precision (MAP) scores as the significance test's statistic. The experiments were conducted on the TREC-9 Web track collection.
    Findings: The p-values generated by two significance tests, Student's t-test and the Mann-Whitney test, show that with document-level scores as the evaluation unit, the difference between IR systems is more significant than when topic scores are used.
    Originality/value: A suitable test collection is a primary prerequisite for the comparative evaluation of IR systems. In addition to reusable test collections, however, accurate statistical testing is a necessity for these evaluations. The findings of this study will help IR researchers evaluate their retrieval systems and algorithms more accurately.
    Date
    20. 1.2015 18:30:22
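    The statistical comparison Ravana et al. describe can be sketched in a few lines: the same paired t statistic is computed once over per-topic scores (the classic setup) and once over per-document scores (the paper's proposal), and the larger sample at document level tends to sharpen the test. This is a minimal stdlib-only sketch with hypothetical score values, not the authors' implementation:

```python
import math
from statistics import mean, stdev

def paired_t(scores_a, scores_b):
    """Paired Student's t statistic over per-unit effectiveness scores.

    The 'unit' can be a topic (the classic setup) or a single
    document weight (the document-based proposal)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Hypothetical per-topic AP scores for two systems (5 topics).
topic_a = [0.42, 0.55, 0.31, 0.60, 0.48]
topic_b = [0.40, 0.50, 0.30, 0.58, 0.47]

# Hypothetical document-level weights for the same two systems:
# one score per (topic, ranked document), so the sample is larger.
doc_a = [0.9, 0.7, 0.5, 0.8, 0.6, 0.4, 0.7, 0.5, 0.9, 0.6]
doc_b = [0.8, 0.6, 0.5, 0.7, 0.6, 0.3, 0.6, 0.5, 0.8, 0.5]

print(paired_t(topic_a, topic_b))
print(paired_t(doc_a, doc_b))
```

    With these made-up numbers the document-level statistic comes out larger than the topic-level one, illustrating (not proving) the paper's finding that document-level units can make system differences register as more significant.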
  2. Kutlu, M.; Elsayed, T.; Lease, M.: Intelligent topic selection for low-cost information retrieval evaluation : a new perspective on deep vs. shallow judging (2018) 0.02
    
    Abstract
    While test collections provide the cornerstone for Cranfield-based evaluation of information retrieval (IR) systems, it has become practically infeasible to rely on traditional pooling techniques to construct test collections at the scale of today's massive document collections (e.g., ClueWeb12's 700M+ Webpages). This has motivated a flurry of studies proposing more cost-effective yet reliable IR evaluation methods. In this paper, we propose a new intelligent topic selection method which reduces the number of search topics (and thereby costly human relevance judgments) needed for reliable IR evaluation. To rigorously assess our method, we integrate previously disparate lines of research on intelligent topic selection and deep vs. shallow judging (i.e., whether it is more cost-effective to collect many relevance judgments for a few topics or a few judgments for many topics). While prior work on intelligent topic selection has never been evaluated against shallow judging baselines, prior work on deep vs. shallow judging has largely argued for shallow judging, but assuming random topic selection. We argue that for evaluating any topic selection method, ultimately one must ask whether it is actually useful to select topics, or whether one should simply perform shallow judging over many topics. In seeking a rigorous answer to this overarching question, we conduct a comprehensive investigation over a set of relevant factors never previously studied together: 1) method of topic selection; 2) the effect of topic familiarity on human judging speed; and 3) how different topic generation processes (requiring varying human effort) impact (i) budget utilization and (ii) the resultant quality of judgments.
    Experiments on NIST TREC Robust 2003 and Robust 2004 test collections show that not only can we reliably evaluate IR systems with fewer topics, but also that: 1) when topics are intelligently selected, deep judging is often more cost-effective than shallow judging in evaluation reliability; and 2) topic familiarity and topic generation costs greatly impact the evaluation cost vs. reliability trade-off. Our findings challenge conventional wisdom in showing that deep judging is often preferable to shallow judging when topics are selected intelligently.
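    The budget trade-off at the heart of the deep vs. shallow question can be made concrete: under a fixed assessor budget, each topic costs its generation effort plus depth-many judgments, so pool depth and topic count trade off directly. A minimal sketch with entirely hypothetical cost figures (not values from the paper):

```python
def topics_within_budget(budget_hours, gen_hours, depth, mins_per_judgment):
    """Topics affordable when each topic costs: generation time plus
    `depth` judgments at a per-judgment speed (familiar topics judge faster)."""
    per_topic_hours = gen_hours + depth * mins_per_judgment / 60.0
    return int(budget_hours // per_topic_hours)

# Hypothetical setup: 100 assessor-hours, 0.5 h to generate a topic,
# 1 minute per relevance judgment.
shallow = topics_within_budget(100, gen_hours=0.5, depth=20, mins_per_judgment=1.0)
deep = topics_within_budget(100, gen_hours=0.5, depth=200, mins_per_judgment=1.0)
print(shallow, deep)
```

    Shallow judging buys many topics and deep judging few; Kutlu et al.'s point is that which side of this trade-off wins depends on how the few deep topics are selected and on the generation and familiarity costs folded into the per-topic figure.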
  3. Järvelin, K.: Evaluation (2011) 0.02
    
    Source
    Interactive information seeking, behaviour and retrieval. Eds.: Ruthven, I. and D. Kelly
  4. Becks, D.; Mandl, T.; Womser-Hacker, C.: Spezielle Anforderungen bei der Evaluierung von Patent-Retrieval-Systemen (2010) 0.01
    
    Source
    Information und Wissen: global, sozial und frei? Proceedings of the 12th International Symposium for Information Science (ISI 2011), Hildesheim, 9-11 March 2011. Eds.: J. Griesbaum, T. Mandl and C. Womser-Hacker
  5. Mandl, T.: Evaluierung im Information Retrieval : die Hildesheimer Antwort auf aktuelle Herausforderungen der globalisierten Informationsgesellschaft (2010) 0.01
    
  6. Tamine, L.; Chouquet, C.; Palmer, T.: Analysis of biomedical and health queries : lessons learned from TREC and CLEF evaluation benchmarks (2015) 0.01
    
  7. Reichert, S.; Mayr, P.: Untersuchung von Relevanzeigenschaften in einem kontrollierten Eyetracking-Experiment (2012) 0.01
    
    Date
    22. 7.2012 19:25:54
  8. Ruthven, I.: Relevance behaviour in TREC (2014) 0.01
    
  9. Sarigil, E.; Sengor Altingovde, I.; Blanco, R.; Barla Cambazoglu, B.; Ozcan, R.; Ulusoy, Ö.: Characterizing, predicting, and handling web search queries that match very few or no results (2018) 0.01
    
  10. Angelini, M.; Fazzini, V.; Ferro, N.; Santucci, G.; Silvello, G.: CLAIRE: A combinatorial visual analytics system for information retrieval evaluation (2018) 0.01
    
    Abstract
    Information Retrieval (IR) develops complex systems, constituted of several components, which aim at returning and optimally ranking the most relevant documents in response to user queries. In this context, experimental evaluation plays a central role, since it allows for measuring IR systems' effectiveness, increasing the understanding of their functioning, and better directing the efforts for improving them. Current evaluation methodologies are limited by two major factors: (i) IR systems are evaluated as "black boxes", since it is not possible to decompose the contributions of the different components, e.g., stop lists, stemmers, and IR models; (ii) given that it is not possible to predict the effectiveness of an IR system, both academia and industry need to explore huge numbers of systems, generated by large combinatorial compositions of their components, to understand how they perform and how these components interact together. We propose a Combinatorial visuaL Analytics system for Information Retrieval Evaluation (CLAIRE) which allows for exploring and making sense of the performances of a large number of IR systems, in order to quickly and intuitively grasp which system configurations are preferred, what the contributions of the different components are, and how these components interact together. The CLAIRE system is then validated against use cases based on several test collections using a wide set of systems, generated by a combinatorial composition of several off-the-shelf components representing the most common denominator almost always present in English IR systems. In particular, we validate the findings enabled by CLAIRE with respect to consolidated deep statistical analyses, and we show that the CLAIRE system allows the generation of new insights, which were not detectable with traditional approaches.
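    The "large combinatorial compositions" the CLAIRE abstract refers to are simply the cross-product of component choices. A minimal sketch of how such a system grid is enumerated, using hypothetical component names rather than the actual configuration CLAIRE was validated on:

```python
from itertools import product

# Hypothetical off-the-shelf components per pipeline stage.
stoplists = ["none", "smart", "terrier"]
stemmers = ["none", "porter", "krovetz"]
models = ["bm25", "tfidf", "lm_dirichlet"]

# Every IR system configuration is one combination across the stages.
systems = [
    {"stoplist": sl, "stemmer": st, "model": m}
    for sl, st, m in product(stoplists, stemmers, models)
]
print(len(systems))  # 3 * 3 * 3 = 27 configurations
```

    Even this toy grid yields 27 systems to evaluate; with more stages and more alternatives per stage the count grows multiplicatively, which is exactly why a visual analytics tool for exploring the resulting performance space is useful.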
  11. Pal, S.; Mitra, M.; Kamps, J.: Evaluation effort, reliability and reusability in XML retrieval (2011) 0.01
    
    Date
    22. 1.2011 14:20:56
  12. Chu, H.: Factors affecting relevance judgment : a report from TREC Legal track (2011) 0.01
    
    Date
    12. 7.2011 18:29:22
  13. Wildemuth, B.; Freund, L.; Toms, E.G.: Untangling search task complexity and difficulty in the context of interactive information retrieval studies (2014) 0.01
    
    Date
    6. 4.2015 19:31:22
  14. Rajagopal, P.; Ravana, S.D.; Koh, Y.S.; Balakrishnan, V.: Evaluating the effectiveness of information retrieval systems using effort-based relevance judgment (2019) 0.01
    
    Date
    20. 1.2015 18:30:22