Search (9 results, page 1 of 1)

  • Filter: theme_ss:"Retrievalstudien"
  1. Smithson, S.: Information retrieval evaluation in practice : a case study approach (1994) 0.04
    0.037579034 = product of:
      0.09394758 = sum of:
        0.07282796 = weight(_text_:study in 7302) [ClassicSimilarity], result of:
          0.07282796 = score(doc=7302,freq=8.0), product of:
            0.1448085 = queryWeight, product of:
              3.2514048 = idf(docFreq=4653, maxDocs=44218)
              0.044537213 = queryNorm
            0.502926 = fieldWeight in 7302, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2514048 = idf(docFreq=4653, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7302)
        0.021119623 = product of:
          0.042239245 = sum of:
            0.042239245 = weight(_text_:22 in 7302) [ClassicSimilarity], result of:
              0.042239245 = score(doc=7302,freq=2.0), product of:
                0.15596174 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044537213 = queryNorm
                0.2708308 = fieldWeight in 7302, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=7302)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
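    How to read the tree above: it is Lucene's explain() output for the ClassicSimilarity (TF-IDF) scorer. Each term clause contributes queryWeight * fieldWeight, where queryWeight = idf * queryNorm and fieldWeight = tf * idf * fieldNorm with tf = sqrt(freq); clause scores are summed and scaled by a coordination factor. A minimal Python sketch reproducing the arithmetic for this record from the numbers shown above (no Lucene required; the idf formula is inferred from the docFreq/maxDocs values and matches them to seven digits):
    
      import math
      
      # Constants taken directly from the explain() tree for doc 7302.
      QUERY_NORM = 0.044537213
      FIELD_NORM = 0.0546875
      
      def idf(doc_freq, max_docs):
          # Reproduces 3.2514048 ("study") and 3.5018296 ("22") above.
          return 1.0 + math.log(max_docs / (doc_freq + 1))
      
      def clause_score(freq, idf_val):
          tf = math.sqrt(freq)                      # tf(freq=8.0) = 2.828427
          field_weight = tf * idf_val * FIELD_NORM  # 0.502926 for "study"
          query_weight = idf_val * QUERY_NORM       # 0.1448085 for "study"
          return query_weight * field_weight
      
      study = clause_score(8.0, idf(4653, 44218))      # 0.07282796
      t22 = clause_score(2.0, idf(3622, 44218)) * 0.5  # coord(1/2) -> 0.021119623
      print((study + t22) * 0.4)                       # coord(2/5) -> 0.037579034
    
    The same four factors account for every score tree below; only freq, idf, fieldNorm and the coord fractions change.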
    
    Abstract
    The evaluation of information retrieval systems is an important yet difficult operation. This paper describes an exploratory evaluation study that takes an interpretive approach to evaluation. The longitudinal study examines evaluation through the information-seeking behaviour of 22 case studies of 'real' users. The eclectic approach to data collection produced behavioural data that are compared with relevance judgements and satisfaction ratings. The study demonstrates considerable variations among the cases, among different evaluation measures within the same case, and among the same measures at different stages within a single case. It is argued that those involved in evaluation should be aware of these difficulties and base any evaluation on a good understanding of the cases in question.
  2. Blair, D.C.: STAIRS Redux : thoughts on the STAIRS evaluation, ten years after (1996) 0.03
    0.029046709 = product of:
      0.07261677 = sum of:
        0.051497146 = weight(_text_:study in 3002) [ClassicSimilarity], result of:
          0.051497146 = score(doc=3002,freq=4.0), product of:
            0.1448085 = queryWeight, product of:
              3.2514048 = idf(docFreq=4653, maxDocs=44218)
              0.044537213 = queryNorm
            0.3556224 = fieldWeight in 3002, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2514048 = idf(docFreq=4653, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3002)
        0.021119623 = product of:
          0.042239245 = sum of:
            0.042239245 = weight(_text_:22 in 3002) [ClassicSimilarity], result of:
              0.042239245 = score(doc=3002,freq=2.0), product of:
                0.15596174 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044537213 = queryNorm
                0.2708308 = fieldWeight in 3002, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3002)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    The test of retrieval effectiveness performed on IBM's STAIRS, reported in 'Communications of the ACM' 10 years ago, continues to be cited frequently in the information retrieval literature. The reasons for the study's continuing pertinence to today's research are discussed, and the political, legal, and commercial aspects of the study are presented. In addition, the method of calculating recall that was used in the STAIRS study is discussed in some detail, especially how it reduces the 5 major types of uncertainty in recall estimations. It is also suggested that this method of recall estimation may serve as the basis for recall estimations that might be truly comparable between systems.
    Source
    Journal of the American Society for Information Science. 47(1996) no.1, S.4-22
  3. Chu, H.: Factors affecting relevance judgment : a report from TREC Legal track (2011) 0.03
    0.026842168 = product of:
      0.06710542 = sum of:
        0.052019972 = weight(_text_:study in 4540) [ClassicSimilarity], result of:
          0.052019972 = score(doc=4540,freq=8.0), product of:
            0.1448085 = queryWeight, product of:
              3.2514048 = idf(docFreq=4653, maxDocs=44218)
              0.044537213 = queryNorm
            0.35923287 = fieldWeight in 4540, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2514048 = idf(docFreq=4653, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4540)
        0.015085445 = product of:
          0.03017089 = sum of:
            0.03017089 = weight(_text_:22 in 4540) [ClassicSimilarity], result of:
              0.03017089 = score(doc=4540,freq=2.0), product of:
                0.15596174 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044537213 = queryNorm
                0.19345059 = fieldWeight in 4540, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4540)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Purpose - This study intends to identify factors that affect relevance judgment of retrieved information as part of the 2007 TREC Legal track interactive task. Design/methodology/approach - Data were gathered and analyzed from the participants of the 2007 TREC Legal track interactive task using a questionnaire which included not only a list of 80 relevance factors identified in prior research, but also a space for expressing their thoughts on relevance judgment in the process. Findings - This study finds that topicality remains a primary criterion, out of various options, for determining relevance, while specificity of the search request, task, or retrieved results also helps greatly in relevance judgment. Research limitations/implications - Relevance research should focus on the topicality and specificity of what is being evaluated, and should be conducted in real environments. Practical implications - If multiple relevance factors are presented to assessors, the total number in a list should be below ten to allow for the limited processing capacity of human short-term memory. Otherwise, the assessors might either completely ignore or inadequately consider some of the relevance factors when making judgments. Originality/value - This study presents a method for reducing the artificiality of relevance research design, an apparent limitation in many related studies. Specifically, relevance judgment was made in this research as part of the 2007 TREC Legal track interactive task rather than in a study devised for the sake of it. The assessors also served as searchers so that their searching experience would facilitate their subsequent relevance judgments.
    Date
    12. 7.2011 18:29:22
  4. Leininger, K.: Interindexer consistency in PsycINFO (2000) 0.02
    0.024897177 = product of:
      0.06224294 = sum of:
        0.04414041 = weight(_text_:study in 2552) [ClassicSimilarity], result of:
          0.04414041 = score(doc=2552,freq=4.0), product of:
            0.1448085 = queryWeight, product of:
              3.2514048 = idf(docFreq=4653, maxDocs=44218)
              0.044537213 = queryNorm
            0.3048192 = fieldWeight in 2552, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2514048 = idf(docFreq=4653, maxDocs=44218)
              0.046875 = fieldNorm(doc=2552)
        0.018102532 = product of:
          0.036205065 = sum of:
            0.036205065 = weight(_text_:22 in 2552) [ClassicSimilarity], result of:
              0.036205065 = score(doc=2552,freq=2.0), product of:
                0.15596174 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044537213 = queryNorm
                0.23214069 = fieldWeight in 2552, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2552)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Reports results of a study to examine interindexer consistency (the degree to which indexers, when assigning terms to a chosen record, will choose the same terms to reflect that record) in the PsycINFO database using 60 records that were inadvertently processed twice between 1996 and 1998. Five aspects of interindexer consistency were analysed. Two methods were used to calculate interindexer consistency: one posited by Hooper (1965) and the other by Rollin (1981). Aspects analysed were: checktag consistency (66.24% using Hooper's calculation and 77.17% using Rollin's); major-to-all term consistency (49.31% and 62.59% respectively); overall indexing consistency (49.02% and 63.32%); classification code consistency (44.17% and 45.00%); and major-to-major term consistency (43.24% and 56.09%). The average consistency across all categories was 50.4% using Hooper's method and 60.83% using Rollin's. Although comparison with previous studies is difficult due to methodological variations in the overall study of indexing consistency and the specific characteristics of the database, results generally support previous findings when trends and similar studies are analysed.
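    The abstract names the two consistency measures but not their formulas. A minimal sketch using the forms commonly attributed to Hooper (1965) and Rollin (1981) - an assumption, since the paper itself should be consulted for the exact definitions - with invented term sets for illustration:
    
      def hooper(terms_a, terms_b):
          # Hooper: agreements / (agreements + terms unique to either indexer).
          agree = len(terms_a & terms_b)
          return agree / (agree + len(terms_a - terms_b) + len(terms_b - terms_a))
      
      def rollin(terms_a, terms_b):
          # Rollin: 2 * agreements / (total terms assigned by the two indexers).
          agree = len(terms_a & terms_b)
          return 2 * agree / (len(terms_a) + len(terms_b))
      
      # Hypothetical indexing of one record by two indexers:
      a = {"information retrieval", "evaluation", "recall"}
      b = {"information retrieval", "evaluation", "precision", "indexing"}
      print(hooper(a, b))  # 2 / (2 + 1 + 2) = 0.40
      print(rollin(a, b))  # 4 / (3 + 4)    ~= 0.57
    
    Rollin's measure can never be lower than Hooper's on the same data, which matches the pattern of the paired percentages reported above.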
    Date
    9. 2.1997 18:44:22
  5. Blagden, J.F.: How much noise in a role-free and link-free co-ordinate indexing system? (1966) 0.02
    0.02301344 = product of:
      0.0575336 = sum of:
        0.03641398 = weight(_text_:study in 2718) [ClassicSimilarity], result of:
          0.03641398 = score(doc=2718,freq=2.0), product of:
            0.1448085 = queryWeight, product of:
              3.2514048 = idf(docFreq=4653, maxDocs=44218)
              0.044537213 = queryNorm
            0.251463 = fieldWeight in 2718, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2514048 = idf(docFreq=4653, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2718)
        0.021119623 = product of:
          0.042239245 = sum of:
            0.042239245 = weight(_text_:22 in 2718) [ClassicSimilarity], result of:
              0.042239245 = score(doc=2718,freq=2.0), product of:
                0.15596174 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044537213 = queryNorm
                0.2708308 = fieldWeight in 2718, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2718)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    A study of the number of irrelevant documents retrieved in a co-ordinate indexing system that does not employ either roles or links. These tests were based on one hundred actual enquiries received in the library, and therefore an evaluation of recall efficiency is not included. Over half the enquiries produced no noise, but the mean average percentage noise figure was approximately 33 per cent, based on a total average retrieval figure of eighteen documents per search. Details of the size of the indexed collection, methods of indexing, and an analysis of the reasons for the retrieval of irrelevant documents are discussed, thereby providing information officers who are thinking of installing such a system with some evidence on which to base a decision as to whether or not to utilize these devices.
    Source
    Journal of documentation. 22(1966), S.203-209
  6. Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.02
    0.02301344 = product of:
      0.0575336 = sum of:
        0.03641398 = weight(_text_:study in 5001) [ClassicSimilarity], result of:
          0.03641398 = score(doc=5001,freq=2.0), product of:
            0.1448085 = queryWeight, product of:
              3.2514048 = idf(docFreq=4653, maxDocs=44218)
              0.044537213 = queryNorm
            0.251463 = fieldWeight in 5001, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2514048 = idf(docFreq=4653, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
        0.021119623 = product of:
          0.042239245 = sum of:
            0.042239245 = weight(_text_:22 in 5001) [ClassicSimilarity], result of:
              0.042239245 = score(doc=5001,freq=2.0), product of:
                0.15596174 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044537213 = queryNorm
                0.2708308 = fieldWeight in 5001, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5001)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    A study was done to test the effectiveness of retrieval using title word searching. It was based on actual search profiles used in the Mechanized Information Center at Ohio State University, in order to replicate as closely as possible actual searching conditions. Fewer than 50% of the relevant titles were retrieved by keywords in titles. The low rate of retrieval can be attributed to three sources: the titles themselves, user and information specialist ignorance of the subject vocabulary in use, and general language problems. Across fields it was found that the social sciences had the best retrieval rate, with science having the next best, and arts and humanities the lowest. Ways to enhance and supplement keyword-in-title searching on the computer and in printed indexes are discussed.
    Date
    14. 3.1996 13:22:21
  7. Iivonen, M.: Consistency in the selection of search concepts and search terms (1995) 0.02
    0.019725805 = product of:
      0.049314514 = sum of:
        0.031211983 = weight(_text_:study in 1757) [ClassicSimilarity], result of:
          0.031211983 = score(doc=1757,freq=2.0), product of:
            0.1448085 = queryWeight, product of:
              3.2514048 = idf(docFreq=4653, maxDocs=44218)
              0.044537213 = queryNorm
            0.21553972 = fieldWeight in 1757, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2514048 = idf(docFreq=4653, maxDocs=44218)
              0.046875 = fieldNorm(doc=1757)
        0.018102532 = product of:
          0.036205065 = sum of:
            0.036205065 = weight(_text_:22 in 1757) [ClassicSimilarity], result of:
              0.036205065 = score(doc=1757,freq=2.0), product of:
                0.15596174 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044537213 = queryNorm
                0.23214069 = fieldWeight in 1757, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1757)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Considers intersearcher and intrasearcher consistency in the selection of search terms. Based on an empirical study in which 22 searchers from 4 different types of search environments analyzed altogether 12 search requests of 4 different types in 2 separate test situations, between which 2 months elapsed. Statistically very significant differences in consistency were found according to the types of search environments and search requests. Consistency was also considered according to the extent of the scope of the search concept. At level I, search terms were compared character by character. At level II, different search terms were accepted as the same search concept with a rather simple evaluation of linguistic expressions. At level III, in addition to level II, the hierarchical approach of the search request was also controlled. At level IV, different search terms were accepted as the same search concept with a broad interpretation of the search concept. Both intersearcher and intrasearcher consistency grew most immediately after a rather simple evaluation of linguistic expressions.
  8. Ravana, S.D.; Taheri, M.S.; Rajagopal, P.: Document-based approach to improve the accuracy of pairwise comparison in evaluating information retrieval systems (2015) 0.02
    0.016438173 = product of:
      0.041095432 = sum of:
        0.026009986 = weight(_text_:study in 2587) [ClassicSimilarity], result of:
          0.026009986 = score(doc=2587,freq=2.0), product of:
            0.1448085 = queryWeight, product of:
              3.2514048 = idf(docFreq=4653, maxDocs=44218)
              0.044537213 = queryNorm
            0.17961644 = fieldWeight in 2587, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2514048 = idf(docFreq=4653, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2587)
        0.015085445 = product of:
          0.03017089 = sum of:
            0.03017089 = weight(_text_:22 in 2587) [ClassicSimilarity], result of:
              0.03017089 = score(doc=2587,freq=2.0), product of:
                0.15596174 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044537213 = queryNorm
                0.19345059 = fieldWeight in 2587, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2587)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Purpose The purpose of this paper is to propose a method that yields more accurate results when comparing the performance of paired information retrieval (IR) systems than the current method, which is based on the mean effectiveness scores of the systems across a set of identified topics/queries. Design/methodology/approach In the proposed approach, instead of the classic method of using a set of topic scores, document-level scores are taken as the evaluation unit. These document scores are the defined document weights, which play the role of the systems' mean average precision (MAP) scores as the significance test's statistic. The experiments were conducted using the TREC 9 Web track collection. Findings The p-values generated through the two types of significance tests, namely the Student's t-test and the Mann-Whitney test, show that by using document-level scores as the evaluation unit, the difference between IR systems is more significant than when utilizing topic scores. Originality/value Utilizing a suitable test collection is a primary prerequisite for the comparative evaluation of IR systems. However, in addition to reusable test collections, accurate statistical testing is a necessity for these evaluations. The findings of this study will assist IR researchers in evaluating their retrieval systems and algorithms more accurately.
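    As a rough illustration of the comparison the paper describes - the score vectors below are invented; the paper derives per-document scores from defined document weights on the TREC 9 Web track collection - the two significance tests might be applied like this:
    
      from scipy import stats
      
      # Hypothetical per-document effectiveness scores for two IR systems,
      # aligned so that index i refers to the same document under both.
      system_a = [0.82, 0.40, 0.55, 0.91, 0.33, 0.60, 0.72, 0.48]
      system_b = [0.75, 0.35, 0.50, 0.88, 0.30, 0.52, 0.70, 0.41]
      
      t_stat, t_p = stats.ttest_rel(system_a, system_b)  # paired Student's t-test
      u_stat, u_p = stats.mannwhitneyu(system_a, system_b,
                                       alternative="two-sided")  # non-parametric
      print(f"t-test p={t_p:.4f}, Mann-Whitney p={u_p:.4f}")
    
    Since a test collection contains far more documents than topics, the document level supplies many more evaluation units per test, which is consistent with the finding above that document-level units make the difference between systems more significant.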
    Date
    20. 1.2015 18:30:22
  9. Rajagopal, P.; Ravana, S.D.; Koh, Y.S.; Balakrishnan, V.: Evaluating the effectiveness of information retrieval systems using effort-based relevance judgment (2019) 0.02
    0.016438173 = product of:
      0.041095432 = sum of:
        0.026009986 = weight(_text_:study in 5287) [ClassicSimilarity], result of:
          0.026009986 = score(doc=5287,freq=2.0), product of:
            0.1448085 = queryWeight, product of:
              3.2514048 = idf(docFreq=4653, maxDocs=44218)
              0.044537213 = queryNorm
            0.17961644 = fieldWeight in 5287, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2514048 = idf(docFreq=4653, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5287)
        0.015085445 = product of:
          0.03017089 = sum of:
            0.03017089 = weight(_text_:22 in 5287) [ClassicSimilarity], result of:
              0.03017089 = score(doc=5287,freq=2.0), product of:
                0.15596174 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044537213 = queryNorm
                0.19345059 = fieldWeight in 5287, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5287)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Purpose Effort, in addition to relevance, is a major factor in the satisfaction and utility of a document to the actual user. The purpose of this paper is to propose a method for generating relevance judgments that incorporate effort without involving human judges. The study then determines the variation in system rankings due to low-effort relevance judgments when evaluating retrieval systems at different depths of evaluation. Design/methodology/approach Effort-based relevance judgments are generated using a proposed boxplot approach for simple document features, HTML features and readability features. The boxplot approach is a simple yet repeatable way of classifying documents' effort while ensuring that outlier scores do not skew the grading of the entire set of documents. Findings The evaluation of retrieval systems using low-effort relevance judgments has a stronger influence at shallow depths of evaluation than at deeper depths. It is shown that the difference in system rankings is due to low-effort documents and not to the number of relevant documents. Originality/value Hence, it is crucial to evaluate retrieval systems at shallow depths using low-effort relevance judgments.
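    A minimal sketch of an IQR-based ("boxplot") classifier in the spirit of the approach described - the feature values and the at-or-below-Q1 rule are assumptions for illustration; the paper applies the idea to document, HTML and readability features:
    
      import numpy as np
      
      def classify_low_effort(feature_scores):
          # Boxplot statistics: quartiles and whiskers are robust to outliers,
          # so extreme scores cannot skew the grading of the whole set.
          q1, q3 = np.percentile(feature_scores, [25, 75])
          iqr = q3 - q1
          clipped = np.clip(feature_scores, q1 - 1.5 * iqr, q3 + 1.5 * iqr)
          return clipped <= q1  # True = low-effort document (hypothetical rule)
      
      # Hypothetical readability scores (lower = less effort to read):
      scores = np.array([12.0, 8.5, 30.2, 9.1, 15.7, 7.8, 11.3, 55.0])
      print(classify_low_effort(scores))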
    Date
    20. 1.2015 18:30:22