Search (144 results, page 1 of 8)

Salton, G.: Thoughts about modern retrieval technologies (1988) 0.05

0.052262664 = product of:
  0.15678799 = sum of:
    0.1323866 = weight(_text_:graphic in 1522) [ClassicSimilarity], result of:
      0.1323866 = score(doc=1522,freq=2.0), product of:
        0.25850594 = queryWeight, product of:
          6.6217136 = idf(docFreq=159, maxDocs=44218)
          0.03903913 = queryNorm
        0.51212204 = fieldWeight in 1522, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          6.6217136 = idf(docFreq=159, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1522)
    0.024401393 = product of:
      0.048802786 = sum of:
        0.048802786 = weight(_text_:methods in 1522) [ClassicSimilarity], result of:
          0.048802786 = score(doc=1522,freq=2.0), product of:
            0.15695344 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.03903913 = queryNorm
            0.31093797 = fieldWeight in 1522, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1522)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)
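
The score tree above can be reproduced by hand. The sketch below is a minimal illustration, assuming the classic TF-IDF formulas that match the displayed numbers: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function and constant names are hypothetical and are not part of the search system; the input values are copied from the tree for doc 1522.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed idf form; it reproduces the idf values shown in the tree.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    # weight = queryWeight * fieldWeight, as in the explain output.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the explain tree for doc 1522.
MAX_DOCS = 44218
QUERY_NORM = 0.03903913
FIELD_NORM = 0.0546875

graphic = term_weight(2.0, 159, MAX_DOCS, QUERY_NORM, FIELD_NORM)
methods = term_weight(2.0, 2156, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The "methods" clause is scaled by coord(1/2); the sum by coord(2/6).
score = (graphic + 0.5 * methods) * (2 / 6)
print(round(graphic, 5), round(methods, 5), round(score, 5))
```

Small differences in the last digits are expected, since the listing appears to use single-precision arithmetic while Python uses double precision.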

Abstract: Paper presented at the 30th Annual Conference of the National Federation of Abstracting and Information Services, Philadelphia, 28 Feb-2 Mar 88. In recent years, the amount and the variety of available machine-readable data have grown, and new technologies have been introduced, such as high density storage devices and fancy graphic displays useful for information transformation and access. New approaches have also been considered for processing the stored data based on the construction of knowledge bases representing the contents and structure of the information, and the use of expert system techniques to control the user-system interactions. Provides a brief evaluation of the new information processing technologies, and of the software methods proposed for information manipulation.

Voorhees, E.M.; Harman, D.K.: The Text REtrieval Conference (2005) 0.03
```
0.025177844 = product of:
  0.07553353 = sum of:
    0.009340232 = product of:
      0.018680464 = sum of:
        0.018680464 = weight(_text_:29 in 5082) [ClassicSimilarity], result of:
          0.018680464 = score(doc=5082,freq=2.0), product of:
            0.13732746 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.03903913 = queryNorm
            0.13602862 = fieldWeight in 5082, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.02734375 = fieldNorm(doc=5082)
      0.5 = coord(1/2)
    0.0661933 = weight(_text_:graphic in 5082) [ClassicSimilarity], result of:
      0.0661933 = score(doc=5082,freq=2.0), product of:
        0.25850594 = queryWeight, product of:
          6.6217136 = idf(docFreq=159, maxDocs=44218)
          0.03903913 = queryNorm
        0.25606102 = fieldWeight in 5082, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          6.6217136 = idf(docFreq=159, maxDocs=44218)
          0.02734375 = fieldNorm(doc=5082)
  0.33333334 = coord(2/6)
```
Abstract

Text retrieval technology targets a problem that is all too familiar: finding relevant information in large stores of electronic documents. The problem is an old one, with the first research conference devoted to the subject held in 1958 [11]. Since then the problem has continued to grow as more information is created in electronic form and more people gain electronic access. The advent of the World Wide Web, where anyone can publish so everyone must search, is a graphic illustration of the need for effective retrieval technology. The Text REtrieval Conference (TREC) is a workshop series designed to build the infrastructure necessary for the large-scale evaluation of text retrieval technology, thereby accelerating its transfer into the commercial sector. The series is sponsored by the U.S. National Institute of Standards and Technology (NIST) and the U.S. Department of Defense. At the time of this writing, there have been twelve TREC workshops and preparations for the thirteenth workshop are under way. Participants in the workshops have been drawn from the academic, commercial, and government sectors, and have included representatives from more than twenty different countries. These collective efforts have accomplished a great deal: a variety of large test collections have been built for both traditional ad hoc retrieval and related tasks such as cross-language retrieval, speech retrieval, and question answering; retrieval effectiveness has approximately doubled; and many commercial retrieval systems now contain technology first developed in TREC.

Date

29. 3.1996 18:16:49

Losee, R.M.: Determining information retrieval and filtering performance without experimentation (1995) 0.02

0.022438403 = product of:
  0.13463041 = sum of:
    0.13463041 = sum of:
      0.09760557 = weight(_text_:methods in 3368) [ClassicSimilarity], result of:
        0.09760557 = score(doc=3368,freq=8.0), product of:
          0.15695344 = queryWeight, product of:
            4.0204134 = idf(docFreq=2156, maxDocs=44218)
            0.03903913 = queryNorm
          0.62187594 = fieldWeight in 3368, product of:
            2.828427 = tf(freq=8.0), with freq of:
              8.0 = termFreq=8.0
            4.0204134 = idf(docFreq=2156, maxDocs=44218)
            0.0546875 = fieldNorm(doc=3368)
      0.037024844 = weight(_text_:22 in 3368) [ClassicSimilarity], result of:
        0.037024844 = score(doc=3368,freq=2.0), product of:
          0.1367084 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03903913 = queryNorm
          0.2708308 = fieldWeight in 3368, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=3368)
  0.16666667 = coord(1/6)

Abstract: The performance of an information retrieval or text and media filtering system may be determined through analytic methods as well as by traditional simulation or experimental methods. These analytic methods can provide precise statements about expected performance. They can thus determine which of 2 similarly performing systems is superior. For both a single query term and a multiple query term retrieval model, a model for comparing the performance of different probabilistic retrieval methods is developed. This method may be used in computing the average search length for a query, given only knowledge of database parameter values. Describes predictive models for inverse document frequency, binary independence, and relevance feedback based retrieval and filtering. Simulations illustrate how the single term model performs, and sample performance predictions are given for single term and multiple term problems.
Date: 22. 2.1996 13:14:10

Ng, K.B.; Loewenstern, D.; Basu, C.; Hirsh, H.; Kantor, P.B.: Data fusion of machine-learning methods for the TREC5 routing task (and other work) (1997) 0.02

0.02043515 = product of:
  0.1226109 = sum of:
    0.1226109 = sum of:
      0.069718264 = weight(_text_:methods in 3107) [ClassicSimilarity], result of:
        0.069718264 = score(doc=3107,freq=2.0), product of:
          0.15695344 = queryWeight, product of:
            4.0204134 = idf(docFreq=2156, maxDocs=44218)
            0.03903913 = queryNorm
          0.4441971 = fieldWeight in 3107, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.0204134 = idf(docFreq=2156, maxDocs=44218)
            0.078125 = fieldNorm(doc=3107)
      0.052892637 = weight(_text_:22 in 3107) [ClassicSimilarity], result of:
        0.052892637 = score(doc=3107,freq=2.0), product of:
          0.1367084 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03903913 = queryNorm
          0.38690117 = fieldWeight in 3107, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.078125 = fieldNorm(doc=3107)
  0.16666667 = coord(1/6)

Date: 27. 2.1999 20:59:22

Bar-Ilan, J.: Methods for measuring search engine performance over time (2002) 0.02

0.02026257 = product of:
  0.060787708 = sum of:
    0.021349104 = product of:
      0.04269821 = sum of:
        0.04269821 = weight(_text_:29 in 305) [ClassicSimilarity], result of:
          0.04269821 = score(doc=305,freq=2.0), product of:
            0.13732746 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.03903913 = queryNorm
            0.31092256 = fieldWeight in 305, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0625 = fieldNorm(doc=305)
      0.5 = coord(1/2)
    0.039438605 = product of:
      0.07887721 = sum of:
        0.07887721 = weight(_text_:methods in 305) [ClassicSimilarity], result of:
          0.07887721 = score(doc=305,freq=4.0), product of:
            0.15695344 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.03903913 = queryNorm
            0.5025517 = fieldWeight in 305, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.0625 = fieldNorm(doc=305)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Abstract: This study introduces methods for evaluating search engine performance over a time period. Several measures are defined, which as a whole describe search engine functionality over time. The necessary setup for such studies is described, and the use of these measures is illustrated through a specific example. The set of measures introduced here may serve as a guideline for the search engines for testing and improving their functionality. We recommend setting up a standard suite of measures for evaluating search engine performance.
Date: 23. 3.2002 9:50:29

Lespinasse, K.: TREC: une conference pour l'evaluation des systemes de recherche d'information (1997) 0.02

0.01634812 = product of:
  0.09808872 = sum of:
    0.09808872 = sum of:
      0.05577461 = weight(_text_:methods in 744) [ClassicSimilarity], result of:
        0.05577461 = score(doc=744,freq=2.0), product of:
          0.15695344 = queryWeight, product of:
            4.0204134 = idf(docFreq=2156, maxDocs=44218)
            0.03903913 = queryNorm
          0.35535768 = fieldWeight in 744, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.0204134 = idf(docFreq=2156, maxDocs=44218)
            0.0625 = fieldNorm(doc=744)
      0.04231411 = weight(_text_:22 in 744) [ClassicSimilarity], result of:
        0.04231411 = score(doc=744,freq=2.0), product of:
          0.1367084 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03903913 = queryNorm
          0.30952093 = fieldWeight in 744, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0625 = fieldNorm(doc=744)
  0.16666667 = coord(1/6)

Abstract: TREC is an annual conference held in the USA devoted to electronic systems for large full text information searching. The conference deals with evaluation and comparison techniques developed since 1992 by participants from the research and industrial fields. The work of the conference is destined for designers (rather than users) of systems which access full text information. Describes the context, objectives, organization, evaluation methods and limits of TREC.
Date: 1. 8.1996 22:01:00

Huffman, G.D.; Vital, D.A.; Bivins, R.G.: Generating indices with lexical association methods : term uniqueness (1990) 0.02
```
0.016067442 = product of:
  0.04820232 = sum of:
    0.01334319 = product of:
      0.02668638 = sum of:
        0.02668638 = weight(_text_:29 in 4152) [ClassicSimilarity], result of:
          0.02668638 = score(doc=4152,freq=2.0), product of:
            0.13732746 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.03903913 = queryNorm
            0.19432661 = fieldWeight in 4152, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4152)
      0.5 = coord(1/2)
    0.034859132 = product of:
      0.069718264 = sum of:
        0.069718264 = weight(_text_:methods in 4152) [ClassicSimilarity], result of:
          0.069718264 = score(doc=4152,freq=8.0), product of:
            0.15695344 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.03903913 = queryNorm
            0.4441971 = fieldWeight in 4152, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4152)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)
```
Abstract

A software system has been developed which orders citations retrieved from an online database in terms of relevancy. The system resulted from an effort generated by NASA's Technology Utilization Program to create new advanced software tools to largely automate the process of determining relevancy of database citations retrieved to support large technology transfer studies. The ranking is based on the generation of an enriched vocabulary using lexical association methods, a user assessment of the vocabulary and a combination of the user assessment and the lexical metric. One of the key elements in relevancy ranking is the enriched vocabulary: the terms must be both unique and descriptive. This paper examines term uniqueness. Six lexical association methods were employed to generate characteristic word indices. A limited subset of the terms - the highest 20, 40, 60 and 7,5% of the uniqueness words - were compared and uniqueness factors developed. Computational times were also measured. It was found that methods based on occurrences and signal produced virtually the same terms. The limited subsets of terms produced by the exact and centroid discrimination value were also nearly identical. Unique term sets were produced by the occurrence, variance and discrimination value (centroid). An end-user evaluation showed that the generated terms were largely distinct and had values of word precision which were consistent with values of the search precision.

Date

23.11.1995 11:29:46

Saracevic, T.; Mokros, H.; Su, L.: Nature of interaction between users and intermediaries in online searching : a qualitative analysis (1990) 0.01

0.0144304065 = product of:
  0.04329122 = sum of:
    0.022375738 = product of:
      0.044751476 = sum of:
        0.044751476 = weight(_text_:theory in 4894) [ClassicSimilarity], result of:
          0.044751476 = score(doc=4894,freq=2.0), product of:
            0.16234003 = queryWeight, product of:
              4.1583924 = idf(docFreq=1878, maxDocs=44218)
              0.03903913 = queryNorm
            0.27566507 = fieldWeight in 4894, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1583924 = idf(docFreq=1878, maxDocs=44218)
              0.046875 = fieldNorm(doc=4894)
      0.5 = coord(1/2)
    0.020915478 = product of:
      0.041830957 = sum of:
        0.041830957 = weight(_text_:methods in 4894) [ClassicSimilarity], result of:
          0.041830957 = score(doc=4894,freq=2.0), product of:
            0.15695344 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.03903913 = queryNorm
            0.26651827 = fieldWeight in 4894, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.046875 = fieldNorm(doc=4894)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Abstract: Reports preliminary results from a study, conducted at Rutgers Univ., School of Communication, Information and Library Studies, to conduct observations and experiments under real-life conditions on the nature, effects and patterns in the discourse between users and intermediary searchers and in the related computer commands in the context of online searching and responses. The study involved videotaping interactions between users and intermediaries and recording the search logs for 40 questions. Users judged the relevance of output and completed a number of other measures. Data is analysed both quantitatively, using standard and innovative statistical techniques, and qualitatively, through a grounded theory approach using microanalytic and observational methods

Drabenstott, K.M.; Vizine-Goetz, D.: Using subject headings for online retrieval : theory, practice and potential (1994) 0.01

0.0144304065 = product of:
  0.04329122 = sum of:
    0.022375738 = product of:
      0.044751476 = sum of:
        0.044751476 = weight(_text_:theory in 386) [ClassicSimilarity], result of:
          0.044751476 = score(doc=386,freq=2.0), product of:
            0.16234003 = queryWeight, product of:
              4.1583924 = idf(docFreq=1878, maxDocs=44218)
              0.03903913 = queryNorm
            0.27566507 = fieldWeight in 386, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1583924 = idf(docFreq=1878, maxDocs=44218)
              0.046875 = fieldNorm(doc=386)
      0.5 = coord(1/2)
    0.020915478 = product of:
      0.041830957 = sum of:
        0.041830957 = weight(_text_:methods in 386) [ClassicSimilarity], result of:
          0.041830957 = score(doc=386,freq=2.0), product of:
            0.15695344 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.03903913 = queryNorm
            0.26651827 = fieldWeight in 386, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.046875 = fieldNorm(doc=386)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Abstract: Using subject headings for Online Retrieval is an indispensable tool for online system designers who are developing new systems or refining existing ones. The book describes subject analysis and subject searching in online catalogs, including the limitations of retrieval, and demonstrates how such limitations can be overcome through system design and programming. The book describes the Library of Congress Subject Headings system and system characteristics, shows how information is stored in machine readable files, and offers examples of and recommendations for successful methods. Tables are included to support these recommendations, and diagrams, graphs, and bar charts are used to provide results of data analyses.

Pirkola, A.; Järvelin, K.: Employing the resolution power of search keys (2001) 0.01

0.01436062 = product of:
  0.043081857 = sum of:
    0.018680464 = product of:
      0.03736093 = sum of:
        0.03736093 = weight(_text_:29 in 5907) [ClassicSimilarity], result of:
          0.03736093 = score(doc=5907,freq=2.0), product of:
            0.13732746 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.03903913 = queryNorm
            0.27205724 = fieldWeight in 5907, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5907)
      0.5 = coord(1/2)
    0.024401393 = product of:
      0.048802786 = sum of:
        0.048802786 = weight(_text_:methods in 5907) [ClassicSimilarity], result of:
          0.048802786 = score(doc=5907,freq=2.0), product of:
            0.15695344 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.03903913 = queryNorm
            0.31093797 = fieldWeight in 5907, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5907)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Abstract: Search key resolution power is analyzed in the context of a request, i.e., among the set of search keys for the request. Methods of characterizing the resolution power of keys automatically are studied, and the effects search keys of varying resolution power have on retrieval effectiveness are analyzed. It is shown that it often is possible to identify the best key of a query while the discrimination between the remaining keys presents problems. It is also shown that query performance is improved by suitably using the best key in a structured query. The tests were run with InQuery in a subcollection of the TREC collection, which contained some 515,000 documents
Date: 29. 9.2001 14:01:42

Blagden, J.F.: How much noise in a role-free and link-free co-ordinate indexing system? (1966) 0.01

0.014304606 = product of:
  0.085827634 = sum of:
    0.085827634 = sum of:
      0.048802786 = weight(_text_:methods in 2718) [ClassicSimilarity], result of:
        0.048802786 = score(doc=2718,freq=2.0), product of:
          0.15695344 = queryWeight, product of:
            4.0204134 = idf(docFreq=2156, maxDocs=44218)
            0.03903913 = queryNorm
          0.31093797 = fieldWeight in 2718, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.0204134 = idf(docFreq=2156, maxDocs=44218)
            0.0546875 = fieldNorm(doc=2718)
      0.037024844 = weight(_text_:22 in 2718) [ClassicSimilarity], result of:
        0.037024844 = score(doc=2718,freq=2.0), product of:
          0.1367084 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03903913 = queryNorm
          0.2708308 = fieldWeight in 2718, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0546875 = fieldNorm(doc=2718)
  0.16666667 = coord(1/6)

Abstract: A study of the number of irrelevant documents retrieved in a co-ordinate indexing system that does not employ either roles or links. These tests were based on one hundred actual inquiries received in the library and therefore an evaluation of recall efficiency is not included. Over half the enquiries produced no noise, but the mean average percentage noise figure was approximately 33 per cent based on a total average retrieval figure of eighteen documents per search. Details of the size of the indexed collection, methods of indexing, and an analysis of the reasons for the retrieval of irrelevant documents are discussed, thereby providing information officers who are thinking of installing such a system with some evidence on which to base a decision as to whether or not to utilize these devices.
Source: Journal of documentation. 22(1966), pp. 203-209

Rijsbergen, C.J. van: A test for the separation of relevant and non-relevant documents in experimental retrieval collections (1973) 0.01

0.01416872 = product of:
  0.04250616 = sum of:
    0.021349104 = product of:
      0.04269821 = sum of:
        0.04269821 = weight(_text_:29 in 5002) [ClassicSimilarity], result of:
          0.04269821 = score(doc=5002,freq=2.0), product of:
            0.13732746 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.03903913 = queryNorm
            0.31092256 = fieldWeight in 5002, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0625 = fieldNorm(doc=5002)
      0.5 = coord(1/2)
    0.021157054 = product of:
      0.04231411 = sum of:
        0.04231411 = weight(_text_:22 in 5002) [ClassicSimilarity], result of:
          0.04231411 = score(doc=5002,freq=2.0), product of:
            0.1367084 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03903913 = queryNorm
            0.30952093 = fieldWeight in 5002, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=5002)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Date: 19. 3.1996 11:22:12
Source: Journal of documentation. 29(1973) no.3, pp. 251-257

Blair, D.C.: Full text retrieval : Evaluation and implications (1986) 0.01

0.012309102 = product of:
  0.036927305 = sum of:
    0.016011827 = product of:
      0.032023653 = sum of:
        0.032023653 = weight(_text_:29 in 2047) [ClassicSimilarity], result of:
          0.032023653 = score(doc=2047,freq=2.0), product of:
            0.13732746 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.03903913 = queryNorm
            0.23319192 = fieldWeight in 2047, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.046875 = fieldNorm(doc=2047)
      0.5 = coord(1/2)
    0.020915478 = product of:
      0.041830957 = sum of:
        0.041830957 = weight(_text_:methods in 2047) [ClassicSimilarity], result of:
          0.041830957 = score(doc=2047,freq=2.0), product of:
            0.15695344 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.03903913 = queryNorm
            0.26651827 = fieldWeight in 2047, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.046875 = fieldNorm(doc=2047)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Abstract: Recently, a detailed evaluation of a large, operational full-text document retrieval system was reported in the literature. Values of precision and recall were estimated using traditional statistical sampling methods and blind evaluation procedures. The results of this evaluation demonstrated that the system tested was retrieving less than 20% of the relevant documents when the searchers believed it was retrieving over 75% of the relevant documents. This evaluation is described, including some data not reported in the original article. Also discussed are the implications which this study has for how the subjects of documents should be represented, as well as the importance of rigorous retrieval evaluations for the furtherance of information retrieval research.
Footnote: Cf.: Blair, D.C., M.E. Maron: An evaluation ... Comm. ACM 28(1985) pp. 280-299; Salton, G.: Another look ... Comm. ACM 29(1986) pp. 648-656; Blair, D.C., M.E. Maron: Full-text information retrieval ... Inf. Proc. Man. 26(1990) pp. 437-447.

Alemayehu, N.: Analysis of performance variation using query expansion (2003) 0.01

0.012309102 = product of:
  0.036927305 = sum of:
    0.016011827 = product of:
      0.032023653 = sum of:
        0.032023653 = weight(_text_:29 in 1454) [ClassicSimilarity], result of:
          0.032023653 = score(doc=1454,freq=2.0), product of:
            0.13732746 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.03903913 = queryNorm
            0.23319192 = fieldWeight in 1454, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.046875 = fieldNorm(doc=1454)
      0.5 = coord(1/2)
    0.020915478 = product of:
      0.041830957 = sum of:
        0.041830957 = weight(_text_:methods in 1454) [ClassicSimilarity], result of:
          0.041830957 = score(doc=1454,freq=2.0), product of:
            0.15695344 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.03903913 = queryNorm
            0.26651827 = fieldWeight in 1454, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.046875 = fieldNorm(doc=1454)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Abstract: Information retrieval performance evaluation is commonly based on the classical recall and precision figures or graphs. However, important information indicating causes for variation may remain hidden under the average recall and precision figures. Identifying significant causes for variation can help researchers and developers to focus on opportunities for improvement that underlie the averages. This article presents a case study showing the potential of a statistical repeated measures analysis of variance for testing the significance of factors in retrieval performance variation. The TREC-9 Query Track performance data is used as a case study and the factors studied are retrieval method, topic, and their interaction. The results show that retrieval method, topic, and their interaction are all significant. A topic level analysis is also made to see the nature of variation in the performance of retrieval methods across topics. The observed retrieval performances of expansion runs are truly significant improvements for most of the topics. Analyses of the effect of query expansion on document ranking confirm that expansion affects ranking positively.
Date: 29. 3.2003 19:28:33

Leininger, K.: Interindexer consistency in PsycINFO (2000) 0.01
```
0.012261091 = product of:
  0.07356654 = sum of:
    0.07356654 = sum of:
      0.041830957 = weight(_text_:methods in 2552) [ClassicSimilarity], result of:
        0.041830957 = score(doc=2552,freq=2.0), product of:
          0.15695344 = queryWeight, product of:
            4.0204134 = idf(docFreq=2156, maxDocs=44218)
            0.03903913 = queryNorm
          0.26651827 = fieldWeight in 2552, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.0204134 = idf(docFreq=2156, maxDocs=44218)
            0.046875 = fieldNorm(doc=2552)
      0.03173558 = weight(_text_:22 in 2552) [ClassicSimilarity], result of:
        0.03173558 = score(doc=2552,freq=2.0), product of:
          0.1367084 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03903913 = queryNorm
          0.23214069 = fieldWeight in 2552, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=2552)
  0.16666667 = coord(1/6)
```
Abstract

Reports results of a study to examine interindexer consistency (the degree to which indexers, when assigning terms to a chosen record, will choose the same terms to reflect that record) in the PsycINFO database using 60 records that were inadvertently processed twice between 1996 and 1998. Five aspects of interindexer consistency were analysed. Two methods were used to calculate interindexer consistency: one posited by Hooper (1965) and the other by Rollin (1981). Aspects analysed were: checktag consistency (66.24% using Hooper's calculation and 77.17% using Rollin's); major-to-all term consistency (49.31% and 62.59% respectively); overall indexing consistency (49.02% and 63.32%); classification code consistency (44.17% and 45.00%); and major-to-major term consistency (43.24% and 56.09%). The average consistency across all categories was 50.4% using Hooper's method and 60.83% using Rollin's. Although comparison with previous studies is difficult due to methodological variations in the overall study of indexing consistency and the specific characteristics of the database, results generally support previous findings when trends and similar studies are analysed.
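
The two consistency measures named in the abstract can be illustrated with a short sketch. The abstract does not state the formulas, so the forms below are an assumption based on how these measures are commonly cited: Hooper's as shared terms divided by all distinct terms used by either indexer, and Rollin's as twice the shared terms divided by the total number of terms assigned by both. The function names and sample terms are hypothetical.

```python
def hooper_consistency(terms_a, terms_b):
    # Assumed form: A / (A + M + N), i.e. shared terms over all distinct terms.
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def rollin_consistency(terms_a, terms_b):
    # Assumed form: 2A / (B + C), i.e. twice the shared terms over both totals.
    a, b = set(terms_a), set(terms_b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0

# Hypothetical index terms assigned to the same record by two indexers.
indexer_1 = ["memory", "aging", "recall"]
indexer_2 = ["memory", "recall", "cognition", "aging"]

hooper = hooper_consistency(indexer_1, indexer_2)
rollin = rollin_consistency(indexer_1, indexer_2)
print(round(hooper, 4), round(rollin, 4))
```

Under these assumed forms the second measure is never lower than the first for the same pair of term sets, which is consistent with the higher percentages the abstract reports for Rollin's calculation.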

Date

9. 2.1997 18:44:22
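
The abstract above names two consistency measures without defining them. As a rough sketch, assuming the commonly cited definitions (Hooper: shared terms divided by all distinct terms used by either indexer; the measure the abstract attributes to Rollin: twice the shared terms divided by the total number of terms both indexers assigned), the two can be computed as follows. The function names and the sample term sets are hypothetical and are not taken from the study.

```python
def hooper_consistency(terms_a, terms_b):
    """Shared terms / distinct terms used by either indexer (assumed definition)."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    # Two empty term sets are treated as fully consistent by convention here.
    return len(a & b) / len(union) if union else 1.0


def rollin_consistency(terms_a, terms_b):
    """2 * shared terms / total terms assigned by both indexers (assumed definition)."""
    a, b = set(terms_a), set(terms_b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


# Hypothetical index terms assigned to the same record by two indexers.
indexer_1 = {"memory", "learning", "adults"}
indexer_2 = {"memory", "learning", "cognition", "children"}

hooper_value = hooper_consistency(indexer_1, indexer_2)
rollin_value = rollin_consistency(indexer_1, indexer_2)
print(f"Hooper: {hooper_value:.2%}, Rollin: {rollin_value:.2%}")
```

Under these definitions the second measure can never be lower than the first for the same pair of term sets, which agrees with the direction of every pair of percentages reported in the abstract.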
Pal, S.; Mitra, M.; Kamps, J.: Evaluation effort, reliability and reusability in XML retrieval (2011) 0.01
```
0.010217575 = product of:
  0.06130545 = sum of:
    0.06130545 = sum of:
      0.034859132 = weight(_text_:methods in 4197) [ClassicSimilarity], result of:
        0.034859132 = score(doc=4197,freq=2.0), product of:
          0.15695344 = queryWeight, product of:
            4.0204134 = idf(docFreq=2156, maxDocs=44218)
            0.03903913 = queryNorm
          0.22209854 = fieldWeight in 4197, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.0204134 = idf(docFreq=2156, maxDocs=44218)
            0.0390625 = fieldNorm(doc=4197)
      0.026446318 = weight(_text_:22 in 4197) [ClassicSimilarity], result of:
        0.026446318 = score(doc=4197,freq=2.0), product of:
          0.1367084 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03903913 = queryNorm
          0.19345059 = fieldWeight in 4197, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=4197)
  0.16666667 = coord(1/6)
```
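
The score tree above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1))); the helper names are hypothetical, and the numeric inputs are copied from the explanation for document 4197.

```python
import math


def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


QUERY_NORM = 0.03903913
MAX_DOCS = 44218
FIELD_NORM = 0.0390625  # fieldNorm(doc=4197)

methods_score = term_score(2.0, 2156, MAX_DOCS, QUERY_NORM, FIELD_NORM)
term_22_score = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(1/6): only one of the six top-level query clauses matched.
total = (methods_score + term_22_score) * (1 / 6)
print(f"{total:.9f}")
```

Lucene computes these values in single precision, so the result of this double-precision sketch should agree with the listed score only to about six or seven significant digits.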
Abstract

The Initiative for the Evaluation of XML retrieval (INEX) provides a TREC-like platform for evaluating content-oriented XML retrieval systems. Since 2007, INEX has been using a set of precision-recall based metrics for its ad hoc tasks. The authors investigate the reliability and robustness of these focused retrieval measures, and of the INEX pooling method. They explore four specific questions: How reliable are the metrics when assessments are incomplete, or when query sets are small? What is the minimum pool/query-set size that can be used to reliably evaluate systems? Can the INEX collections be used to fairly evaluate "new" systems that did not participate in the pooling process? And, for a fixed amount of assessment effort, would this effort be better spent in thoroughly judging a few queries, or in judging many queries relatively superficially? The authors' findings validate properties of precision-recall-based metrics observed in document retrieval settings. Early precision measures are found to be more error-prone and less stable under incomplete judgments and small topic-set sizes. They also find that system rankings remain largely unaffected even when assessment effort is substantially (but systematically) reduced, and confirm that the INEX collections remain usable when evaluating nonparticipating systems. Finally, they observe that for a fixed amount of effort, judging shallow pools for many queries is better than judging deep pools for a smaller set of queries. However, when judging only a random sample of a pool, it is better to completely judge fewer topics than to partially judge many topics. This result confirms the effectiveness of pooling methods.

Date

22. 1.2011 14:20:56
Larsen, B.; Ingwersen, P.; Lund, B.: Data fusion according to the principle of polyrepresentation (2009) 0.01
```
0.010099277 = product of:
  0.06059566 = sum of:
    0.06059566 = sum of:
      0.039438605 = weight(_text_:methods in 2752) [ClassicSimilarity], result of:
        0.039438605 = score(doc=2752,freq=4.0), product of:
          0.15695344 = queryWeight, product of:
            4.0204134 = idf(docFreq=2156, maxDocs=44218)
            0.03903913 = queryNorm
          0.25127584 = fieldWeight in 2752, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            4.0204134 = idf(docFreq=2156, maxDocs=44218)
            0.03125 = fieldNorm(doc=2752)
      0.021157054 = weight(_text_:22 in 2752) [ClassicSimilarity], result of:
        0.021157054 = score(doc=2752,freq=2.0), product of:
          0.1367084 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03903913 = queryNorm
          0.15476047 = fieldWeight in 2752, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03125 = fieldNorm(doc=2752)
  0.16666667 = coord(1/6)
```
Abstract

We report data fusion experiments carried out on the four best-performing retrieval models from TREC 5. Three were conceptually/algorithmically very different from one another; one was algorithmically similar to one of the former. The objective of the test was to observe the performance of the 11 logical data fusion combinations compared to the performance of the four individual models and their intermediate fusions when following the principle of polyrepresentation. This principle is based on cognitive IR perspective (Ingwersen & Järvelin, 2005) and implies that each retrieval model is regarded as a representation of a unique interpretation of information retrieval (IR). It predicts that only fusions of very different, but equally good, IR models may outperform each constituent as well as their intermediate fusions. Two kinds of experiments were carried out. One tested restricted fusions, which entails that only the inner disjoint overlap documents between fused models are ranked. The second set of experiments was based on traditional data fusion methods. The experiments involved the 30 TREC 5 topics that contain more than 44 relevant documents. In all tests, the Borda and CombSUM scoring methods were used. Performance was measured by precision and recall, with document cutoff values (DCVs) at 100 and 15 documents, respectively. Results show that restricted fusions made of two, three, or four cognitively/algorithmically very different retrieval models perform significantly better than do the individual models at DCV100. At DCV15, however, the results of polyrepresentative fusion were less predictable. The traditional fusion method based on polyrepresentation principles demonstrates a clear picture of performance at both DCV levels and verifies the polyrepresentation predictions for data fusion in IR. 
Data fusion improves retrieval performance over the constituent IR models only if the models are all conceptually/algorithmically quite dissimilar and perform equally well, in that order of importance.

Date

22. 3.2009 18:48:28
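
The abstract above mentions the Borda and CombSUM scoring methods. As an illustrative sketch only (not the authors' experimental setup), CombSUM adds up each document's scores across the runs being fused, and a Borda count awards points by rank position. The Borda variant below is a simplification in which a document missing from a ranking receives no points from it; the function names, document ids, and scores are hypothetical, and the scores are assumed to be normalized already.

```python
from collections import defaultdict


def comb_sum(runs):
    """Fuse runs (dicts of doc id -> normalized score) by summing scores."""
    fused = defaultdict(float)
    for run in runs:
        for doc, score in run.items():
            fused[doc] += score
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


def borda(rankings):
    """Fuse ranked lists of doc ids with a simplified Borda count."""
    pool_size = len({doc for ranking in rankings for doc in ranking})
    points = defaultdict(int)
    for ranking in rankings:
        for position, doc in enumerate(ranking):
            # The top document earns pool_size points, the next one fewer.
            points[doc] += pool_size - position
    return sorted(points, key=lambda d: (-points[d], d))


# Hypothetical output of three retrieval models for one topic.
runs = [
    {"d1": 0.9, "d2": 0.5, "d3": 0.1},
    {"d2": 0.8, "d3": 0.7},
    {"d1": 0.2, "d3": 0.6},
]
rankings = [sorted(run, key=run.get, reverse=True) for run in runs]

comb_sum_order = comb_sum(runs)
borda_order = borda(rankings)
print(comb_sum_order, borda_order)
```

The two methods can order the same documents differently, because CombSUM depends on score magnitudes while Borda uses only rank positions.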
Vechtomova, O.: Facet-based opinion retrieval from blogs (2010) 0.01
```
0.009093864 = product of:
  0.05456318 = sum of:
    0.05456318 = product of:
      0.10912636 = sum of:
        0.10912636 = weight(_text_:methods in 4225) [ClassicSimilarity], result of:
          0.10912636 = score(doc=4225,freq=10.0), product of:
            0.15695344 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.03903913 = queryNorm
            0.6952785 = fieldWeight in 4225, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4225)
      0.5 = coord(1/2)
  0.16666667 = coord(1/6)
```
Abstract

The paper presents methods of retrieving blog posts containing opinions about an entity expressed in the query. The methods use a lexicon of subjective words and phrases compiled from manually and automatically developed resources. One of the methods uses the Kullback-Leibler divergence to weight subjective words occurring near query terms in documents, another uses proximity between the occurrences of query terms and subjective words in documents, and the third combines both factors. Methods of structuring queries into facets, facet expansion using Wikipedia, and a facet-based retrieval are also investigated in this work. The methods were evaluated using the TREC 2007 and 2008 Blog track topics, and proved to be highly effective.

Chu, H.: Factors affecting relevance judgment : a report from TREC Legal track (2011) 0.01

0.00885545 = product of:
  0.026566349 = sum of:
    0.01334319 = product of:
      0.02668638 = sum of:
        0.02668638 = weight(_text_:29 in 4540) [ClassicSimilarity], result of:
          0.02668638 = score(doc=4540,freq=2.0), product of:
            0.13732746 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.03903913 = queryNorm
            0.19432661 = fieldWeight in 4540, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4540)
      0.5 = coord(1/2)
    0.013223159 = product of:
      0.026446318 = sum of:
        0.026446318 = weight(_text_:22 in 4540) [ClassicSimilarity], result of:
          0.026446318 = score(doc=4540,freq=2.0), product of:
            0.1367084 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03903913 = queryNorm
            0.19345059 = fieldWeight in 4540, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4540)
      0.5 = coord(1/2)
  0.33333334 = coord(2/6)

Date

12. 7.2011 18:29:22

Losada, D.E.; Parapar, J.; Barreiro, A.: Multi-armed bandits for adjudicating documents in pooling-based evaluation of information retrieval systems (2017) 0.01
```
0.008714783 = product of:
  0.052288696 = sum of:
    0.052288696 = product of:
      0.10457739 = sum of:
        0.10457739 = weight(_text_:methods in 5098) [ClassicSimilarity], result of:
          0.10457739 = score(doc=5098,freq=18.0), product of:
            0.15695344 = queryWeight, product of:
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.03903913 = queryNorm
            0.66629565 = fieldWeight in 5098, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              4.0204134 = idf(docFreq=2156, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5098)
      0.5 = coord(1/2)
  0.16666667 = coord(1/6)
```
Abstract

Evaluating Information Retrieval systems is crucial to making progress in search technologies. Evaluation is often based on assembling reference collections consisting of documents, queries and relevance judgments done by humans. In large-scale environments, exhaustively judging relevance becomes infeasible. Instead, only a pool of documents is judged for relevance. By selectively choosing documents from the pool we can optimize the number of judgments required to identify a given number of relevant documents. We argue that this iterative selection process can be naturally modeled as a reinforcement learning problem and propose innovative and formal adjudication methods based on multi-armed bandits. Casting document judging as a multi-armed bandit problem is not only theoretically appealing, but also leads to highly effective adjudication methods. Under this bandit allocation framework, we consider stationary and non-stationary models and propose seven new document adjudication methods (five stationary methods and two non-stationary variants). Our paper also reports a series of experiments performed to thoroughly compare our new methods against current adjudication methods. This comparative study includes existing methods designed for pooling-based evaluation and existing methods designed for metasearch. Our experiments show that our theoretically grounded adjudication methods can substantially minimize the assessment effort.
