Search (162 results, page 1 of 9)

  • Filter: theme_ss:"Retrievalstudien"
  1. Losee, R.M.: Determining information retrieval and filtering performance without experimentation (1995) 0.31
    0.31449008 = sum of:
      0.00823978 = product of:
        0.03295912 = sum of:
          0.03295912 = weight(_text_:based in 3368) [ClassicSimilarity], result of:
            0.03295912 = score(doc=3368,freq=2.0), product of:
              0.14144066 = queryWeight, product of:
                3.0129938 = idf(docFreq=5906, maxDocs=44218)
                0.04694356 = queryNorm
              0.23302436 = fieldWeight in 3368, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.0129938 = idf(docFreq=5906, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3368)
        0.25 = coord(1/4)
      0.15808989 = weight(_text_:term in 3368) [ClassicSimilarity], result of:
        0.15808989 = score(doc=3368,freq=8.0), product of:
          0.21904005 = queryWeight, product of:
            4.66603 = idf(docFreq=1130, maxDocs=44218)
            0.04694356 = queryNorm
          0.72173965 = fieldWeight in 3368, product of:
            2.828427 = tf(freq=8.0), with freq of:
              8.0 = termFreq=8.0
            4.66603 = idf(docFreq=1130, maxDocs=44218)
            0.0546875 = fieldNorm(doc=3368)
      0.12589967 = weight(_text_:frequency in 3368) [ClassicSimilarity], result of:
        0.12589967 = score(doc=3368,freq=2.0), product of:
          0.27643865 = queryWeight, product of:
            5.888745 = idf(docFreq=332, maxDocs=44218)
            0.04694356 = queryNorm
          0.45543438 = fieldWeight in 3368, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            5.888745 = idf(docFreq=332, maxDocs=44218)
            0.0546875 = fieldNorm(doc=3368)
      0.022260714 = product of:
        0.04452143 = sum of:
          0.04452143 = weight(_text_:22 in 3368) [ClassicSimilarity], result of:
            0.04452143 = score(doc=3368,freq=2.0), product of:
              0.16438834 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.04694356 = queryNorm
              0.2708308 = fieldWeight in 3368, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3368)
        0.5 = coord(1/2)
    
    Abstract
    The performance of an information retrieval or text and media filtering system may be determined through analytic methods as well as by traditional simulation or experimental methods. These analytic methods can provide precise statements about expected performance. They can thus determine which of 2 similarly performing systems is superior. For both a single query term and a multiple query term retrieval model, a model for comparing the performance of different probabilistic retrieval methods is developed. This method may be used in computing the average search length for a query, given only knowledge of database parameter values. Describes predictive models for inverse document frequency, binary independence, and relevance feedback based retrieval and filtering. Simulations illustrate how the single term model performs, and sample performance predictions are given for single term and multiple term problems.
    Date
    22. 2.1996 13:14:10
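    The nested breakdown above is a Lucene ClassicSimilarity (TF-IDF) explain tree: each leaf weight is the product of a queryWeight (idf * queryNorm) and a fieldWeight (sqrt(termFreq) * idf * fieldNorm), and the per-term weights are summed, with the coord factors shown applied to some subclauses. A minimal Python sketch (not part of the original listing) that reproduces the "_text_:term" leaf of this entry:

      import math

      def classic_similarity_weight(freq, idf, query_norm, field_norm):
          """One leaf of a Lucene ClassicSimilarity explain tree:
          queryWeight * fieldWeight
          = (idf * queryNorm) * (sqrt(freq) * idf * fieldNorm)."""
          query_weight = idf * query_norm
          field_weight = math.sqrt(freq) * idf * field_norm
          return query_weight * field_weight

      # Values copied from the "_text_:term in 3368" leaf above.
      w = classic_similarity_weight(freq=8.0, idf=4.66603,
                                    query_norm=0.04694356, field_norm=0.0546875)
      print(round(w, 8))  # ~0.15808989, matching the reported weight
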
  2. Ruthven, I.; Lalmas, M.; Rijsbergen, K. van: Combining and selecting characteristics of information use (2002) 0.19
    0.18807887 = product of:
      0.25077182 = sum of:
        0.0066587473 = product of:
          0.02663499 = sum of:
            0.02663499 = weight(_text_:based in 5208) [ClassicSimilarity], result of:
              0.02663499 = score(doc=5208,freq=4.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.18831211 = fieldWeight in 5208, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5208)
          0.25 = coord(1/4)
        0.11950473 = weight(_text_:term in 5208) [ClassicSimilarity], result of:
          0.11950473 = score(doc=5208,freq=14.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.5455839 = fieldWeight in 5208, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.03125 = fieldNorm(doc=5208)
        0.12460835 = weight(_text_:frequency in 5208) [ClassicSimilarity], result of:
          0.12460835 = score(doc=5208,freq=6.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.45076314 = fieldWeight in 5208, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.03125 = fieldNorm(doc=5208)
      0.75 = coord(3/4)
    
    Abstract
    Ruthven, Lalmas, and van Rijsbergen use traditional term importance measures like inverse document frequency; noise, based upon in-document frequency; and term frequency, supplemented by theme value, which is calculated from differences between the expected positions of words in a text and their actual positions, on the assumption that even distribution indicates term association with a main topic; and context, which is based on a query term's distance from the nearest other query term relative to the average expected distribution of all query terms in the document. They then define document characteristics like specificity, the sum of all idf values in a document over the total terms in the document; document complexity, measured by the document's average idf value; and information-to-noise ratio (info-noise), tokens after stopping and stemming over tokens before these processes, measuring the ratio of useful to non-useful information in a document. Retrieval tests are then carried out using each characteristic, combinations of the characteristics, and relevance feedback to determine the correct combination of characteristics. A file ranks independently of query terms by both specificity and info-noise, but if presence of a query term is required, unique rankings are generated. Tested on five standard collections, the traditional characteristics outperformed the new characteristics, which did, however, outperform random retrieval. All possible combinations of characteristics were also tested, both with and without a set of scaling weights applied. All characteristics can benefit by combination with another characteristic or set of characteristics, and performance as a single characteristic is a good indicator of performance in combination. Larger combinations tended to be more effective than smaller ones, and weighting increased precision measures of middle-ranking combinations but decreased the ranking of poorer combinations. The best combinations vary for each collection, and in some collections with the addition of weighting. Finally, with all documents ranked by the all-characteristics combination, they take the top 30 documents and calculate the characteristic scores for each term in both the relevant and the non-relevant sets. Then, taking for each query term the characteristics whose average was higher for relevant than for non-relevant documents, the documents are re-ranked. The relevance feedback method of selecting characteristics can select a good set of characteristics for query terms.
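    A minimal sketch of the document characteristics named in this abstract (specificity, document complexity, and the information-to-noise ratio); the stoplist, the toy stemmer, and the exact choice of denominators are assumptions, since the abstract only describes the measures informally:

      def doc_characteristics(tokens, idf, stopwords):
          """Document characteristics as described in the abstract (interpreted):
          specificity - sum of the idf values of a document's terms over the
                        total number of terms in the document,
          complexity  - the document's average idf value,
          info_noise  - tokens left after stopping and stemming over tokens
                        before these processes."""
          # Toy stemming: strip a trailing 's' (a stand-in, not the real stemmer).
          kept = [t[:-1] if t.endswith("s") else t
                  for t in tokens if t not in stopwords]
          idf_values = [idf.get(t, 0.0) for t in kept]
          specificity = sum(idf_values) / len(tokens)
          complexity = sum(idf_values) / len(kept) if kept else 0.0
          info_noise = len(kept) / len(tokens)
          return specificity, complexity, info_noise

      tokens = ["the", "query", "terms", "rank", "the", "documents"]
      idf = {"query": 4.7, "term": 4.7, "rank": 3.2, "document": 3.0}
      print(doc_characteristics(tokens, idf, stopwords={"the"}))
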
  3. Robertson, S.E.; Walker, S.; Hancock-Beaulieu, M.M.: Large test collection experiments of an operational, interactive system : OKAPI at TREC (1995) 0.16
    0.16244806 = product of:
      0.21659742 = sum of:
        0.011652809 = product of:
          0.046611235 = sum of:
            0.046611235 = weight(_text_:based in 6964) [ClassicSimilarity], result of:
              0.046611235 = score(doc=6964,freq=4.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.3295462 = fieldWeight in 6964, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=6964)
          0.25 = coord(1/4)
        0.079044946 = weight(_text_:term in 6964) [ClassicSimilarity], result of:
          0.079044946 = score(doc=6964,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.36086982 = fieldWeight in 6964, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6964)
        0.12589967 = weight(_text_:frequency in 6964) [ClassicSimilarity], result of:
          0.12589967 = score(doc=6964,freq=2.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.45543438 = fieldWeight in 6964, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6964)
      0.75 = coord(3/4)
    
    Abstract
    The Okapi system has been used in a series of experiments on the TREC collections, investigating probabilistic methods, relevance feedback and query expansion, and interaction issues. Some new probabilistic models have been developed, resulting in simple weighting functions that take account of document length and within-document and within-query term frequency. All have been shown to be beneficial when based on large quantities of relevance data, as in the routing task. Interaction issues are much more difficult to evaluate in the TREC framework, and no benefits have yet been demonstrated from feedback based on small numbers of 'relevant' items identified by intermediary searchers.
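    The "simple weighting functions" referred to here are the family of Okapi weights commonly associated with BM25. A minimal sketch of the usual BM25 form, which combines document length, within-document and within-query term frequency (the parameter values k1, b, k3 are illustrative defaults, not taken from the paper):

      import math

      def bm25_weight(tf, qtf, df, n_docs, doc_len, avg_doc_len,
                      k1=1.2, b=0.75, k3=8.0):
          """One term's contribution in the BM25 form usually associated with
          Okapi: a saturated, length-normalised within-document tf component,
          a within-query tf component, and an idf component."""
          idf = math.log((n_docs - df + 0.5) / (df + 0.5))
          doc_part = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
          query_part = qtf * (k3 + 1) / (k3 + qtf)
          return idf * doc_part * query_part

      # e.g. a term occurring 3 times in a document of average length,
      # once in the query, and appearing in 1,130 of 44,218 documents:
      print(bm25_weight(tf=3, qtf=1, df=1130, n_docs=44218,
                        doc_len=250, avg_doc_len=250))
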
  4. Lioma, C.; Ounis, I.: ¬A syntactically-based query reformulation technique for information retrieval (2008) 0.13
    0.13344912 = product of:
      0.17793216 = sum of:
        0.008155267 = product of:
          0.032621067 = sum of:
            0.032621067 = weight(_text_:based in 2031) [ClassicSimilarity], result of:
              0.032621067 = score(doc=2031,freq=6.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.2306343 = fieldWeight in 2031, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2031)
          0.25 = coord(1/4)
        0.04516854 = weight(_text_:term in 2031) [ClassicSimilarity], result of:
          0.04516854 = score(doc=2031,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.20621133 = fieldWeight in 2031, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.03125 = fieldNorm(doc=2031)
        0.12460835 = weight(_text_:frequency in 2031) [ClassicSimilarity], result of:
          0.12460835 = score(doc=2031,freq=6.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.45076314 = fieldWeight in 2031, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.03125 = fieldNorm(doc=2031)
      0.75 = coord(3/4)
    
    Abstract
    Whereas in language words of high frequency are generally associated with low content [Bookstein, A., & Swanson, D. (1974). Probabilistic models for automatic indexing. Journal of the American Society for Information Science, 25(5), 312-318; Damerau, F. J. (1965). An experiment in automatic indexing. American Documentation, 16, 283-289; Harter, S. P. (1974). A probabilistic approach to automatic keyword indexing. PhD thesis, University of Chicago; Sparck-Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28, 11-21; Yu, C., & Salton, G. (1976). Precision weighting - an effective automatic indexing method. Journal of the Association for Computing Machinery (ACM), 23(1), 76-88], shallow syntactic fragments of high frequency generally correspond to lexical fragments of high content [Lioma, C., & Ounis, I. (2006). Examining the content load of part of speech blocks for information retrieval. In Proceedings of the international committee on computational linguistics and the association for computational linguistics (COLING/ACL 2006), Sydney, Australia]. We apply this finding to Information Retrieval as follows. We present a novel automatic query reformulation technique, which is based on shallow syntactic evidence induced from various language samples, and used to enhance the performance of an Information Retrieval system. Firstly, we draw shallow syntactic evidence from language samples of varying size, and compare the effect of language sample size upon retrieval performance when using our syntactically-based query reformulation (SQR) technique. Secondly, we compare SQR to a state-of-the-art probabilistic pseudo-relevance feedback technique. Additionally, we combine both techniques and evaluate their compatibility. We evaluate our proposed technique across two standard Text REtrieval Conference (TREC) English test collections, and three statistically different weighting models. Experimental results suggest that SQR markedly enhances retrieval performance, and is at least comparable to pseudo-relevance feedback. Notably, the combination of SQR and pseudo-relevance feedback further enhances retrieval performance considerably. These collective experimental results confirm the tenet that high-frequency shallow syntactic fragments correspond to content-bearing lexical fragments.
  5. Talvensaari, T.; Laurikkala, J.; Järvelin, K.; Juhola, M.: ¬A study on automatic creation of a comparable document collection in cross-language information retrieval (2006) 0.11
    0.11420593 = product of:
      0.15227456 = sum of:
        0.005885557 = product of:
          0.023542227 = sum of:
            0.023542227 = weight(_text_:based in 5601) [ClassicSimilarity], result of:
              0.023542227 = score(doc=5601,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.16644597 = fieldWeight in 5601, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5601)
          0.25 = coord(1/4)
        0.056460675 = weight(_text_:term in 5601) [ClassicSimilarity], result of:
          0.056460675 = score(doc=5601,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.25776416 = fieldWeight in 5601, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5601)
        0.08992833 = weight(_text_:frequency in 5601) [ClassicSimilarity], result of:
          0.08992833 = score(doc=5601,freq=2.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.32531026 = fieldWeight in 5601, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5601)
      0.75 = coord(3/4)
    
    Abstract
    Purpose - To present a method for creating a comparable document collection from two document collections in different languages. Design/methodology/approach - The best query keys were extracted from a Finnish source collection (articles of the newspaper Aamulehti) with the relative average term frequency formula. The keys were translated into English with a dictionary-based query translation program. The resulting lists of words were used as queries that were run against the target collection (Los Angeles Times articles) with the nearest neighbor method. The documents were aligned with unrestricted and date-restricted alignment schemes, which were also combined. Findings - The combined alignment scheme was found to be the best when the relatedness of the document pairs was assessed with a five-degree relevance scale. Of the 400 document pairs, roughly 40 percent were highly or fairly related and 75 percent included at least lexical similarity. Research limitations/implications - The number of alignment pairs was small due to the short common time period of the two collections, and their geographical (and thus, topical) remoteness. In future, our aim is to build larger comparable corpora in various languages and use them as a source of translation knowledge for the purposes of cross-language information retrieval (CLIR). Practical implications - Readily available parallel corpora are scarce. With this method, two unrelated document collections can relatively easily be aligned to create a CLIR resource. Originality/value - The method can be applied to weakly linked collections and morphologically complex languages, such as Finnish.
  6. Huffman, G.D.; Vital, D.A.; Bivins, R.G.: Generating indices with lexical association methods : term uniqueness (1990) 0.11
    0.108049594 = product of:
      0.14406613 = sum of:
        0.008323434 = product of:
          0.033293735 = sum of:
            0.033293735 = weight(_text_:based in 4152) [ClassicSimilarity], result of:
              0.033293735 = score(doc=4152,freq=4.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.23539014 = fieldWeight in 4152, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4152)
          0.25 = coord(1/4)
        0.07984746 = weight(_text_:term in 4152) [ClassicSimilarity], result of:
          0.07984746 = score(doc=4152,freq=4.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.3645336 = fieldWeight in 4152, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4152)
        0.055895224 = product of:
          0.11179045 = sum of:
            0.11179045 = weight(_text_:assessment in 4152) [ClassicSimilarity], result of:
              0.11179045 = score(doc=4152,freq=4.0), product of:
                0.25917634 = queryWeight, product of:
                  5.52102 = idf(docFreq=480, maxDocs=44218)
                  0.04694356 = queryNorm
                0.43132967 = fieldWeight in 4152, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.52102 = idf(docFreq=480, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4152)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    A software system has been developed which orders citations retrieved from an online database in terms of relevancy. The system resulted from an effort generated by NASA's Technology Utilization Program to create new advanced software tools to largely automate the process of determining the relevancy of database citations retrieved to support large technology transfer studies. The ranking is based on the generation of an enriched vocabulary using lexical association methods, a user assessment of the vocabulary, and a combination of the user assessment and the lexical metric. One of the key elements in relevancy ranking is the enriched vocabulary - the terms must be both unique and descriptive. This paper examines term uniqueness. Six lexical association methods were employed to generate characteristic word indices. A limited subset of the terms - the highest 20, 40, 60 and 7,5% of the unique words - were compared and uniqueness factors developed. Computational times were also measured. It was found that methods based on occurrences and signal produced virtually the same terms. The limited subsets of terms produced by the exact and centroid discrimination values were also nearly identical. Unique term sets were produced by the occurrence, variance and discrimination value (centroid) methods. An end-user evaluation showed that the generated terms were largely distinct and had values of word precision which were consistent with values of the search precision.
  7. Crestani, F.; Rijsbergen, C.J. van: Information retrieval by imaging (1996) 0.11
    0.10762094 = product of:
      0.14349459 = sum of:
        0.0070626684 = product of:
          0.028250674 = sum of:
            0.028250674 = weight(_text_:based in 6967) [ClassicSimilarity], result of:
              0.028250674 = score(doc=6967,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.19973516 = fieldWeight in 6967, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6967)
          0.25 = coord(1/4)
        0.117351316 = weight(_text_:term in 6967) [ClassicSimilarity], result of:
          0.117351316 = score(doc=6967,freq=6.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.5357528 = fieldWeight in 6967, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.046875 = fieldNorm(doc=6967)
        0.019080611 = product of:
          0.038161222 = sum of:
            0.038161222 = weight(_text_:22 in 6967) [ClassicSimilarity], result of:
              0.038161222 = score(doc=6967,freq=2.0), product of:
                0.16438834 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04694356 = queryNorm
                0.23214069 = fieldWeight in 6967, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6967)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Explains briefly what constitutes the imaging process and explains how imaging can be used in information retrieval. Proposes an approach based on the concept of 'a term is a possible world', which enables the exploitation of term-to-term relationships, estimated using an information theoretic measure. Reports results of an evaluation exercise to compare the performance of imaging retrieval, using possible world semantics, with a benchmark, using the Cranfield 2 document collection to measure precision and recall. Initially, the performance of imaging retrieval was seen to be better, but statistical analysis proved that the difference was not significant. The problem with imaging retrieval lies in the amount of computation that needs to be performed at run time, and a later experiment investigated the possibility of reducing this amount. Notes lines of further investigation.
    Source
    Information retrieval: new systems and current research. Proceedings of the 16th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Drymen, Scotland, 22-23 Mar 94. Ed.: R. Leon
  8. Pal, S.; Mitra, M.; Kamps, J.: Evaluation effort, reliability and reusability in XML retrieval (2011) 0.08
    0.07595745 = product of:
      0.1519149 = sum of:
        0.008323434 = product of:
          0.033293735 = sum of:
            0.033293735 = weight(_text_:based in 4197) [ClassicSimilarity], result of:
              0.033293735 = score(doc=4197,freq=4.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.23539014 = fieldWeight in 4197, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4197)
          0.25 = coord(1/4)
        0.14359146 = sum of:
          0.11179045 = weight(_text_:assessment in 4197) [ClassicSimilarity], result of:
            0.11179045 = score(doc=4197,freq=4.0), product of:
              0.25917634 = queryWeight, product of:
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.04694356 = queryNorm
              0.43132967 = fieldWeight in 4197, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4197)
          0.031801023 = weight(_text_:22 in 4197) [ClassicSimilarity], result of:
            0.031801023 = score(doc=4197,freq=2.0), product of:
              0.16438834 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.04694356 = queryNorm
              0.19345059 = fieldWeight in 4197, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4197)
      0.5 = coord(2/4)
    
    Abstract
    The Initiative for the Evaluation of XML retrieval (INEX) provides a TREC-like platform for evaluating content-oriented XML retrieval systems. Since 2007, INEX has been using a set of precision-recall based metrics for its ad hoc tasks. The authors investigate the reliability and robustness of these focused retrieval measures, and of the INEX pooling method. They explore four specific questions: How reliable are the metrics when assessments are incomplete, or when query sets are small? What is the minimum pool/query-set size that can be used to reliably evaluate systems? Can the INEX collections be used to fairly evaluate "new" systems that did not participate in the pooling process? And, for a fixed amount of assessment effort, would this effort be better spent in thoroughly judging a few queries, or in judging many queries relatively superficially? The authors' findings validate properties of precision-recall-based metrics observed in document retrieval settings. Early precision measures are found to be more error-prone and less stable under incomplete judgments and small topic-set sizes. They also find that system rankings remain largely unaffected even when assessment effort is substantially (but systematically) reduced, and confirm that the INEX collections remain usable when evaluating nonparticipating systems. Finally, they observe that for a fixed amount of effort, judging shallow pools for many queries is better than judging deep pools for a smaller set of queries. However, when judging only a random sample of a pool, it is better to completely judge fewer topics than to partially judge many topics. This result confirms the effectiveness of pooling methods.
    Date
    22. 1.2011 14:20:56
  9. Schamber, L.; Bateman, J.: User criteria in relevance evaluation : toward development of a measurement scale (1996) 0.06
    0.06171962 = product of:
      0.12343924 = sum of:
        0.011652809 = product of:
          0.046611235 = sum of:
            0.046611235 = weight(_text_:based in 7351) [ClassicSimilarity], result of:
              0.046611235 = score(doc=7351,freq=4.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.3295462 = fieldWeight in 7351, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=7351)
          0.25 = coord(1/4)
        0.11178643 = weight(_text_:term in 7351) [ClassicSimilarity], result of:
          0.11178643 = score(doc=7351,freq=4.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.510347 = fieldWeight in 7351, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7351)
      0.5 = coord(2/4)
    
    Abstract
    Part of a long-term project which aims to develop a simple measurement scale, based on user criteria, that will yield results applicable to the study of user evaluations in any type of information seeking and use environment. Describes 2 tests which were conducted to determine how users interpret criterion terms drawn from previous user-based relevance studies. Presents results of these initial tests and describes conceptual and methodological challenges in the long-term development of the instrument.
  10. Rokaya, M.; Atlam, E.; Fuketa, M.; Dorji, T.C.; Aoe, J.-i.: Ranking of field association terms using Co-word analysis (2008) 0.06
    0.05895106 = product of:
      0.11790212 = sum of:
        0.009988121 = product of:
          0.039952483 = sum of:
            0.039952483 = weight(_text_:based in 2060) [ClassicSimilarity], result of:
              0.039952483 = score(doc=2060,freq=4.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.28246817 = fieldWeight in 2060, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2060)
          0.25 = coord(1/4)
        0.107914 = weight(_text_:frequency in 2060) [ClassicSimilarity], result of:
          0.107914 = score(doc=2060,freq=2.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.39037234 = fieldWeight in 2060, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.046875 = fieldNorm(doc=2060)
      0.5 = coord(2/4)
    
    Abstract
    Information retrieval involves finding some desired information in a store of information or a database. In this paper, co-word analysis is used to achieve a ranking of a selected sample of FA (field association) terms. Based on this ranking, a better arrangement of search results can be achieved. Experimental results were achieved using 41 MB of data (7660 documents) in the field of sports. The corpus was collected from CNN news, sports field, and was chosen to be distributed over 11 sub-fields of the field of sports. From the experimental results, the average precision increased by 18.3% after applying the proposed arranging scheme using the absolute frequency to compute the term weights, and the average precision increased by 17.2% after applying the proposed arranging scheme using a formula based on "TF*IDF" to compute the term weights.
  11. ¬The Fifth Text Retrieval Conference (TREC-5) (1997) 0.06
    0.057888947 = product of:
      0.115777895 = sum of:
        0.09033708 = weight(_text_:term in 3087) [ClassicSimilarity], result of:
          0.09033708 = score(doc=3087,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.41242266 = fieldWeight in 3087, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0625 = fieldNorm(doc=3087)
        0.025440816 = product of:
          0.05088163 = sum of:
            0.05088163 = weight(_text_:22 in 3087) [ClassicSimilarity], result of:
              0.05088163 = score(doc=3087,freq=2.0), product of:
                0.16438834 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04694356 = queryNorm
                0.30952093 = fieldWeight in 3087, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3087)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Proceedings of the 5th TREC conference, held in Gaithersburg, Maryland, Nov 20-22, 1996. The aim of the conference was discussion of retrieval techniques for large test collections. Different research groups used different techniques, such as automated thesauri, term weighting, natural language techniques, relevance feedback and advanced pattern matching, for information retrieval from the same large database. This procedure makes it possible to compare the results. The proceedings include papers, tables of the system results, and brief system descriptions including timing and storage information.
  12. Leppanen, E.: Homografiongelma tekstihaussa ja homografien disambiguoinnin vaikutukset [The homograph problem in text searching and the effects of homograph disambiguation] (1996) 0.06
    0.057488333 = product of:
      0.11497667 = sum of:
        0.0070626684 = product of:
          0.028250674 = sum of:
            0.028250674 = weight(_text_:based in 27) [ClassicSimilarity], result of:
              0.028250674 = score(doc=27,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.19973516 = fieldWeight in 27, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=27)
          0.25 = coord(1/4)
        0.107914 = weight(_text_:frequency in 27) [ClassicSimilarity], result of:
          0.107914 = score(doc=27,freq=2.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.39037234 = fieldWeight in 27, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.046875 = fieldNorm(doc=27)
      0.5 = coord(2/4)
    
    Abstract
    Homonymy is known to often cause false drops in free text searching in a full text database. The problem is quite common and difficult to avoid in Finnish, but nobody has examined it before. Reports on a study that examined the frequency of, and solutions to, the homonymy problem, based on searches made in a Finnish full text database containing about 55,000 newspaper articles. The results indicate that homonymy is not a very serious problem in full text searching, with only about 1 result set out of 4 containing false drops caused by homonymy. Several other reasons for non-relevance were much more common. However, some result sets contained a considerable number of homonymy errors, so the number appears highly variable. A study was also made into whether homonyms can be disambiguated by syntactic analysis; the result was that 75.2% of homonyms were disambiguated by this method. Verb homonyms were considerably easier to disambiguate than substantives. Although homonymy is not a very big problem, it could perhaps easily be eliminated if there were a suitable syntactic analyzer in the IR system.
  13. Ding, C.H.Q.: ¬A probabilistic model for Latent Semantic Indexing (2005) 0.06
    0.057488333 = product of:
      0.11497667 = sum of:
        0.0070626684 = product of:
          0.028250674 = sum of:
            0.028250674 = weight(_text_:based in 3459) [ClassicSimilarity], result of:
              0.028250674 = score(doc=3459,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.19973516 = fieldWeight in 3459, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3459)
          0.25 = coord(1/4)
        0.107914 = weight(_text_:frequency in 3459) [ClassicSimilarity], result of:
          0.107914 = score(doc=3459,freq=2.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.39037234 = fieldWeight in 3459, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.046875 = fieldNorm(doc=3459)
      0.5 = coord(2/4)
    
    Abstract
    Latent Semantic Indexing (LSI), when applied to semantic spaces built on text collections, improves information retrieval, information filtering, and word sense disambiguation. A new dual probability model based on the similarity concepts is introduced to provide a deeper understanding of LSI. Semantic associations can be quantitatively characterized by their statistical significance, the likelihood. Semantic dimensions containing redundant and noisy information can be separated out and should be ignored because of their negative contribution to the overall statistical significance. LSI is the optimal solution of the model. The peak in the likelihood curve indicates the existence of an intrinsic semantic dimension. The importance of LSI dimensions follows the Zipf distribution, indicating that LSI dimensions represent latent concepts. Document frequency of words follows the Zipf distribution, and the number of distinct words follows a log-normal distribution. Experiments on five standard document collections confirm and illustrate the analysis.
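    As a reading aid, a minimal sketch of the basic LSI step the abstract builds on: truncating the SVD of a term-document matrix to k latent dimensions (toy data only; the article's dual probability model itself is not reproduced here):

      import numpy as np

      def lsi_embed(term_doc_matrix, k):
          """Latent Semantic Indexing sketch: keep the top-k singular
          triplets of a term-document matrix and return the document
          coordinates in the reduced k-dimensional semantic space."""
          u, s, vt = np.linalg.svd(term_doc_matrix, full_matrices=False)
          return np.diag(s[:k]) @ vt[:k, :]   # k x n_docs document coordinates

      # Toy 4-term x 3-document count matrix (illustrative data only).
      A = np.array([[2., 0., 1.],
                    [1., 1., 0.],
                    [0., 3., 1.],
                    [0., 1., 2.]])
      docs_2d = lsi_embed(A, k=2)
      print(docs_2d.shape)  # (2, 3)
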
  14. Leininger, K.: Interindexer consistency in PsychINFO (2000) 0.06
    0.05744878 = product of:
      0.11489756 = sum of:
        0.09581695 = weight(_text_:term in 2552) [ClassicSimilarity], result of:
          0.09581695 = score(doc=2552,freq=4.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.4374403 = fieldWeight in 2552, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.046875 = fieldNorm(doc=2552)
        0.019080611 = product of:
          0.038161222 = sum of:
            0.038161222 = weight(_text_:22 in 2552) [ClassicSimilarity], result of:
              0.038161222 = score(doc=2552,freq=2.0), product of:
                0.16438834 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04694356 = queryNorm
                0.23214069 = fieldWeight in 2552, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2552)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Reports results of a study to examine interindexer consistency (the degree to which indexers, when assigning terms to a chosen record, will choose the same terms to reflect that record) in the PsycINFO database using 60 records that were inadvertently processed twice between 1996 and 1998. Five aspects of interindexer consistency were analysed. Two methods were used to calculate interindexer consistency: one posited by Hooper (1965) and the other by Rollin (1981). Aspects analysed were: checktag consistency (66.24% using Hooper's calculation and 77.17% using Rollin's); major-to-all term consistency (49.31% and 62.59% respectively); overall indexing consistency (49.02% and 63.32%); classification code consistency (44.17% and 45.00%); and major-to-major term consistency (43.24% and 56.09%). The average consistency across all categories was 50.4% using Hooper's method and 60.83% using Rollin's. Although comparison with previous studies is difficult due to methodological variations in the overall study of indexing consistency and the specific characteristics of the database, results generally support previous findings when trends and similar studies are analysed.
    Date
    9. 2.1997 18:44:22
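    The abstract names the two consistency calculations but does not spell them out; the forms usually cited for Hooper's and Rollin's measures are sketched below (treat the exact formulas as an assumption made here, not as given in the entry):

      def hooper_consistency(terms_a, terms_b):
          """Hooper (1965)-style consistency as usually cited: the number of
          terms both indexers assigned, divided by the number of distinct
          terms either indexer assigned."""
          a, b = set(terms_a), set(terms_b)
          return len(a & b) / len(a | b) if (a | b) else 0.0

      def rollin_consistency(terms_a, terms_b):
          """Rollin (1981)-style consistency as usually cited: twice the
          number of shared terms, divided by the sum of the two term-set sizes."""
          a, b = set(terms_a), set(terms_b)
          return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0

      # Two indexers' term sets for the same record (illustrative data):
      i1 = {"memory", "attention", "adults"}
      i2 = {"memory", "attention", "children", "learning"}
      print(hooper_consistency(i1, i2))   # 0.4
      print(rollin_consistency(i1, i2))   # ~0.571
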
  15. Colace, F.; Santo, M. de; Greco, L.; Napoletano, P.: Improving relevance feedback-based query expansion by the use of a weighted word pairs approach (2015) 0.05
    0.052902535 = product of:
      0.10580507 = sum of:
        0.009988121 = product of:
          0.039952483 = sum of:
            0.039952483 = weight(_text_:based in 2263) [ClassicSimilarity], result of:
              0.039952483 = score(doc=2263,freq=4.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.28246817 = fieldWeight in 2263, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2263)
          0.25 = coord(1/4)
        0.09581695 = weight(_text_:term in 2263) [ClassicSimilarity], result of:
          0.09581695 = score(doc=2263,freq=4.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.4374403 = fieldWeight in 2263, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.046875 = fieldNorm(doc=2263)
      0.5 = coord(2/4)
    
    Abstract
    In this article, the use of a new term extraction method for query expansion (QE) in text retrieval is investigated. The new method expands the initial query with a structured representation made of weighted word pairs (WWP) extracted from a set of training documents (relevance feedback). Standard text retrieval systems can handle a WWP structure through custom Boolean weighted models. We experimented with both the explicit and pseudorelevance feedback schemas and compared the proposed term extraction method with others in the literature, such as KLD and RM3. Evaluations have been conducted on a number of test collections (Text REtrieval Conference [TREC]-6, -7, -8, -9, and -10). Results demonstrated that the QE method based on this new structure outperforms the baseline.
  16. Keen, E.M.: Some aspects of proximity searching in text retrieval systems (1992) 0.05
    0.049876988 = product of:
      0.099753976 = sum of:
        0.009416891 = product of:
          0.037667565 = sum of:
            0.037667565 = weight(_text_:based in 6190) [ClassicSimilarity], result of:
              0.037667565 = score(doc=6190,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.26631355 = fieldWeight in 6190, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6190)
          0.25 = coord(1/4)
        0.09033708 = weight(_text_:term in 6190) [ClassicSimilarity], result of:
          0.09033708 = score(doc=6190,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.41242266 = fieldWeight in 6190, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0625 = fieldNorm(doc=6190)
      0.5 = coord(2/4)
    
    Abstract
    Describes and evaluates the proximity search facilities in external online systems and in-house retrieval software. Discusses and illustrates capabilities, syntax and circumstances of use. Presents measurements of the overheads required by proximity for storage, record input time and search time. The search-strategy narrowing effect of proximity is illustrated by recall and precision test results. Usage and problems lead to a number of design ideas for better implementation: some based on existing Boolean strategies, one on the use of weighted proximity to automatically produce ranked output. A comparison of Boolean, quorum and proximate term-pair distance searching is included.
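    A minimal sketch of the kind of proximity operator discussed here: an unordered "within n words" match between two terms (the tokenization and the distance threshold are illustrative choices, not taken from the paper):

      def within_distance(text, term_a, term_b, max_gap=3):
          """Proximity operator sketch: True if term_a and term_b occur
          within max_gap word positions of each other (unordered)."""
          tokens = text.lower().split()
          pos_a = [i for i, t in enumerate(tokens) if t == term_a]
          pos_b = [i for i, t in enumerate(tokens) if t == term_b]
          return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)

      print(within_distance("proximity searching in text retrieval systems",
                            "proximity", "text"))        # True (gap of 3)
      print(within_distance("proximity searching in text retrieval systems",
                            "proximity", "systems", 2))  # False
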
  17. Ekmekcioglu, F.C.; Robertson, A.M.; Willett, P.: Effectiveness of query expansion in ranked-output document retrieval systems (1992) 0.05
    0.049876988 = product of:
      0.099753976 = sum of:
        0.009416891 = product of:
          0.037667565 = sum of:
            0.037667565 = weight(_text_:based in 5689) [ClassicSimilarity], result of:
              0.037667565 = score(doc=5689,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.26631355 = fieldWeight in 5689, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5689)
          0.25 = coord(1/4)
        0.09033708 = weight(_text_:term in 5689) [ClassicSimilarity], result of:
          0.09033708 = score(doc=5689,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.41242266 = fieldWeight in 5689, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0625 = fieldNorm(doc=5689)
      0.5 = coord(2/4)
    
    Abstract
    Reports an evaluation of 3 methods for the expansion of natural language queries in ranked-output retrieval systems. The methods are based on term co-occurrence data, on Soundex codes, and on a string similarity measure. Searches for 110 queries in a database of 26,280 titles and abstracts suggest that there is no significant difference in retrieval effectiveness between any of these methods and unexpanded searches.
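    Of the three expansion methods mentioned, the string-similarity one is the easiest to illustrate. A minimal sketch using a Dice coefficient over character digrams; the study's actual similarity measure and threshold are not given in the abstract, so both are assumptions:

      def dice_digram_similarity(a, b):
          """Dice coefficient over character digrams, one common choice of
          string similarity for grouping variant word forms."""
          digrams = lambda s: {s[i:i + 2] for i in range(len(s) - 1)}
          da, db = digrams(a.lower()), digrams(b.lower())
          return 2 * len(da & db) / (len(da) + len(db)) if da and db else 0.0

      def expand_query(term, vocabulary, threshold=0.6):
          """Add vocabulary terms whose similarity to a query term exceeds a threshold."""
          return [t for t in vocabulary if t != term
                  and dice_digram_similarity(term, t) >= threshold]

      print(expand_query("colour", ["color", "colours", "column", "cool"]))
      # ['color', 'colours']
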
  18. Díaz, A.; García, A.; Gervás, P.: User-centred versus system-centred evaluation of a personalization system (2008) 0.05
    0.045809288 = product of:
      0.091618575 = sum of:
        0.011771114 = product of:
          0.047084454 = sum of:
            0.047084454 = weight(_text_:based in 2094) [ClassicSimilarity], result of:
              0.047084454 = score(doc=2094,freq=8.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.33289194 = fieldWeight in 2094, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2094)
          0.25 = coord(1/4)
        0.07984746 = weight(_text_:term in 2094) [ClassicSimilarity], result of:
          0.07984746 = score(doc=2094,freq=4.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.3645336 = fieldWeight in 2094, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2094)
      0.5 = coord(2/4)
    
    Abstract
    Some of the most popular measures used to evaluate information filtering systems are independent of the users, because they are based on relevance judgments obtained from experts. User-centred evaluation, on the other hand, makes it possible to capture the different impressions that users form of the running system. This work focuses on the problem of user-centred versus system-centred evaluation of a Web content personalization system in which personalization is based on a user model that stores long-term interests (sections, categories and keywords) and short-term interests (adapted from user-provided feedback). The user-centred evaluation is based on questionnaires filled in by the users before and after using the system, and the system-centred evaluation is based on a comparison between rankings of documents, obtained from the application of a multi-tier selection process, and binary relevance judgments collected previously from real users. The user-centred and system-centred evaluations, performed with 106 users over 14 working days, have provided valuable data concerning the behaviour of the users with respect to issues such as document relevance or the relative importance attributed to different ways of personalization. The results obtained show general satisfaction with both the personalization processes (selection, adaptation and presentation) and the system as a whole.
  19. Davis, C.H.: From document retrieval to Web browsing : some universal concerns (1997) 0.04
    0.043642364 = product of:
      0.08728473 = sum of:
        0.00823978 = product of:
          0.03295912 = sum of:
            0.03295912 = weight(_text_:based in 399) [ClassicSimilarity], result of:
              0.03295912 = score(doc=399,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.23302436 = fieldWeight in 399, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=399)
          0.25 = coord(1/4)
        0.079044946 = weight(_text_:term in 399) [ClassicSimilarity], result of:
          0.079044946 = score(doc=399,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.36086982 = fieldWeight in 399, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0546875 = fieldNorm(doc=399)
      0.5 = coord(2/4)
    
    Abstract
    Computer-based systems can produce enormous retrieval sets even when good search logic is used. Sometimes this is desirable; more often it is not. Appropriate filters can limit search results, but they represent only a partial solution. Simple ranking techniques are needed that are both effective and easily understood by the humans doing the searching. Optimal search output, whether from a traditional database or the Internet, will result when intuitive interfaces are designed that inspire confidence while making the necessary mathematics transparent. Weighted term searching using powers of 2, a technique proposed early in the history of information retrieval, can be simplified and used in combination with modern graphics and textual input to achieve these results.
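    A minimal sketch of weighted term searching with powers of 2 as the abstract describes it: each query term gets a weight that is a power of 2, so a document's total score identifies exactly which combination of terms it matched and the ranking follows from the score (the priority order of terms and the tokenization are assumptions):

      def powers_of_two_rank(query_terms, documents):
          """The i-th query term (in priority order) gets weight 2**(n-1-i),
          so each achievable score corresponds to a unique combination of
          matched terms; documents are ranked by that score."""
          n = len(query_terms)
          weights = {t: 2 ** (n - 1 - i) for i, t in enumerate(query_terms)}
          scored = []
          for doc_id, text in documents.items():
              tokens = set(text.lower().split())
              score = sum(w for t, w in weights.items() if t in tokens)
              scored.append((score, doc_id))
          return sorted(scored, reverse=True)

      docs = {"d1": "retrieval of web documents",
              "d2": "browsing the web",
              "d3": "document retrieval and browsing"}
      print(powers_of_two_rank(["retrieval", "web", "browsing"], docs))
      # [(6, 'd1'), (5, 'd3'), (3, 'd2')]
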
  20. Harter, S.P.: Search term combinations and retrieval overlap : a proposed methodology and case study (1990) 0.04
    0.039522473 = product of:
      0.15808989 = sum of:
        0.15808989 = weight(_text_:term in 339) [ClassicSimilarity], result of:
          0.15808989 = score(doc=339,freq=2.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.72173965 = fieldWeight in 339, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.109375 = fieldNorm(doc=339)
      0.25 = coord(1/4)
    

Languages

  • e 153
  • d 3
  • f 2
  • chi 1
  • fi 1
  • m 1

Types

  • a 151
  • s 8
  • el 4
  • m 4
  • p 1