Search (76 results, page 1 of 4)

  • theme_ss:"Retrievalstudien"
  1. Lochbaum, K.E.; Streeter, A.R.: Comparing and combining the effectiveness of latent semantic indexing and the ordinary vector space model for information retrieval (1989) 0.31
    0.30766279 = product of:
      0.41021705 = sum of:
        0.18506117 = weight(_text_:vector in 3458) [ClassicSimilarity], result of:
          0.18506117 = score(doc=3458,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.603693 = fieldWeight in 3458, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=3458)
        0.19216156 = weight(_text_:space in 3458) [ClassicSimilarity], result of:
          0.19216156 = score(doc=3458,freq=10.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.7735293 = fieldWeight in 3458, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=3458)
        0.03299433 = product of:
          0.06598866 = sum of:
            0.06598866 = weight(_text_:model in 3458) [ClassicSimilarity], result of:
              0.06598866 = score(doc=3458,freq=4.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.36048993 = fieldWeight in 3458, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3458)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
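    The breakdowns above are Lucene ClassicSimilarity explanations: each matching term contributes queryWeight x fieldWeight, where queryWeight = idf x queryNorm and fieldWeight = sqrt(termFreq) x idf x fieldNorm, and the per-term sum is scaled by the coordination factor coord(matching clauses / total clauses). A minimal Python sketch that reproduces this first hit's score from the numbers shown, assuming Lucene's classic idf formula ln(maxDocs/(docFreq+1)) + 1:

      import math

      def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
          # One weight(...) line of a ClassicSimilarity explanation.
          idf = math.log(max_docs / (doc_freq + 1)) + 1.0    # 6.439392 for docFreq=191
          query_weight = idf * query_norm                    # 0.30654848
          field_weight = math.sqrt(freq) * idf * field_norm  # tf(freq) x idf x fieldNorm
          return query_weight * field_weight

      qn, fn = 0.047605187, 0.046875
      total = (term_score(4, 191, 44218, qn, fn)             # "vector": 0.18506117
               + term_score(10, 650, 44218, qn, fn)          # "space":  0.19216156
               + term_score(4, 2569, 44218, qn, fn) * 0.5)   # "model" with coord(1/2)
      print(total * 0.75)                                    # coord(3/4) -> ~0.30766279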
    
    Abstract
    A retrieval system was built to find individuals with appropriate expertise within a large research establishment on the basis of the documents they authored. The expert-locating system uses a new method for automatic indexing and retrieval based on singular value decomposition, a matrix decomposition technique related to factor analysis. Organizational groups, represented by the documents they write, and the terms contained in these documents, are fit simultaneously into a 100-dimensional "semantic" space. User queries are positioned in the semantic space, and the most similar groups are returned to the user. Here we compared the standard vector-space model with this new technique and found that combining the two methods improved performance over either alone. We also examined the effects of various experimental variables on the system's retrieval accuracy; in particular, we studied the effects of term weighting functions in semantic space construction and in query construction, suffix stripping, and the use of lexical units larger than a single word.
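    A minimal sketch of the SVD-based indexing the abstract describes, assuming a toy raw count matrix and k = 2 in place of the paper's 100 dimensions:

      import numpy as np

      # Toy term-document matrix: rows = terms, columns = documents (raw counts).
      A = np.array([[2., 0., 1., 0.],
                    [1., 1., 0., 0.],
                    [0., 2., 0., 1.],
                    [0., 0., 1., 2.]])

      k = 2
      U, s, Vt = np.linalg.svd(A, full_matrices=False)
      Uk, sk = U[:, :k], s[:k]

      docs_k = (np.diag(sk) @ Vt[:k]).T     # documents in the reduced "semantic" space
      q = np.array([1., 1., 0., 0.])        # query as a bag of term counts
      q_k = Uk.T @ q                        # position the query in the same space

      sims = docs_k @ q_k / (np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k))
      print(np.argsort(-sims))              # most similar documents (groups) first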
  2. Abdou, S.; Savoy, J.: Searching in Medline : query expansion and manual indexing evaluation (2008) 0.19
    0.19290367 = product of:
      0.2572049 = sum of:
        0.130858 = weight(_text_:vector in 2062) [ClassicSimilarity], result of:
          0.130858 = score(doc=2062,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.4268754 = fieldWeight in 2062, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=2062)
        0.08593727 = weight(_text_:space in 2062) [ClassicSimilarity], result of:
          0.08593727 = score(doc=2062,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.34593284 = fieldWeight in 2062, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=2062)
        0.04040964 = product of:
          0.08081928 = sum of:
            0.08081928 = weight(_text_:model in 2062) [ClassicSimilarity], result of:
              0.08081928 = score(doc=2062,freq=6.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.44150823 = fieldWeight in 2062, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2062)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Based on a relatively large subset representing one third of the Medline collection, this paper evaluates ten different IR models, including recent developments in both probabilistic and language models. We show that the best performing IR model is a probabilistic model developed within the Divergence from Randomness framework [Amati, G., & van Rijsbergen, C.J. (2002). Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Transactions on Information Systems, 20(4), 357-389], which results in a 170% enhancement in mean average precision when compared to the classical tf-idf vector-space model. This paper also reports on our evaluation of the impact of manually assigned descriptors (MeSH, or Medical Subject Headings) on retrieval effectiveness, showing that including these terms can improve retrieval performance by 2.4% to 13.5%, depending on the underlying IR model. Finally, we design a new general blind-query expansion approach that shows improved retrieval performance compared to the Rocchio approach.
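    A minimal sketch of the Rocchio-style blind (pseudo-relevance) expansion that serves as the comparison baseline; the parameter values and toy data are illustrative assumptions:

      import numpy as np

      def rocchio_expand(q, doc_vecs, top_k=10, alpha=1.0, beta=0.75):
          # Blind feedback: assume the top-k ranked documents are relevant
          # and move the query toward their centroid.
          sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1)
                                 * np.linalg.norm(q) + 1e-12)
          top = doc_vecs[np.argsort(-sims)[:top_k]]
          return alpha * q + beta * top.mean(axis=0)

      rng = np.random.default_rng(0)
      docs = rng.random((100, 50))          # 100 documents x 50 terms (toy tf-idf)
      q_expanded = rocchio_expand(rng.random(50), docs)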
  3. Evans, D.A.; Lefferts, R.G.: CLARIT-TREC experiments (1995) 0.18
    0.18066272 = product of:
      0.36132544 = sum of:
        0.21809667 = weight(_text_:vector in 1912) [ClassicSimilarity], result of:
          0.21809667 = score(doc=1912,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.711459 = fieldWeight in 1912, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.078125 = fieldNorm(doc=1912)
        0.14322878 = weight(_text_:space in 1912) [ClassicSimilarity], result of:
          0.14322878 = score(doc=1912,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.5765547 = fieldWeight in 1912, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.078125 = fieldNorm(doc=1912)
      0.5 = coord(2/4)
    
    Abstract
    Describes the following elements of the CLARIT information management system: natural language processing, document indexing, vector-space querying, and query augmentation. Reports on the processing carried out as part of TREC-2 and on system parameterization. Results demonstrate high precision and excellent recall, but the system is not yet optimized.
  4. Shaw, W.M.; Burgin, R.; Howell, P.: Performance standards and evaluations in IR test collections : vector-space and other retrieval models (1997) 0.15
    0.15329741 = product of:
      0.30659482 = sum of:
        0.18506117 = weight(_text_:vector in 7259) [ClassicSimilarity], result of:
          0.18506117 = score(doc=7259,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.603693 = fieldWeight in 7259, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=7259)
        0.12153365 = weight(_text_:space in 7259) [ClassicSimilarity], result of:
          0.12153365 = score(doc=7259,freq=4.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.48922288 = fieldWeight in 7259, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=7259)
      0.5 = coord(2/4)
    
    Abstract
    Computes low performance standards for each query and for the group of queries in 13 traditional and 4 TREC test collections. Predicted by the hypergeometric distribution, the standards represent the highest level of retrieval effectiveness attributable to chance. Compares operational levels of performance for vector-space, ad-hoc-feature-based, probabilistic, and other retrieval models to the standards. The effectiveness of these techniques in small, traditional test collections can be explained by retrieving a few more relevant documents for most queries than expected by chance. The effectiveness of retrieval techniques in the larger TREC test collections can only be explained by retrieving many more relevant documents for most queries than expected by chance. The discrepancy between deviations from chance in traditional and TREC test collections is due to a decrease in performance standards for large test collections, not to an increase in operational performance. The next generation of information retrieval systems would be enhanced by abandoning uninformative performance summaries and focusing on the effectiveness, and improvements in the effectiveness, of individual queries.
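    The chance standard itself is easy to compute: with N documents in a collection, R of them relevant, and n retrieved, the number of relevant documents retrieved by a random ranking is hypergeometric. A sketch with hypothetical sizes (requires SciPy):

      from scipy.stats import hypergeom

      N, R, n = 1400, 30, 100     # hypothetical collection size, relevant docs, cutoff
      rv = hypergeom(N, R, n)     # relevant documents retrieved under random retrieval
      print(rv.mean())            # expected by chance: n * R / N ~ 2.14
      print(rv.ppf(0.95))         # a "highest level attributable to chance" standard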
  5. Singhal, A.: Document length normalization (1996) 0.13
    0.1264639 = product of:
      0.2529278 = sum of:
        0.15266767 = weight(_text_:vector in 6630) [ClassicSimilarity], result of:
          0.15266767 = score(doc=6630,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.4980213 = fieldWeight in 6630, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6630)
        0.100260146 = weight(_text_:space in 6630) [ClassicSimilarity], result of:
          0.100260146 = score(doc=6630,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.4035883 = fieldWeight in 6630, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6630)
      0.5 = coord(2/4)
    
    Abstract
    Observes that in the Text REtrieval Conference (TREC) collection - a large experimental full-text collection with varying document lengths - the likelihood of a document being judged relevant by a user increases with document length. A retrieval strategy, such as the vector-space cosine match, that retrieves documents of different lengths with roughly equal chances will not optimally retrieve useful documents from such a collection. Presents a modified technique (pivoted cosine normalization) that attempts to match the likelihood of retrieving documents of all lengths to the likelihood of their relevance, and shows that this technique yields significant improvements in retrieval effectiveness.
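    Pivoted normalization replaces the cosine document norm with a linear blend around the average norm (the "pivot"), so long documents are penalized less. A minimal sketch, assuming the commonly cited slope of 0.25:

      import numpy as np

      def pivoted_norm(old_norms, slope=0.25):
          # Rotate the normalization line around the pivot (the average old norm).
          pivot = old_norms.mean()
          return (1.0 - slope) * pivot + slope * old_norms

      # score(q, d) = dot(q_tfidf, d_tfidf) / pivoted_norm(...) instead of / ||d||
      norms = np.array([2.0, 5.0, 20.0])    # toy cosine norms: short ... long docs
      print(pivoted_norm(norms))            # [ 7.25  8.   11.75]: flatter penalty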
  6. Shafique, M.; Chaudhry, A.S.: Intelligent agent-based online information retrieval (1995) 0.11
    0.112735406 = product of:
      0.22547081 = sum of:
        0.18506117 = weight(_text_:vector in 3851) [ClassicSimilarity], result of:
          0.18506117 = score(doc=3851,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.603693 = fieldWeight in 3851, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=3851)
        0.04040964 = product of:
          0.08081928 = sum of:
            0.08081928 = weight(_text_:model in 3851) [ClassicSimilarity], result of:
              0.08081928 = score(doc=3851,freq=6.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.44150823 = fieldWeight in 3851, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3851)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Describes an intelligent agent-based information retrieval model. The relevance matrix used by the intelligent agent consists of rows and columns: rows represent the documents, and columns are used for keywords. Entries represent predetermined weights of keywords in documents. The search/query vector is constructed by the intelligent agent through explicit interaction with the user, using interactive query refinement techniques. By manipulating the relevance matrix against the search vector, the agent filters the document representations and retrieves the most relevant documents, consequently improving retrieval performance. Work is in progress on an experiment to compare the retrieval results of a conventional retrieval model and an intelligent agent-based retrieval model. A test document collection on artificial intelligence has been selected as a sample. Retrieval tests are being carried out with a selected group of researchers using the 2 retrieval systems. Results will be compared to assess retrieval performance using precision and recall metrics.
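    A minimal sketch of the matrix manipulation described; all weights are hypothetical:

      import numpy as np

      W = np.array([[0.8, 0.1, 0.0],    # relevance matrix: rows = documents,
                    [0.2, 0.7, 0.3],    # columns = keywords, entries = weights
                    [0.0, 0.4, 0.9]])
      q = np.array([1.0, 0.5, 0.0])     # search vector refined with the user

      scores = W @ q                    # manipulate the matrix against the vector
      print(np.argsort(-scores))        # most relevant documents first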
  7. ¬The Second Text Retrieval Conference : TREC-2 (1995) 0.11
    0.10839763 = product of:
      0.21679527 = sum of:
        0.130858 = weight(_text_:vector in 1320) [ClassicSimilarity], result of:
          0.130858 = score(doc=1320,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.4268754 = fieldWeight in 1320, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=1320)
        0.08593727 = weight(_text_:space in 1320) [ClassicSimilarity], result of:
          0.08593727 = score(doc=1320,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.34593284 = fieldWeight in 1320, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=1320)
      0.5 = coord(2/4)
    
    Content
    Contains the contributions: HARMAN, D.: Overview of the Second Text Retrieval Conference (TREC-2); SPARCK JONES, K.: Reflections on TREC; BUCKLEY, C., J. ALLAN and G. SALTON: Automatic routing and retrieval using SMART: TREC-2; CALLAN, J.P., W.B. CROFT and J. BROGLIO: TREC and TIPSTER experiments with INQUERY; ROBERTSON, S.R., S. WALKER and M.M. HANCOCK-BEAULIEU: Large test collection experiments on an operational, interactive system: OKAPI at TREC; ZOBEL, J., A. MOFFAT, R. WILKINSON and R. SACKS-DAVIS: Efficient retrieval of partial documents; METTLER, M. and F. NORDBY: TREC routing experiments with the TRW/Paracel Fast Data Finder; EVANS, D.A. and R.G. LEFFERTS: CLARIT-TREC experiments; STRZALKOWSKI, T.: Natural language information retrieval; CAID, W.R., S.T. DUMAIS and S.I. GALLANT: Learned vector-space models for document retrieval; BELKIN, N.J., P. KANTOR, E.A. FOX and J.A. SHAW: Combining the evidence of multiple query representations for information retrieval
  8. Cross-language information retrieval (1998) 0.09
    0.089183114 = product of:
      0.11891082 = sum of:
        0.05452417 = weight(_text_:vector in 6299) [ClassicSimilarity], result of:
          0.05452417 = score(doc=6299,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.17786475 = fieldWeight in 6299, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.01953125 = fieldNorm(doc=6299)
        0.050639022 = weight(_text_:space in 6299) [ClassicSimilarity], result of:
          0.050639022 = score(doc=6299,freq=4.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.20384288 = fieldWeight in 6299, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.01953125 = fieldNorm(doc=6299)
        0.013747636 = product of:
          0.027495272 = sum of:
            0.027495272 = weight(_text_:model in 6299) [ClassicSimilarity], result of:
              0.027495272 = score(doc=6299,freq=4.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.15020414 = fieldWeight in 6299, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.01953125 = fieldNorm(doc=6299)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Content
    Contains the contributions: GREFENSTETTE, G.: The Problem of Cross-Language Information Retrieval; DAVIS, M.W.: On the Effective Use of Large Parallel Corpora in Cross-Language Text Retrieval; BALLESTEROS, L. and W.B. CROFT: Statistical Methods for Cross-Language Information Retrieval; Distributed Cross-Lingual Information Retrieval; Automatic Cross-Language Information Retrieval Using Latent Semantic Indexing; EVANS, D.A. et al.: Mapping Vocabularies Using Latent Semantics; PICCHI, E. and C. PETERS: Cross-Language Information Retrieval: A System for Comparable Corpus Querying; YAMABANA, K. et al.: A Language Conversion Front-End for Cross-Language Information Retrieval; GACHOT, D.A. et al.: The Systran NLP Browser: An Application of Machine Translation Technology in Cross-Language Information Retrieval; HULL, D.: A Weighted Boolean Model for Cross-Language Text Retrieval; SHERIDAN, P. et al.: Building a Large Multilingual Test Collection from Comparable News Documents; OARD, D.W. and B.J. DORR: Evaluating Cross-Language Text Filtering Effectiveness
    Footnote
    Christian Fluhr et al (DIST/SMTI, France) outline the EMIR (European Multilingual Information Retrieval) and ESPRIT projects. They found that using SYSTRAN to machine translate queries and to access material from various multilingual databases produced less relevant results than a method referred to as 'multilingual reformulation' (the mechanics of which are only hinted at). An interesting technique is Latent Semantic Indexing (LSI), described by Michael Littman et al (Brown University) and, most clearly, by David Evans et al (Carnegie Mellon University). LSI involves creating matrices of documents and the terms they contain and 'fitting' related documents into a reduced matrix space. This effectively allows queries to be mapped onto a common semantic representation of the documents. Eugenio Picchi and Carol Peters (Pisa) report on a procedure to create links between translation equivalents in an Italian-English parallel corpus. The links are used to construct parallel linguistic contexts in real-time for any term or combination of terms that is being searched for in either language. Their interest is primarily lexicographic, but they plan to apply the same procedure to comparable corpora, i.e. to texts which are not translations of each other but which share the same domain. Kiyoshi Yamabana et al (NEC, Japan) address the issue of how to disambiguate between alternative translations of query terms. Their DMAX (double maximise) method looks at co-occurrence frequencies between both source language words and target language words in order to arrive at the most probable translation. The statistical data for the decision are derived, not from the translation texts, but independently from monolingual corpora in each language. An interactive user interface allows the user to influence the selection of terms during the matching process. Denis Gachot et al (SYSTRAN) describe the SYSTRAN NLP browser, a prototype tool which collects parsing information derived from a text or corpus previously translated with SYSTRAN. The user enters queries into the browser in either a structured or free form and receives grammatical and lexical information about the source text and/or its translation.
    The retrieved output from a query including the phrase 'big rockets' may be, for instance, a sentence containing 'giant rocket' which is semantically ranked above 'military rocket'. David Hull (Xerox Research Centre, Grenoble) describes an implementation of a weighted Boolean model for Spanish-English CLIR. Users construct Boolean-type queries, weighting each term in the query, which is then translated by an on-line dictionary before being applied to the database. Comparisons with the performance of unweighted free-form queries ('vector space' models) proved encouraging. Two contributions consider the evaluation of CLIR systems. In order to by-pass the time-consuming and expensive process of assembling a standard collection of documents and of user queries against which the performance of a CLIR system is manually assessed, Páraic Sheridan et al (ETH Zurich) propose a method based on retrieving 'seed documents'. This involves identifying a unique document in a database (the 'seed document') and, for a number of queries, measuring how fast it is retrieved. The authors have also assembled a large database of multilingual news documents for testing purposes. By storing the (fairly short) documents in a structured form tagged with descriptor codes (e.g. for topic, country and area), the test suite is easily expanded while remaining consistent for the purposes of testing. Douglas Oard and Bonnie Dorr (University of Maryland) describe an evaluation methodology which appears to apply LSI techniques in order to filter and rank incoming documents designed for testing CLIR systems. The volume provides the reader with an excellent overview of several projects in CLIR. It is well supported with references and is intended as a secondary text for researchers and practitioners. It highlights the need for a good, general tutorial introduction to the field.
  9. Ding, C.H.Q.: ¬A probabilistic model for Latent Semantic Indexing (2005) 0.06
    0.06317346 = product of:
      0.12634692 = sum of:
        0.08593727 = weight(_text_:space in 3459) [ClassicSimilarity], result of:
          0.08593727 = score(doc=3459,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.34593284 = fieldWeight in 3459, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=3459)
        0.04040964 = product of:
          0.08081928 = sum of:
            0.08081928 = weight(_text_:model in 3459) [ClassicSimilarity], result of:
              0.08081928 = score(doc=3459,freq=6.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.44150823 = fieldWeight in 3459, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3459)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Latent Semantic Indexing (LSI), when applied to semantic spaces built on text collections, improves information retrieval, information filtering, and word sense disambiguation. A new dual probability model based on the similarity concepts is introduced to provide a deeper understanding of LSI. Semantic associations can be quantitatively characterized by their statistical significance, the likelihood. Semantic dimensions containing redundant and noisy information can be separated out and should be ignored because of their negative contribution to the overall statistical significance. LSI is the optimal solution of the model. The peak in the likelihood curve indicates the existence of an intrinsic semantic dimension. The importance of LSI dimensions follows the Zipf distribution, indicating that LSI dimensions represent latent concepts. Document frequency of words follows the Zipf distribution, and the number of distinct words follows a log-normal distribution. Experiments on five standard document collections confirm and illustrate the analysis.
  10. Dalrymple, P.W.: Retrieval by reformulation in two library catalogs : toward a cognitive model of searching behavior (1990) 0.05
    0.049793392 = product of:
      0.19917357 = sum of:
        0.19917357 = sum of:
          0.10887573 = weight(_text_:model in 5089) [ClassicSimilarity], result of:
            0.10887573 = score(doc=5089,freq=2.0), product of:
              0.1830527 = queryWeight, product of:
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.047605187 = queryNorm
              0.59477806 = fieldWeight in 5089, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.109375 = fieldNorm(doc=5089)
          0.09029783 = weight(_text_:22 in 5089) [ClassicSimilarity], result of:
            0.09029783 = score(doc=5089,freq=2.0), product of:
              0.16670525 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.047605187 = queryNorm
              0.5416616 = fieldWeight in 5089, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.109375 = fieldNorm(doc=5089)
      0.25 = coord(1/4)
    
    Date
    22. 7.2006 18:43:54
  11. Chu, H.: Factors affecting relevance judgment : a report from TREC Legal track (2011) 0.04
    0.043869503 = product of:
      0.087739006 = sum of:
        0.07161439 = weight(_text_:space in 4540) [ClassicSimilarity], result of:
          0.07161439 = score(doc=4540,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.28827736 = fieldWeight in 4540, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4540)
        0.016124614 = product of:
          0.032249227 = sum of:
            0.032249227 = weight(_text_:22 in 4540) [ClassicSimilarity], result of:
              0.032249227 = score(doc=4540,freq=2.0), product of:
                0.16670525 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047605187 = queryNorm
                0.19345059 = fieldWeight in 4540, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4540)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Purpose - This study intends to identify factors that affect relevance judgment of retrieved information as part of the 2007 TREC Legal track interactive task. Design/methodology/approach - Data were gathered and analyzed from the participants of the 2007 TREC Legal track interactive task using a questionnaire which included not only a list of 80 relevance factors identified in prior research, but also space for expressing their thoughts on relevance judgment in the process. Findings - This study finds that topicality remains a primary criterion, out of various options, for determining relevance, while specificity of the search request, task, or retrieved results also helps greatly in relevance judgment. Research limitations/implications - Relevance research should focus on the topicality and specificity of what is being evaluated, and should be conducted in real environments. Practical implications - If multiple relevance factors are presented to assessors, the total number in a list should be kept below ten to take account of the limited processing capacity of human beings' short-term memory. Otherwise, the assessors might either completely ignore or inadequately consider some of the relevance factors when making judgment decisions. Originality/value - This study presents a method for reducing the artificiality of relevance research design, an apparent limitation in many related studies. Specifically, relevance judgment was made in this research as part of the 2007 TREC Legal track interactive task rather than in a study devised for its own sake. The assessors also served as searchers, so that their searching experience would facilitate their subsequent relevance judgments.
    Date
    12. 7.2011 18:29:22
  12. Kelledy, L.; Smeaton, A.F.: TREC-5 experiments at Dublin City University : Query space reduction, Spanish & character shape encoding (1997) 0.04
    0.042968635 = product of:
      0.17187454 = sum of:
        0.17187454 = weight(_text_:space in 3089) [ClassicSimilarity], result of:
          0.17187454 = score(doc=3089,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.6918657 = fieldWeight in 3089, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.09375 = fieldNorm(doc=3089)
      0.25 = coord(1/4)
    
  13. Newby, G.B.: Metric multidimensional information space (1997) 0.04
    0.042968635 = product of:
      0.17187454 = sum of:
        0.17187454 = weight(_text_:space in 3105) [ClassicSimilarity], result of:
          0.17187454 = score(doc=3105,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.6918657 = fieldWeight in 3105, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.09375 = fieldNorm(doc=3105)
      0.25 = coord(1/4)
    
  14. Newby, G.B.: Cognitive space and information space (2001) 0.04
    0.03721193 = product of:
      0.14884771 = sum of:
        0.14884771 = weight(_text_:space in 6977) [ClassicSimilarity], result of:
          0.14884771 = score(doc=6977,freq=6.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.59917325 = fieldWeight in 6977, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=6977)
      0.25 = coord(1/4)
    
    Abstract
    This article works towards the realization of exosomatic memory for information systems. In exosomatic memory systems, the information spaces of systems will be consistent with the cognitive spaces of their human users. A method for measuring concept relations in human cognitive space is presented: the paired comparison survey with Principal Components Analysis. A study measuring the cognitive spaces of 16 research participants is presented. Items measured include relations among seven TREC topic statements as well as 17 concepts from the topic statements. A method for automatically generating information spaces from document collections is presented that uses term co-occurrence, eigensystems analysis, and Principal Components Analysis. The extent of similarity between the cognitive spaces and the information spaces, which were derived independently from each other, is measured. A strong similarity between the information spaces and the cognitive spaces is found, indicating that the methods described may have good utility for working towards information systems that operate as exosomatic memories.
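    A hedged sketch of the information-space construction (co-occurrence data reduced by an eigensystem/Principal Components Analysis); the counts are synthetic, with 17 columns echoing the study's 17 concepts:

      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.poisson(2.0, (200, 17)).astype(float)          # toy doc x concept counts
      Z = (X - X.mean(0)) / X.std(0)                         # standardize each column
      vals, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))   # eigensystem analysis
      top2 = vecs[:, np.argsort(vals)[::-1][:2]]
      space = Z @ top2                                       # 2-D information space
      print(space.shape)                                     # (200, 2)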
  15. Losee, R.M.: Determining information retrieval and filtering performance without experimentation (1995) 0.03
    0.03485952 = product of:
      0.13943808 = sum of:
        0.13943808 = sum of:
          0.09428916 = weight(_text_:model in 3368) [ClassicSimilarity], result of:
            0.09428916 = score(doc=3368,freq=6.0), product of:
              0.1830527 = queryWeight, product of:
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.047605187 = queryNorm
              0.51509297 = fieldWeight in 3368, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3368)
          0.045148917 = weight(_text_:22 in 3368) [ClassicSimilarity], result of:
            0.045148917 = score(doc=3368,freq=2.0), product of:
              0.16670525 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.047605187 = queryNorm
              0.2708308 = fieldWeight in 3368, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3368)
      0.25 = coord(1/4)
    
    Abstract
    The performance of an information retrieval or text and media filtering system may be determined through analytic methods as well as by traditional simulation or experimental methods. These analytic methods can provide precise statements about expected performance and can thus determine which of 2 similarly performing systems is superior. For both single query term and multiple query term retrieval, a model for comparing the performance of different probabilistic retrieval methods is developed. This method may be used to compute the average search length for a query, given only knowledge of database parameter values. Describes predictive models for inverse document frequency, binary independence, and relevance feedback based retrieval and filtering. Simulations illustrate how the single term model performs, and sample performance predictions are given for single term and multiple term problems.
    Date
    22. 2.1996 13:14:10
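    The flavor of such analytic predictions can be shown with one closed-form case (an illustration of the approach, not Losee's model itself): under a purely random ranking of N documents of which R are relevant, the expected number of non-relevant documents examined before reaching the k-th relevant one is k(N-R)/(R+1), the mean of a negative hypergeometric distribution.

      def expected_search_length_random(N, R, k=1):
          # Non-relevant documents examined before the k-th relevant one,
          # when R relevant documents among N are ranked at random.
          return k * (N - R) / (R + 1)

      print(expected_search_length_random(N=1000, R=10))   # 90.0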
  16. Buckley, C.; Allan, J.; Salton, G.: Automatic routing and retrieval using Smart : TREC-2 (1995) 0.03
    0.0327145 = product of:
      0.130858 = sum of:
        0.130858 = weight(_text_:vector in 5699) [ClassicSimilarity], result of:
          0.130858 = score(doc=5699,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.4268754 = fieldWeight in 5699, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=5699)
      0.25 = coord(1/4)
    
    Abstract
    The Smart information retrieval project emphasizes completely automatic approaches to the understanding and retrieval of large quantities of text. The work in the TREC-2 environment continues, performing both routing and ad hoc experiments. The ad hoc work extends investigations into combining global similarities, giving an overall indication of how a document matches a query, with local similarities identifying a smaller part of the document that matches the query. The performance of ad hoc runs is good, but it is clear that full advantage has not yet been taken of the available local information. The routing experiments use conventional relevance feedback approaches to routing, but with a much greater degree of query expansion than was previously done. The length of a query vector is increased by a factor of 5 to 10 by adding terms found in previously seen relevant documents. This approach improves effectiveness by 30-40% over the original query.
  17. Binder, G.; Stahl, M.; Faulborn, L.: Vergleichsuntersuchung MESSENGER-FULCRUM (2000) 0.03
    0.025065036 = product of:
      0.100260146 = sum of:
        0.100260146 = weight(_text_:space in 4885) [ClassicSimilarity], result of:
          0.100260146 = score(doc=4885,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.4035883 = fieldWeight in 4885, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4885)
      0.25 = coord(1/4)
    
    Abstract
    In a user test conducted as part of the GIRT project, the performance of two retrieval languages for database searching was examined; the results are presented in this report. The FULCRUM system is based on automatic indexing and delivers search results ranked by statistical relevance. The standard free-text search of the MESSENGER system was supplemented by descriptors assigned intellectually by the IZ. The results show that in FULCRUM the test subjects preferred Boolean exact-match retrieval to the vector-space model (best-match approach). The hybrid of intellectual and automatic indexing realized in MESSENGER proved superior to the quantitative-statistical approach in terms of recall.
  18. Chen, H.; Martinez, J.; Kirchhoff, A.; Ng, T.D.; Schatz, B.R.: Alleviating search uncertainty through concept associations : automatic indexing, co-occurrence analysis, and parallel computing (1998) 0.02
    0.021484317 = product of:
      0.08593727 = sum of:
        0.08593727 = weight(_text_:space in 5202) [ClassicSimilarity], result of:
          0.08593727 = score(doc=5202,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.34593284 = fieldWeight in 5202, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=5202)
      0.25 = coord(1/4)
    
    Abstract
    In this article, we report research on an algorithmic approach to alleviating search uncertainty in a large information space. Grounded in object filtering, automatic indexing, and co-occurrence analysis, we performed a large-scale experiment using a parallel supercomputer (SGI Power Challenge) to analyze 400,000+ abstracts in an INSPEC computer engineering collection. Two system-generated thesauri, one based on a combined object filtering and automatic indexing method, and the other based on automatic indexing only, were compared with the human-generated INSPEC subject thesaurus. Our user evaluation revealed that the system-generated thesauri were better than the INSPEC thesaurus in 'concept recall', but in 'concept precision' the 3 thesauri were comparable. Our analysis also revealed that the terms suggested by the 3 thesauri were complementary and could be used to significantly increase 'variety' in search terms and thereby reduce search uncertainty.
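    A hedged sketch of the co-occurrence core only (the actual system adds object filtering and parallel computation): cosine-normalized term-term co-occurrence counts yield association strengths from which related terms can be suggested.

      import numpy as np

      rng = np.random.default_rng(1)
      D = (rng.random((1000, 6)) < 0.2).astype(float)  # toy doc x term incidence
      co = D.T @ D                                     # term-term co-occurrence counts
      norm = np.sqrt(np.outer(np.diag(co), np.diag(co)))
      assoc = co / norm                                # cosine-normalized association
      np.fill_diagonal(assoc, 0.0)
      print(np.argsort(-assoc[0])[:3])                 # suggested neighbors of term 0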
  19. Boros, E.; Kantor, P.B.; Neu, D.J.: Pheromonic representation of user quests by digital structures (1999) 0.02
    0.017903598 = product of:
      0.07161439 = sum of:
        0.07161439 = weight(_text_:space in 6684) [ClassicSimilarity], result of:
          0.07161439 = score(doc=6684,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.28827736 = fieldWeight in 6684, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6684)
      0.25 = coord(1/4)
    
    Abstract
    In a novel approach to information finding in networked environments, each user's specific purpose or "quest" can be represented in numerous ways. The most familiar is a list of keywords, or a natural language sentence or paragraph. More effective is an extended text that has been judged as to relevance. This forms the basis of relevance feedback, as it is used in information retrieval. In the "Ant World" project (Ant World, 1999; Kantor et al., 1999b; Kantor et al., 1999a), the items to be retrieved are not documents, but rather quests, represented by entire collections of judged documents. In order to save space and time we have developed methods for representing these complex entities in a short string of about 1,000 bytes, which we call a "Digital Information Pheromone" (DIP). The principles for determining the DIP for a given quest, and for matching DIPs to each other are presented. The effectiveness of this scheme is explored with some applications to the large judged collections of TREC documents
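    The abstract fixes the DIP's size (about 1,000 bytes) but not its construction; as a purely hypothetical illustration, feature hashing can compress a quest's judged-document term weights into a fixed-size signature that still supports similarity matching:

      import hashlib
      import numpy as np

      def make_dip(term_weights, n_bytes=1000):
          # Hypothetical DIP: hash a quest's terms into a fixed-size signature.
          sig = np.zeros(n_bytes, dtype=np.int64)
          for term, w in term_weights.items():
              slot = int.from_bytes(hashlib.md5(term.encode()).digest()[:4], "little")
              sig[slot % n_bytes] += int(w)
          return np.clip(sig, 0, 255).astype(np.uint8).tobytes()

      def dip_cosine(a, b):
          va = np.frombuffer(a, np.uint8).astype(float)
          vb = np.frombuffer(b, np.uint8).astype(float)
          return float(va @ vb) / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-12)

      d1 = make_dip({"rocket": 5, "propulsion": 3})
      d2 = make_dip({"rocket": 4, "launch": 2})
      print(dip_cosine(d1, d2))       # quests matched by signature similarity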
  20. Bashir, S.; Rauber, A.: On the relationship between query characteristics and IR functions retrieval bias (2011) 0.02
    0.017903598 = product of:
      0.07161439 = sum of:
        0.07161439 = weight(_text_:space in 4628) [ClassicSimilarity], result of:
          0.07161439 = score(doc=4628,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.28827736 = fieldWeight in 4628, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4628)
      0.25 = coord(1/4)
    
    Abstract
    Bias quantification of retrieval functions with the help of document retrievability scores has recently evolved as an important evaluation measure for recall-oriented retrieval applications. While numerous studies have evaluated the retrieval bias of retrieval functions, solid validation of its impact on realistic types of queries is still limited. This is due to the lack of well-accepted criteria for query generation for estimating retrievability. Commonly, random queries are used for approximating document retrievability, due to the prohibitively large query space and the time involved in processing all queries. Additionally, a cumulative retrievability score of documents over all queries is used for analyzing retrieval functions' (retrieval) bias. However, this approach does not consider the differences between query characteristics (QCs) and their influence on the quantification of a retrieval function's bias. This article provides an in-depth study of retrievability over different QCs. It analyzes the correlation of lower/higher retrieval bias with different query characteristics. The presence of a strong correlation between retrieval bias and query characteristics in experiments indicates the possibility of determining the retrieval bias of retrieval functions without processing an exhaustive query set. Experiments are validated on the TREC Chemical Retrieval Track, consisting of 1.2 million patent documents.
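    A minimal sketch of the retrievability bookkeeping the article builds on, using the common cutoff-based r(d) and a Gini coefficient as the cumulative bias summary (the cutoff c and the Gini choice follow the retrievability literature, not necessarily this paper's exact setup):

      import numpy as np

      def retrievability(ranked_lists, n_docs, c=100):
          # r(d): number of queries for which document d appears in the top c.
          r = np.zeros(n_docs)
          for ranking in ranked_lists:      # one ranked list of doc ids per query
              for d in ranking[:c]:
                  r[d] += 1
          return r

      def gini(r):
          # 0 = every document equally retrievable, 1 = maximal retrieval bias.
          r = np.sort(np.asarray(r, dtype=float))
          n = r.size
          cum = np.cumsum(r)
          return (n + 1 - 2 * (cum / cum[-1]).sum()) / n

      r = retrievability([[0, 1, 2], [0, 1, 3], [0, 2, 4]], n_docs=6, c=3)
      print(gini(r))                        # higher -> more biased retrieval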

Languages

  • e 70
  • d 4
  • f 1

Types

  • a 70
  • s 4
  • m 3
  • el 1
  • r 1