Search (104 results, page 1 of 6)

  • Filter: theme_ss:"Retrievalstudien"
  1. Losee, R.M.: Determining information retrieval and filtering performance without experimentation (1995) 0.11
    0.11080296 = product of:
      0.16620444 = sum of:
        0.14280158 = weight(_text_:query in 3368) [ClassicSimilarity], result of:
          0.14280158 = score(doc=3368,freq=6.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.62258047 = fieldWeight in 3368, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3368)
        0.023402857 = product of:
          0.046805713 = sum of:
            0.046805713 = weight(_text_:22 in 3368) [ClassicSimilarity], result of:
              0.046805713 = score(doc=3368,freq=2.0), product of:
                0.1728227 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049352113 = queryNorm
                0.2708308 = fieldWeight in 3368, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3368)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The performance of an information retrieval or text and media filtering system may be determined through analytic methods as well as by traditional simulation or experimental methods. These analytic methods can provide precise statements about expected performance and can thus determine which of 2 similarly performing systems is superior. For both single query term and multiple query term retrieval models, a model for comparing the performance of different probabilistic retrieval methods is developed. This method may be used to compute the average search length for a query, given only knowledge of database parameter values. Describes predictive models for inverse document frequency, binary independence, and relevance feedback based retrieval and filtering. Simulations illustrate how the single-term model performs, and sample performance predictions are given for single-term and multiple-term problems
    Date
    22. 2.1996 13:14:10
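The score tree above is Lucene "explain" output for ClassicSimilarity, i.e. tf-idf with query and field normalization. A minimal sketch of the arithmetic it encodes, reproducing the _text_:query clause for doc 3368 (idf, queryNorm and fieldNorm are copied verbatim from the tree; tf = sqrt(freq) and idf = 1 + ln(maxDocs/(docFreq+1)) are ClassicSimilarity's standard definitions):

```python
import math

# Constants copied verbatim from the explain tree for doc 3368, term "query".
# ClassicSimilarity's idf = 1 + ln(maxDocs / (docFreq + 1)) reproduces the
# displayed 4.6476326 for docFreq=1151, maxDocs=44218.
idf = 4.6476326
query_norm = 0.049352113   # makes clause weights comparable across a query
field_norm = 0.0546875     # index-time length normalization of the field
freq = 6.0                 # occurrences of "query" in the field

tf = math.sqrt(freq)                    # 2.4494898
query_weight = idf * query_norm         # 0.22937049
field_weight = tf * idf * field_norm    # 0.62258047
clause = query_weight * field_weight    # 0.14280158

# The second clause (term "22") contributes 0.046805713 * coord(1/2)
# = 0.023402857; the outer coord(2/3) reflects 2 of 3 clauses matching.
score = (clause + 0.023402857) * (2 / 3)
print(score)  # ~0.11080296, the displayed document score
```

The same arithmetic, with different freq and fieldNorm values, accounts for every score tree in this result list.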
  2. Pal, S.; Mitra, M.; Kamps, J.: Evaluation effort, reliability and reusability in XML retrieval (2011) 0.07
    0.0666666 = product of:
      0.0999999 = sum of:
        0.08328357 = weight(_text_:query in 4197) [ClassicSimilarity], result of:
          0.08328357 = score(doc=4197,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.3630963 = fieldWeight in 4197, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4197)
        0.016716326 = product of:
          0.03343265 = sum of:
            0.03343265 = weight(_text_:22 in 4197) [ClassicSimilarity], result of:
              0.03343265 = score(doc=4197,freq=2.0), product of:
                0.1728227 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049352113 = queryNorm
                0.19345059 = fieldWeight in 4197, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4197)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The Initiative for the Evaluation of XML retrieval (INEX) provides a TREC-like platform for evaluating content-oriented XML retrieval systems. Since 2007, INEX has been using a set of precision-recall based metrics for its ad hoc tasks. The authors investigate the reliability and robustness of these focused retrieval measures, and of the INEX pooling method. They explore four specific questions: How reliable are the metrics when assessments are incomplete, or when query sets are small? What is the minimum pool/query-set size that can be used to reliably evaluate systems? Can the INEX collections be used to fairly evaluate "new" systems that did not participate in the pooling process? And, for a fixed amount of assessment effort, would this effort be better spent in thoroughly judging a few queries, or in judging many queries relatively superficially? The authors' findings validate properties of precision-recall-based metrics observed in document retrieval settings. Early precision measures are found to be more error-prone and less stable under incomplete judgments and small topic-set sizes. They also find that system rankings remain largely unaffected even when assessment effort is substantially (but systematically) reduced, and confirm that the INEX collections remain usable when evaluating nonparticipating systems. Finally, they observe that for a fixed amount of effort, judging shallow pools for many queries is better than judging deep pools for a smaller set of queries. However, when judging only a random sample of a pool, it is better to completely judge fewer topics than to partially judge many topics. This result confirms the effectiveness of pooling methods.
    Date
    22. 1.2011 14:20:56
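One standard way to make the reliability question concrete (a generic sketch, not the authors' exact protocol) is to rank systems under full and reduced judgments and compare the two orderings with Kendall's tau; a tau near 1 means the cheaper evaluation preserves the system ranking:

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two system rankings, each given as a
    {system: effectiveness score} map over the same systems."""
    systems = list(scores_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(systems) * (len(systems) - 1) / 2
    return (concordant - discordant) / pairs

# Hypothetical MAP scores under full pools vs. heavily reduced pools.
full = {"sysA": 0.31, "sysB": 0.28, "sysC": 0.22, "sysD": 0.19}
reduced = {"sysA": 0.27, "sysB": 0.26, "sysC": 0.17, "sysD": 0.18}
print(kendall_tau(full, reduced))  # 0.666...: only the (C, D) pair swaps
```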
  3. Rao, A.; Lu, A.; Meier, E.; Ahmed, S.; Pliske, D.: Query processing in TREC6 (2000) 0.05
    0.054964356 = product of:
      0.16489306 = sum of:
        0.16489306 = weight(_text_:query in 6420) [ClassicSimilarity], result of:
          0.16489306 = score(doc=6420,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.71889395 = fieldWeight in 6420, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.109375 = fieldNorm(doc=6420)
      0.33333334 = coord(1/3)
    
  4. Buckley, C.; Allan, J.; Salton, G.: Automatic routing and retrieval using Smart : TREC-2 (1995) 0.05
    0.052673157 = product of:
      0.15801947 = sum of:
        0.15801947 = weight(_text_:query in 5699) [ClassicSimilarity], result of:
          0.15801947 = score(doc=5699,freq=10.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.68892676 = fieldWeight in 5699, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=5699)
      0.33333334 = coord(1/3)
    
    Abstract
    The Smart information retrieval project emphasizes completely automatic approaches to the understanding and retrieval of large quantities of text. The work in the TREC-2 environment continues, performing both routing and ad hoc experiments. The ad hoc work extends investigations into combining global similarities, which give an overall indication of how a document matches a query, with local similarities, which identify a smaller part of the document that matches the query. The performance of the ad hoc runs is good, but it is clear that full advantage is not yet being taken of the available local information. The routing experiments use conventional relevance feedback approaches to routing, but with a much greater degree of query expansion than was previously attempted. The length of a query vector is increased by a factor of 5 to 10 by adding terms found in previously seen relevant documents. This approach improves effectiveness by 30-40% over the original query
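A toy sketch of the feedback-driven expansion described above, selecting expansion terms by raw frequency in known relevant documents (SMART's actual Rocchio-style term weighting is more involved):

```python
from collections import Counter

def expand_query(query_terms, relevant_docs, factor=5):
    """Grow the query by adding the most frequent terms from previously
    seen relevant documents, up to `factor` times the original length
    (the abstract reports expansion by a factor of 5 to 10)."""
    pool = Counter()
    for doc in relevant_docs:
        pool.update(doc)
    for t in query_terms:          # never re-add original query terms
        pool.pop(t, None)
    extra = [t for t, _ in pool.most_common(factor * len(query_terms))]
    return list(query_terms) + extra

rel_docs = [["routing", "filter", "profile", "stream"],
            ["filter", "profile", "learning"]]
print(expand_query(["automatic", "routing"], rel_docs))
# ['automatic', 'routing', 'filter', 'profile', 'stream', 'learning']
```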
  5. Airio, E.: Who benefits from CLIR in web retrieval? (2008) 0.05
    0.052673157 = product of:
      0.15801947 = sum of:
        0.15801947 = weight(_text_:query in 2342) [ClassicSimilarity], result of:
          0.15801947 = score(doc=2342,freq=10.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.68892676 = fieldWeight in 2342, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=2342)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - The aim of the current paper is to test whether query translation is beneficial in web retrieval. Design/methodology/approach - The language pairs were Finnish-Swedish, English-German and Finnish-French. A total of 12-18 participants were recruited for each language pair. Each participant performed four retrieval tasks. The author's aim was to compare the performance of the translated queries with that of the target language queries. Thus, the author asked participants to formulate a source language query and a target language query for each task. The source language queries were translated into the target language using a dictionary-based system; for English-German, machine translation was also used. The author used Google as the search engine. Findings - The results differed depending on the language pair. The author concluded that dictionary coverage had an effect on the results. On average, the results of query translation were better than those reported in traditional laboratory tests. Originality/value - This research shows that query translation on the web is beneficial especially for users with moderate and non-active language skills. This is valuable information for developers of cross-language information retrieval systems.
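A minimal sketch of dictionary-based query translation as used in the study; the toy Finnish-Swedish dictionary and the pass-through fallback for out-of-vocabulary terms are illustrative assumptions:

```python
# Toy Finnish-to-Swedish dictionary; ambiguous entries produce several
# target terms, which is where dictionary coverage helps or hurts CLIR.
fi_to_sv = {
    "tietokone": ["dator"],
    "virus": ["virus", "datavirus"],
    "suojaus": ["skydd"],
}

def translate_query(source_terms, dictionary):
    """Substitute every source term with all of its dictionary
    translations; out-of-vocabulary terms pass through unchanged
    (a common fallback for proper names)."""
    target = []
    for term in source_terms:
        target.extend(dictionary.get(term, [term]))
    return target

print(translate_query(["tietokone", "virus", "suojaus"], fi_to_sv))
# ['dator', 'virus', 'datavirus', 'skydd']
```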
  6. Bashir, S.; Rauber, A.: On the relationship between query characteristics and IR functions retrieval bias (2011) 0.05
    0.051936433 = product of:
      0.1558093 = sum of:
        0.1558093 = weight(_text_:query in 4628) [ClassicSimilarity], result of:
          0.1558093 = score(doc=4628,freq=14.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.67929095 = fieldWeight in 4628, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4628)
      0.33333334 = coord(1/3)
    
    Abstract
    Bias quantification of retrieval functions with the help of document retrievability scores has recently evolved as an important evaluation measure for recall-oriented retrieval applications. While numerous studies have evaluated the retrieval bias of retrieval functions, solid validation of its impact on realistic types of queries is still limited. This is due to the lack of well-accepted criteria for generating the queries used to estimate retrievability. Commonly, random queries are used to approximate documents' retrievability, because the query space is prohibitively large and processing all queries is too time-consuming. Additionally, a cumulative retrievability score of documents over all queries is used for analyzing retrieval functions' bias. However, this approach does not consider the differences between query characteristics (QCs) and their influence on the bias quantification of retrieval functions. This article provides an in-depth study of retrievability over different QCs. It analyzes the correlation of lower/higher retrieval bias with different query characteristics. The strong correlation between retrieval bias and query characteristics observed in the experiments indicates the possibility of determining the retrieval bias of retrieval functions without processing an exhaustive query set. Experiments are validated on the TREC Chemical Retrieval Track, consisting of 1.2 million patent documents.
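The retrievability score underlying this bias analysis is conventionally defined as the number of queries for which a document ranks above a cutoff, with the inequality of the resulting scores (e.g., a Gini coefficient) quantifying retrieval bias; a sketch under that conventional definition:

```python
def retrievability(doc_ids, queries, search, cutoff=100):
    """r(d) = number of queries for which document d appears within the
    top-`cutoff` results; `search(q)` returns a ranked list of doc ids."""
    r = {d: 0 for d in doc_ids}
    for q in queries:
        for d in search(q)[:cutoff]:
            r[d] += 1
    return r

def gini(values):
    """Gini coefficient over retrievability scores: 0 means every document
    is equally retrievable; values near 1 indicate strong retrieval bias."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n

print(gini([5, 5, 5, 5]))   # 0.0, perfectly even
print(gini([0, 0, 0, 20]))  # 0.75, highly biased
```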
  7. Belkin, N.J.: An overview of results from Rutgers' investigations of interactive information retrieval (1998) 0.05
    0.05040447 = product of:
      0.075606704 = sum of:
        0.05889038 = weight(_text_:query in 2339) [ClassicSimilarity], result of:
          0.05889038 = score(doc=2339,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.25674784 = fieldWeight in 2339, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2339)
        0.016716326 = product of:
          0.03343265 = sum of:
            0.03343265 = weight(_text_:22 in 2339) [ClassicSimilarity], result of:
              0.03343265 = score(doc=2339,freq=2.0), product of:
                0.1728227 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049352113 = queryNorm
                0.19345059 = fieldWeight in 2339, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2339)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Over the last 4 years, the Information Interaction Laboratory at Rutgers' School of Communication, Information and Library Studies has performed a series of investigations concerned with various aspects of people's interactions with advanced information retrieval (IR) systems. We have been especially concerned with understanding not just what people do, and why, and with what effect, but also with what they would like to do, how they attempt to accomplish it, and with what difficulties. These investigations have led to some quite interesting conclusions about the nature and structure of people's interactions with information, about support for cooperative human-computer interaction in query reformulation, and about the value of visualization of search results for supporting various forms of interaction with information. In this discussion, I give an overview of the research program and its projects, present representative results from the projects, and discuss some implications of these results for support of subject searching in information retrieval systems
    Date
    22. 9.1997 19:16:05
  8. Pirkola, A.; Järvelin, K.: Employing the resolution power of search keys (2001) 0.05
    0.04760053 = product of:
      0.14280158 = sum of:
        0.14280158 = weight(_text_:query in 5907) [ClassicSimilarity], result of:
          0.14280158 = score(doc=5907,freq=6.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.62258047 = fieldWeight in 5907, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5907)
      0.33333334 = coord(1/3)
    
    Abstract
    Search key resolution power is analyzed in the context of a request, i.e., among the set of search keys for the request. Methods of automatically characterizing the resolution power of keys are studied, and the effects that search keys of varying resolution power have on retrieval effectiveness are analyzed. It is shown that it is often possible to identify the best key of a query, while discrimination between the remaining keys presents problems. It is also shown that query performance is improved by suitably using the best key in a structured query. The tests were run with InQuery in a subcollection of the TREC collection, which contained some 515,000 documents
  9. Vechtomova, O.: Facet-based opinion retrieval from blogs (2010) 0.05
    0.04760053 = product of:
      0.14280158 = sum of:
        0.14280158 = weight(_text_:query in 4225) [ClassicSimilarity], result of:
          0.14280158 = score(doc=4225,freq=6.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.62258047 = fieldWeight in 4225, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4225)
      0.33333334 = coord(1/3)
    
    Abstract
    The paper presents methods of retrieving blog posts containing opinions about an entity expressed in the query. The methods use a lexicon of subjective words and phrases compiled from manually and automatically developed resources. One of the methods uses the Kullback-Leibler divergence to weight subjective words occurring near query terms in documents, another uses proximity between the occurrences of query terms and subjective words in documents, and the third combines both factors. Methods of structuring queries into facets, facet expansion using Wikipedia, and facet-based retrieval are also investigated in this work. The methods were evaluated using the TREC 2007 and 2008 Blog track topics and proved to be highly effective.
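A sketch of the Kullback-Leibler weighting idea mentioned in the abstract: a subjective word scores highly when its probability near query-term occurrences exceeds its collection-wide probability. The windowing that would produce near_counts is assumed here, not taken from the paper:

```python
import math

def kld_weight(term, near_counts, near_total, coll_counts, coll_total):
    """Pointwise KL contribution p_near * log(p_near / p_coll) for one
    subjective word; `near_counts` tallies words inside a window around
    query-term matches, `coll_counts` tallies the whole collection."""
    p_near = near_counts.get(term, 0) / near_total
    p_coll = coll_counts.get(term, 0) / coll_total
    if p_near == 0 or p_coll == 0:
        return 0.0
    return p_near * math.log(p_near / p_coll)

# A subjective word twice as frequent near query terms as in the
# collection overall receives a positive weight:
print(kld_weight("awful", {"awful": 20}, 1000, {"awful": 100}, 10000))
# 0.02 * ln(2) = 0.0139
```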
  10. Davis, M.; Dunning, T.: A TREC evaluation of query translation methods for multi-lingual text retrieval (1996) 0.05
    0.04711231 = product of:
      0.14133692 = sum of:
        0.14133692 = weight(_text_:query in 1917) [ClassicSimilarity], result of:
          0.14133692 = score(doc=1917,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.61619484 = fieldWeight in 1917, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.09375 = fieldNorm(doc=1917)
      0.33333334 = coord(1/3)
    
  11. Gauch, S.; Wang, J.: Corpus analysis for TREC 5 query expansion (1997) 0.05
    0.04711231 = product of:
      0.14133692 = sum of:
        0.14133692 = weight(_text_:query in 5800) [ClassicSimilarity], result of:
          0.14133692 = score(doc=5800,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.61619484 = fieldWeight in 5800, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.09375 = fieldNorm(doc=5800)
      0.33333334 = coord(1/3)
    
  12. Kelledy, L.; Smeaton, A.F.: TREC-5 experiments at Dublin City University : Query space reduction, Spanish & character shape encoding (1997) 0.05
    0.04711231 = product of:
      0.14133692 = sum of:
        0.14133692 = weight(_text_:query in 3089) [ClassicSimilarity], result of:
          0.14133692 = score(doc=3089,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.61619484 = fieldWeight in 3089, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.09375 = fieldNorm(doc=3089)
      0.33333334 = coord(1/3)
    
  13. Singhal, A.; Buckley, C.; Mitra, M.: Using query zoning and correlation with SMART : TREC 5 (1997) 0.05
    0.04711231 = product of:
      0.14133692 = sum of:
        0.14133692 = weight(_text_:query in 3090) [ClassicSimilarity], result of:
          0.14133692 = score(doc=3090,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.61619484 = fieldWeight in 3090, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.09375 = fieldNorm(doc=3090)
      0.33333334 = coord(1/3)
    
  14. Burnett, M.; Jones, R.; Pape, L.: InTEXT automatic query enhancements in TREC-5 (1997) 0.05
    0.04711231 = product of:
      0.14133692 = sum of:
        0.14133692 = weight(_text_:query in 3091) [ClassicSimilarity], result of:
          0.14133692 = score(doc=3091,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.61619484 = fieldWeight in 3091, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.09375 = fieldNorm(doc=3091)
      0.33333334 = coord(1/3)
    
  15. Li, J.; Zhang, P.; Song, D.; Wu, Y.: Understanding an enriched multidimensional user relevance model by analyzing query logs (2017) 0.05
    0.04711231 = product of:
      0.14133692 = sum of:
        0.14133692 = weight(_text_:query in 3961) [ClassicSimilarity], result of:
          0.14133692 = score(doc=3961,freq=8.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.61619484 = fieldWeight in 3961, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=3961)
      0.33333334 = coord(1/3)
    
    Abstract
    Modeling multidimensional relevance in information retrieval (IR) has attracted much attention in recent years. However, most existing studies are conducted through relatively small-scale user studies, which may not reflect a real-world, natural search scenario. In this article, we propose to study the multidimensional user relevance model (MURM) on large-scale query logs, which record users' various search behaviors (e.g., query reformulations, clicks, and dwell time) in natural search settings. We advance an existing MURM (comprising five dimensions: topicality, novelty, reliability, understandability, and scope) by adding two dimensions, interest and habit, which represent personalized relevance judgments on retrieved documents. Further, for each dimension in the enriched MURM, a set of computable features is formulated. By conducting extensive document ranking experiments on Bing's query logs and TREC Session Track data, we systematically investigated the impact of each dimension on retrieval performance and gained a series of insightful findings that may benefit the design of future IR systems.
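A schematic of how such a multidimensional model can be operationalized. The dimension names come from the abstract; the feature scores and the uniform linear weighting are illustrative stand-ins for the paper's learned ranking model:

```python
# The seven dimension names come from the abstract; the scores and the
# uniform weighting below are purely illustrative.
DIMENSIONS = ["topicality", "novelty", "reliability", "understandability",
              "scope", "interest", "habit"]

def relevance_score(features, weights):
    """Combine per-dimension feature scores for one document linearly."""
    return sum(weights[d] * features[d] for d in DIMENSIONS)

doc = {"topicality": 0.9, "novelty": 0.4, "reliability": 0.7,
       "understandability": 0.8, "scope": 0.5, "interest": 0.6, "habit": 0.3}
uniform = {d: 1 / len(DIMENSIONS) for d in DIMENSIONS}
print(round(relevance_score(doc, uniform), 3))  # 0.6
```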
  16. Kristensen, J.: Expanding end-users' query statements for free text searching with a search-aid thesaurus (1993) 0.04
    0.04441791 = product of:
      0.13325372 = sum of:
        0.13325372 = weight(_text_:query in 6621) [ClassicSimilarity], result of:
          0.13325372 = score(doc=6621,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.5809541 = fieldWeight in 6621, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0625 = fieldNorm(doc=6621)
      0.33333334 = coord(1/3)
    
    Abstract
    Tests the effectiveness of a thesaurus as a search-aid in free text searching of a full text database. A set of queries was searched against a large full text database of newspaper articles. The thesaurus contained equivalence, hierarchical and associative relationships. Each query was searched in five modes: basic search, synonym search, narrower term search, related term search, and union of all previous searches. The searches were analyzed in terms of relative recall and precision
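The five search modes map directly onto Boolean query construction; a sketch with a toy thesaurus entry, where SYN/NT/RT mirror the equivalence, hierarchical and associative relationships named in the abstract:

```python
# Toy thesaurus entry.
thesaurus = {
    "inflation": {"SYN": ["price rises"],
                  "NT": ["hyperinflation"],
                  "RT": ["monetary policy"]},
}

def or_group(terms):
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

def search_modes(term, thesaurus):
    """The five query variants tested: basic, +synonyms, +narrower terms,
    +related terms, and the union of all expansions."""
    rels = thesaurus.get(term, {})
    syn, nt, rt = rels.get("SYN", []), rels.get("NT", []), rels.get("RT", [])
    return {
        "basic": or_group([term]),
        "synonym": or_group([term] + syn),
        "narrower": or_group([term] + nt),
        "related": or_group([term] + rt),
        "union": or_group([term] + syn + nt + rt),
    }

for mode, query in search_modes("inflation", thesaurus).items():
    print(f"{mode:9s} {query}")
```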
  17. Turtle, H.; Flood, J.: Query evaluation : strategies and optimizations (1995) 0.04
    0.04441791 = product of:
      0.13325372 = sum of:
        0.13325372 = weight(_text_:query in 4087) [ClassicSimilarity], result of:
          0.13325372 = score(doc=4087,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.5809541 = fieldWeight in 4087, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0625 = fieldNorm(doc=4087)
      0.33333334 = coord(1/3)
    
    Abstract
    Discusses the 2 major query evaluation strategies used in large text retrieval systems and analyzes their performance. Discusses several optimization techniques that can be used to reduce evaluation costs and presents simulation results comparing the performance of these optimization techniques when evaluating natural language queries on a collection of full text legal materials
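The two major strategies in large text retrieval systems are commonly term-at-a-time (TAAT) and document-at-a-time (DAAT) evaluation; a minimal TAAT sketch over an in-memory inverted index, with the term weighting function left abstract:

```python
from collections import defaultdict
import heapq

def taat(query_terms, postings, weight, k=10):
    """Term-at-a-time: consume each term's postings list completely,
    accumulating partial scores per document, then take the top k."""
    acc = defaultdict(float)
    for term in query_terms:
        for doc_id, tf in postings.get(term, []):
            acc[doc_id] += weight(term, tf)
    return heapq.nlargest(k, acc.items(), key=lambda kv: kv[1])

postings = {"query": [(1, 3), (2, 1)], "evaluation": [(2, 2), (3, 1)]}
print(taat(["query", "evaluation"], postings, lambda t, tf: float(tf)))
# docs 1 and 2 tie at 3.0; doc 3 scores 1.0
```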
  18. Sarigil, E.; Sengor Altingovde, I.; Blanco, R.; Barla Cambazoglu, B.; Ozcan, R.; Ulusoy, Ö.: Characterizing, predicting, and handling web search queries that match very few or no results (2018) 0.04
    0.0438943 = product of:
      0.13168289 = sum of:
        0.13168289 = weight(_text_:query in 4039) [ClassicSimilarity], result of:
          0.13168289 = score(doc=4039,freq=10.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.5741056 = fieldWeight in 4039, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4039)
      0.33333334 = coord(1/3)
    
    Abstract
    A non-negligible fraction of user queries end up with very few or even no matching results in leading commercial web search engines. In this work, we provide a detailed characterization of such queries and show that search engines try to improve such queries by showing the results of related queries. Through a user study, we show that these query suggestions are usually perceived as relevant. Also, through a query log analysis, we show that users are dissatisfied at least 88.5% of the time after submitting a query that matches no results. As a first step towards solving these no-answer queries, we devised a large number of features that can be used to identify such queries and built machine-learning models. These models can be useful in scenarios such as mobile or meta-search, where identifying a query that will retrieve no results at the client device (i.e., even before submitting it to the search engine) may yield gains in terms of bandwidth usage, power consumption, and/or monetary costs. Experiments over query logs indicate that, despite the heavy skew in class sizes, our models achieve good prediction quality, with accuracy (in terms of area under the curve) of up to 0.95.
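A sketch of the kind of client-side features such a predictor could use before a query is submitted. The three features below are illustrative guesses; the paper devises a much larger set, on top of which any standard classifier evaluated by AUC can be trained:

```python
def query_features(query, term_df):
    """Client-side signals that plausibly predict an empty result set:
    query length, document frequency of the rarest term, and the share
    of purely numeric tokens. `term_df` maps terms to document frequency."""
    terms = query.lower().split()
    dfs = [term_df.get(t, 0) for t in terms]
    return [
        len(terms),
        min(dfs) if dfs else 0,   # one unseen term can zero out results
        sum(t.isdigit() for t in terms) / max(len(terms), 1),
    ]

term_df = {"wireless": 120_000, "router": 80_000}
print(query_features("wireless router XQ38B7", term_df))  # [3, 0, 0.0]
```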
  19. Ruthven, I.; Lalmas, M.; Rijsbergen, K. van: Combining and selecting characteristics of information use (2002) 0.04
    0.041549146 = product of:
      0.12464744 = sum of:
        0.12464744 = weight(_text_:query in 5208) [ClassicSimilarity], result of:
          0.12464744 = score(doc=5208,freq=14.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.5434328 = fieldWeight in 5208, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.03125 = fieldNorm(doc=5208)
      0.33333334 = coord(1/3)
    
    Abstract
    Ruthven, Lalmas, and van Rijsbergen use traditional term importance measures like inverse document frequency, noise (based upon in-document frequency), and term frequency, supplemented by theme value, which is calculated from the differences between the expected positions of words in a text and their actual positions, on the assumption that even distribution indicates a term's association with a main topic, and by context, which is based on a query term's distance from the nearest other query term relative to the average expected distribution of all query terms in the document. They then define document characteristics like specificity, the sum of all idf values in a document over the total terms in the document, or document complexity, measured by the document's average idf value; and the information to noise ratio, info-noise, tokens after stopping and stemming over tokens before these processes, measuring the ratio of useful and non-useful information in a document. Retrieval tests are then carried out using each characteristic, combinations of the characteristics, and relevance feedback to determine the correct combination of characteristics. A file ranks independently of query terms by both specificity and info-noise, but if the presence of a query term is required, unique rankings are generated. Tested on five standard collections, the traditional characteristics outperformed the new characteristics, which did, however, outperform random retrieval. All possible combinations of characteristics were also tested, both with and without a set of scaling weights applied. All characteristics can benefit from combination with another characteristic or set of characteristics, and performance as a single characteristic is a good indicator of performance in combination. Larger combinations tended to be more effective than smaller ones, and weighting increased the precision measures of middle-ranking combinations but decreased the ranking of poorer combinations. The best combinations vary for each collection, and in some collections with the addition of weighting. Finally, with all documents ranked by the all-characteristics combination, they take the top 30 documents and calculate the characteristic scores for each term in both the relevant and the non-relevant sets. Then, taking for each query term the characteristics whose average was higher for relevant than for non-relevant documents, the documents are re-ranked. The relevance feedback method of selecting characteristics can select a good set of characteristics for query terms.
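Taking the two explicit definitions in this summary literally, the document characteristics are straightforward to compute; idf values are assumed precomputed, and the identity stemmer in the example is a placeholder:

```python
def specificity(doc_terms, idf):
    """Sum of the idf values of a document's terms over the total number
    of terms (the summary's literal definition)."""
    return sum(idf.get(t, 0.0) for t in doc_terms) / len(doc_terms)

def info_noise(tokens, stopwords, stem):
    """Tokens surviving stopping and stemming over tokens before: the
    summary's ratio of useful to total information in a document."""
    kept = [stem(t) for t in tokens if t not in stopwords]
    return len(kept) / len(tokens)

idf = {"retrieval": 4.6, "query": 4.6, "the": 0.1}
doc = ["the", "query", "retrieval"]
print(specificity(doc, idf))                  # (0.1 + 4.6 + 4.6) / 3 = 3.1
print(info_noise(doc, {"the"}, lambda t: t))  # 2/3, identity stemmer
```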
  20. Losee, R.M.: Evaluating retrieval performance given database and query characteristics : analytic determination of performance surfaces (1996) 0.04
    0.040800452 = product of:
      0.12240136 = sum of:
        0.12240136 = weight(_text_:query in 4162) [ClassicSimilarity], result of:
          0.12240136 = score(doc=4162,freq=6.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.5336404 = fieldWeight in 4162, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=4162)
      0.33333334 = coord(1/3)
    
    Abstract
    An analytic method of information retrieval and filtering evaluation can quantitatively predict the expected number of documents examined in retrieving a relevant document. It also allows researchers and practitioners to understand qualitatively how varying different estimates of query parameter values affects retrieval performance. The incorporation of relevance feedback to increase our knowledge about the parameters of relevant documents and the robustness of parameter estimates is modeled. Single-term and two-term independence models, as well as a complete term dependence model, are developed. An economic model of retrieval performance may be used to study the effects of database size and to provide analytic answers to questions comparing retrieval from small and large databases, as well as questions about the number of terms in a query. Results are presented as a performance surface, a three-dimensional graph showing the effects of two independent variables on performance.
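The quantity predicted here is, in essence, the expected number of documents examined before reaching a relevant one. Losee's contribution is computing it analytically from database parameters; the sketch below merely simulates it under assumed per-rank relevance probabilities, as a reference point for what the analytic model delivers in closed form:

```python
import random

def expected_search_length(rank_rel_probs, trials=100_000):
    """Monte Carlo estimate of the expected number of documents examined
    up to and including the first relevant one; if no document in the
    ranked list is relevant, the whole list is examined."""
    total = 0
    for _ in range(trials):
        for position, p in enumerate(rank_rel_probs, start=1):
            if random.random() < p:
                total += position
                break
        else:
            total += len(rank_rel_probs)
    return total / trials

print(expected_search_length([0.6, 0.3, 0.1, 0.05]))  # ~1.93
```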

Languages

  • e 98
  • d 3
  • chi 1
  • f 1

Types

  • a 98
  • s 5
  • m 3
  • el 2