Search (5 results, page 1 of 1)

  • × theme_ss:"Retrievalalgorithmen"
  • × theme_ss:"Computerlinguistik"
  1. Frakes, W.B.: Stemming algorithms (1992) 0.01
    0.008201688 = product of:
      0.05741181 = sum of:
        0.05741181 = weight(_text_:studies in 3503) [ClassicSimilarity], result of:
          0.05741181 = score(doc=3503,freq=2.0), product of:
            0.1627809 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.04079441 = queryNorm
            0.35269377 = fieldWeight in 3503, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.0625 = fieldNorm(doc=3503)
      0.14285715 = coord(1/7)
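
    The explanation tree above is Lucene's ClassicSimilarity (TF-IDF) breakdown. A minimal sketch that reproduces the arithmetic for this first hit, assuming Lucene's documented ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))):

      import math

      # Inputs read directly off the explanation tree for doc 3503.
      freq, doc_freq, max_docs = 2.0, 2222, 44218
      query_norm, field_norm = 0.04079441, 0.0625
      coord = 1 / 7  # one of seven query clauses matched

      tf = math.sqrt(freq)                           # 1.4142135
      idf = 1 + math.log(max_docs / (doc_freq + 1))  # 3.9902744
      query_weight = idf * query_norm                # 0.1627809
      field_weight = tf * idf * field_norm           # 0.35269377
      score = query_weight * field_weight * coord
      print(f"{score:.9f}")  # ~0.008201688, the reported document score

    The four remaining hits decompose the same way, differing only in the matched term and its idf, freq, and fieldNorm values.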
    
    Abstract
    Describes stemming algorithms - programs that relate morphologically similar indexing and search terms. Stemming is used to improve retrieval effectiveness and to reduce the size of indexing files. Several approaches to stemming are described - table lookup, affix removal, successor variety, and n-gram. Empirical studies of stemming are summarized. The Porter stemmer is described in detail, and a full implementation in C is presented.
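    As a toy illustration of the affix-removal family the chapter surveys (a deliberately crude sketch, not Frakes's code, and far weaker than the Porter stemmer, which applies ordered rule phases with measure-based conditions):

      # Toy affix-removal stemmer: strip the longest matching suffix,
      # keeping a stem of at least three characters. The suffix list
      # is illustrative only.
      SUFFIXES = sorted(["ational", "ization", "fulness", "edly",
                         "ing", "ed", "es", "s"], key=len, reverse=True)

      def stem(word: str) -> str:
          for suffix in SUFFIXES:
              if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                  return word[:-len(suffix)]
          return word

      assert stem("stemming") == "stemm"   # conflates with stem("stemmed")
      assert stem("algorithms") == "algorithm"
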
  2. Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O.; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 0.01
    0.0074671716 = product of:
      0.0522702 = sum of:
        0.0522702 = weight(_text_:case in 2502) [ClassicSimilarity], result of:
          0.0522702 = score(doc=2502,freq=2.0), product of:
            0.17934912 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.04079441 = queryNorm
            0.29144385 = fieldWeight in 2502, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.046875 = fieldNorm(doc=2502)
      0.14285715 = coord(1/7)
    
    Abstract
    Prior efforts have shown that under certain situations retrieval effectiveness may be improved via the use of data fusion techniques. Although these improvements have been observed from the fusion of result sets from several distinct information retrieval systems, it has often been thought that fusing different document retrieval strategies in a single information retrieval system will lead to similar improvements. In this study, we show that this is not the case. We hold constant systemic differences such as parsing, stemming, phrase processing, and relevance feedback, and fuse result sets generated from highly effective retrieval strategies in the same information retrieval system. From this, we show that data fusion of highly effective retrieval strategies alone shows little or no improvement in retrieval effectiveness. Furthermore, we present a detailed analysis of the performance of modern data fusion approaches, and demonstrate the reasons why they do not perform well when applied to this problem. Detailed results and analyses are included to support our conclusions.
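    A hedged sketch of the kind of result-set fusion under study: CombSUM over min-max-normalized scores, a standard operator from the data-fusion literature (the abstract does not say which operator the authors evaluated, so the choice here is illustrative):

      from collections import defaultdict

      def min_max(run: dict[str, float]) -> dict[str, float]:
          """Scale one system's scores into [0, 1] so runs are comparable."""
          lo, hi = min(run.values()), max(run.values())
          span = (hi - lo) or 1.0
          return {doc: (s - lo) / span for doc, s in run.items()}

      def comb_sum(runs: list[dict[str, float]]) -> list[tuple[str, float]]:
          """CombSUM: sum each document's normalized score across all runs."""
          fused: dict[str, float] = defaultdict(float)
          for run in runs:
              for doc, score in min_max(run).items():
                  fused[doc] += score
          return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

      # Two hypothetical retrieval strategies scoring the same collection.
      bm25_run = {"d1": 12.0, "d2": 9.5, "d3": 4.1}
      lm_run   = {"d1": -3.2, "d3": -2.9, "d4": -4.0}
      print(comb_sum([bm25_run, lm_run]))
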
  3. Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.01
    0.0062226425 = product of:
      0.043558497 = sum of:
        0.043558497 = weight(_text_:case in 1338) [ClassicSimilarity], result of:
          0.043558497 = score(doc=1338,freq=2.0), product of:
            0.17934912 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.04079441 = queryNorm
            0.24286987 = fieldWeight in 1338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1338)
      0.14285715 = coord(1/7)
    
    Abstract
    A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations, i.e., the tendency of two terms to co-occur in natural language more often than chance would predict. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system based on a commercial search engine.
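    A hedged sketch of the syntagmatic half of this idea only: rank candidate expansion terms by pointwise mutual information with the query terms, estimated from document-level co-occurrence (PMI is an illustrative stand-in; the article's corpus-based model of word meaning, which also covers paradigmatic associations, is considerably richer):

      import math
      from collections import Counter
      from itertools import combinations

      def pmi_expansion(query: list[str], docs: list[str], k: int = 3) -> list[str]:
          """Suggest k expansion terms with the highest mean PMI
          against the query terms, from document co-occurrence counts."""
          term_df, pair_df = Counter(), Counter()
          for doc in docs:
              terms = set(doc.split())
              term_df.update(terms)
              pair_df.update(frozenset(p) for p in combinations(terms, 2))
          n = len(docs)

          def pmi(a: str, b: str) -> float:
              joint = pair_df[frozenset((a, b))]
              if joint == 0:
                  return float("-inf")  # never co-occur: useless candidate
              return math.log(joint * n / (term_df[a] * term_df[b]))

          candidates = set(term_df) - set(query)
          score = {c: sum(pmi(q, c) for q in query) / len(query) for c in candidates}
          return sorted(candidates, key=score.get, reverse=True)[:k]

      docs = ["query expansion improves retrieval",
              "query reformulation improves effectiveness",
              "stemming reduces index size"]
      print(pmi_expansion(["query"], docs))
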
  4. Hoenkamp, E.; Bruza, P.D.; Song, D.; Huang, Q.: An effective approach to verbose queries using a limited dependencies language model (2009) 0.00
    0.0049781143 = product of:
      0.034846798 = sum of:
        0.034846798 = weight(_text_:case in 2122) [ClassicSimilarity], result of:
          0.034846798 = score(doc=2122,freq=2.0), product of:
            0.17934912 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.04079441 = queryNorm
            0.1942959 = fieldWeight in 2122, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.03125 = fieldNorm(doc=2122)
      0.14285715 = coord(1/7)
    
    Abstract
    Intuitively, any 'bag of words' approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies have been mixed or inconclusive. To improve the situation, this paper shows how the natural language properties of the target documents can be used to transform and enrich the term dependencies into more useful statistics. This is done in three steps. The term co-occurrence statistics of queries and documents are each represented by a Markov chain. The paper proves that such a chain is ergodic, and therefore its asymptotic behavior is unique, stationary, and independent of the initial state. Next, the stationary distribution is taken to model queries and documents, rather than their initial distributions. Finally, ranking is achieved following the customary language modeling paradigm. The main contribution of this paper is to argue why the asymptotic behavior of the document model is a better representation than just the document's initial distribution. A secondary contribution is to investigate the practical application of this representation as queries become increasingly verbose. In the experiments (based on Lemur's search engine substrate) the default query model was replaced by the stable distribution of the query. Just modeling the query this way already resulted in significant improvements over a standard language model baseline. The results were on a par with, or better than, more sophisticated algorithms that use fine-tuned parameters or extensive training. Moreover, the more verbose the query, the more effective the approach seems to become.
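    A hedged sketch of the central computation the abstract describes: row-normalize term co-occurrence counts into a Markov transition matrix and take its stationary distribution as the query (or document) model. Power iteration is an illustrative choice of solver; the ergodicity the paper proves is what guarantees the limit exists and does not depend on the start vector:

      import numpy as np

      def stationary_distribution(counts: np.ndarray, tol: float = 1e-12) -> np.ndarray:
          """Stationary distribution of the chain whose transition matrix
          is the row-normalization of `counts`. Assumes ergodicity."""
          p = counts / counts.sum(axis=1, keepdims=True)
          pi = np.full(p.shape[0], 1.0 / p.shape[0])  # any start state works
          while True:
              nxt = pi @ p
              if np.abs(nxt - pi).sum() < tol:
                  return nxt
              pi = nxt

      # Toy co-occurrence counts for three terms; all-positive, hence ergodic.
      counts = np.array([[2.0, 1.0, 1.0],
                         [1.0, 3.0, 2.0],
                         [1.0, 2.0, 4.0]])
      print(stationary_distribution(counts))
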
  5. French, J.C.; Powell, A.L.; Schulman, E.: Using clustering strategies for creating authority files (2000) 0.00
    0.0041691167 = product of:
      0.029183816 = sum of:
        0.029183816 = weight(_text_:libraries in 4811) [ClassicSimilarity], result of:
          0.029183816 = score(doc=4811,freq=2.0), product of:
            0.13401186 = queryWeight, product of:
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.04079441 = queryNorm
            0.2177704 = fieldWeight in 4811, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.046875 = fieldNorm(doc=4811)
      0.14285715 = coord(1/7)
    
    Abstract
    As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. Authority work, the need to discover and reconcile variant forms of strings in bibliographical entries, will become more critical in the future. Spelling variants, misspellings, and transliteration differences will all increase the difficulty of retrieving information. We investigate a number of approximate string matching techniques that have traditionally been used to help with this problem. We then introduce the notion of approximate word matching and show how it can be used to improve detection and categorization of variant forms. We demonstrate the utility of these approaches using data from the Astrophysics Data System and show how we can reduce the human effort involved in the creation of authority files.
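
    A hedged sketch of the approximate matching step: greedily cluster name variants whose character-level similarity clears a threshold. Python's difflib ratio and the 0.75 cutoff here stand in for the string-matching measures the article actually evaluates:

      import difflib

      def similar(a: str, b: str, threshold: float = 0.75) -> bool:
          """True when the two strings' difflib similarity ratio clears the cutoff."""
          return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

      def cluster_variants(names: list[str]) -> list[list[str]]:
          """Greedy single-link clustering of variant forms for an authority file."""
          clusters: list[list[str]] = []
          for name in names:
              for cluster in clusters:
                  if any(similar(name, member) for member in cluster):
                      cluster.append(name)
                      break
              else:
                  clusters.append([name])
          return clusters

      variants = ["Tchaikovsky, P.", "Tchaikovskij, P.", "Chaikovskii, P.", "Mozart, W.A."]
      print(cluster_variants(variants))
      # -> [['Tchaikovsky, P.', 'Tchaikovskij, P.', 'Chaikovskii, P.'], ['Mozart, W.A.']]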