Search (5 results, page 1 of 1)

  • theme_ss:"Computerlinguistik"
  • author_ss:"Savoy, J."
  1. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.01
    0.0059455284 = product of:
      0.041618697 = sum of:
        0.0104854815 = weight(_text_:information in 2950) [ClassicSimilarity], result of:
          0.0104854815 = score(doc=2950,freq=6.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.20156369 = fieldWeight in 2950, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2950)
        0.031133216 = weight(_text_:retrieval in 2950) [ClassicSimilarity], result of:
          0.031133216 = score(doc=2950,freq=6.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.34732026 = fieldWeight in 2950, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2950)
      0.14285715 = coord(2/14)
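    The breakdown above is Lucene's ClassicSimilarity (tf-idf) explain output: for each matching query term, tf(freq) = sqrt(freq), idf = ln(maxDocs / (docFreq + 1)) + 1, queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and the document score is coord times the sum of queryWeight * fieldWeight. The following Python sketch reproduces the arithmetic for entry 1 from the values shown; the function and variable names are ours, not part of the listing.
      from math import log, sqrt

      # Reproduce the ClassicSimilarity explain tree for doc 2950 shown above.
      def idf(doc_freq, max_docs):
          return log(max_docs / (doc_freq + 1)) + 1            # 1.7554779 for "information"

      def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
          tf = sqrt(freq)                                      # 2.4494898 for freq=6.0
          i = idf(doc_freq, max_docs)
          query_weight = i * query_norm                        # idf * queryNorm
          field_weight = tf * i * field_norm                   # tf * idf * fieldNorm
          return query_weight * field_weight

      query_norm, field_norm, max_docs = 0.029633347, 0.046875, 44218
      s_information = term_score(6.0, 20772, max_docs, query_norm, field_norm)  # 0.0104854815
      s_retrieval = term_score(6.0, 5836, max_docs, query_norm, field_norm)     # 0.031133216
      print((2 / 14) * (s_information + s_retrieval))          # coord(2/14) * sum = 0.0059455284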
    
    Abstract
    It is important in information retrieval (IR), information extraction, or classification tasks that morphologically related forms are conflated under the same stem (using a stemmer) or lemma (using a morphological analyzer). To achieve this for the English language, algorithmic stemming or various morphological analysis approaches have been suggested. Based on Cross-Language Evaluation Forum test collections containing 284 queries and various IR models, this article evaluates these word-normalization proposals. Stemming improves the mean average precision significantly, by around 7%, while performance differences are not significant when comparing various algorithmic stemmers, or algorithmic stemmers and morphological analysis. Accounting for thesaurus class numbers during indexing does not modify overall retrieval performance. Finally, we demonstrate that including a stopword list, even one containing only around 10 terms, might significantly improve retrieval performance, depending on the IR model.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.8, S.1616-1624
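    The abstract of entry 1 describes conflating morphologically related forms with an algorithmic stemmer and filtering a very short stopword list before indexing. As a rough illustration only, using NLTK's Porter stemmer rather than any of the stemmers evaluated in the paper, and a stopword list of our own choosing, such a normalization step might look like this:
      from nltk.stem import PorterStemmer   # a widely available algorithmic stemmer; not the one evaluated above

      # Roughly 10 stopwords; an illustrative choice, not the list used in the paper.
      STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "to", "is"}

      def normalize(text):
          """Lowercase, drop stopwords, and conflate variants under a common stem."""
          stemmer = PorterStemmer()
          return [stemmer.stem(tok) for tok in text.lower().split() if tok not in STOPWORDS]

      print(normalize("Stemming the retrieved documents improves retrieval of related forms"))
      # e.g. ['stem', 'retriev', 'document', 'improv', 'retriev', 'relat', 'form']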
  2. Dolamic, L.; Savoy, J.: Retrieval effectiveness of machine translated queries (2010) 0.01
    0.005670654 = product of:
      0.039694577 = sum of:
        0.00856136 = weight(_text_:information in 4102) [ClassicSimilarity], result of:
          0.00856136 = score(doc=4102,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.16457605 = fieldWeight in 4102, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=4102)
        0.031133216 = weight(_text_:retrieval in 4102) [ClassicSimilarity], result of:
          0.031133216 = score(doc=4102,freq=6.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.34732026 = fieldWeight in 4102, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=4102)
      0.14285715 = coord(2/14)
    
    Abstract
    This article describes and evaluates various information retrieval models used to search document collections written in English through submitting queries written in various other languages, either members of the Indo-European family (English, French, German, and Spanish) or radically different language groups such as Chinese. This evaluation method involves searching a rather large number of topics (around 300) and using two commercial machine translation systems to translate across the language barriers. In this study, mean average precision is used to measure variances in retrieval effectiveness when a query language differs from the document language. Although performance differences are rather large for certain language pairs, this does not mean that bilingual search methods are not commercially viable. Causes of the difficulties incurred when searching or during translation are analyzed and the results of concrete examples are explained.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.11, S.2266-2273
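    Entry 2 uses mean average precision (MAP) to quantify how much retrieval effectiveness drops when queries are machine translated. A minimal sketch of the measure itself follows; the rankings and relevance judgments below are invented purely for illustration.
      def average_precision(ranked_docs, relevant):
          """Average of the precision values taken at each relevant document in the ranking."""
          hits, precision_sum = 0, 0.0
          for rank, doc in enumerate(ranked_docs, start=1):
              if doc in relevant:
                  hits += 1
                  precision_sum += hits / rank
          return precision_sum / len(relevant) if relevant else 0.0

      def mean_average_precision(runs):
          # runs: one (ranked result list, set of relevant doc ids) pair per topic
          return sum(average_precision(docs, rel) for docs, rel in runs) / len(runs)

      runs = [(["d3", "d1", "d7"], {"d1", "d7"}),
              (["d2", "d5", "d4"], {"d4"})]
      print(mean_average_precision(runs))    # ~0.458 for these two toy topics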
  3. Savoy, J.: Searching strategies for the Hungarian language (2008) 0.00
    0.0034326524 = product of:
      0.024028566 = sum of:
        0.0060537956 = weight(_text_:information in 2037) [ClassicSimilarity], result of:
          0.0060537956 = score(doc=2037,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.116372846 = fieldWeight in 2037, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2037)
        0.01797477 = weight(_text_:retrieval in 2037) [ClassicSimilarity], result of:
          0.01797477 = score(doc=2037,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.20052543 = fieldWeight in 2037, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2037)
      0.14285715 = coord(2/14)
    
    Abstract
    This paper reports on the underlying IR problems encountered when dealing with the complex morphology and compound constructions found in the Hungarian language. It describes evaluations carried out on two general stemming strategies for this language, and also demonstrates that a light stemming approach could be quite effective. Based on searches done on the CLEF test collection, we find that a more aggressive suffix-stripping approach may produce better MAP. When compared to an IR scheme without stemming or one based on only a light stemmer, we find the differences to be statistically significant. When compared with probabilistic, vector-space and language models, we find that the Okapi model results in the best retrieval effectiveness. The resulting MAP is found to be about 35% better than the classical tf-idf approach, particularly for very short requests. Finally, we demonstrate that applying an automatic decompounding procedure for both queries and documents significantly improves IR performance (+10%), compared to word-based indexing strategies.
    Source
    Information processing and management. 44(2008) no.1, S.310-324
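    Entry 3 reports that the Okapi model clearly outperforms the classical tf-idf weighting for Hungarian. Below is a minimal sketch of Okapi BM25 term weighting; the parameter values k1 and b and the collection statistics are our assumptions for illustration, not figures from the paper.
      from math import log

      def bm25_weight(tf, doc_len, avg_doc_len, doc_freq, num_docs, k1=1.2, b=0.75):
          """Okapi BM25 weight of one query term in one document."""
          idf = log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1)
          norm_tf = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
          return idf * norm_tf

      # A term occurring 3 times in a 120-word document, in a collection of 50,000
      # documents (average length 150 words) where 800 documents contain the term.
      print(bm25_weight(tf=3, doc_len=120, avg_doc_len=150, doc_freq=800, num_docs=50000))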
  4. Savoy, J.: A stemming procedure and stopword list for general French corpora (1999) 0.00
    8.64828E-4 = product of:
      0.012107591 = sum of:
        0.012107591 = weight(_text_:information in 4314) [ClassicSimilarity], result of:
          0.012107591 = score(doc=4314,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.23274569 = fieldWeight in 4314, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.09375 = fieldNorm(doc=4314)
      0.071428575 = coord(1/14)
    
    Source
    Journal of the American Society for Information Science. 50(1999) no.10, S.944-954
  5. Savoy, J.: Text representation strategies : an example with the State of the Union addresses (2016) 0.00
    3.6034497E-4 = product of:
      0.0050448296 = sum of:
        0.0050448296 = weight(_text_:information in 3042) [ClassicSimilarity], result of:
          0.0050448296 = score(doc=3042,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.09697737 = fieldWeight in 3042, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3042)
      0.071428575 = coord(1/14)
    
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.8, S.1858-1870