Search (4 results, page 1 of 1)

  • × theme_ss:"Computerlinguistik"
  • × author_ss:"Savoy, J."
  1. Dolamic, L.; Savoy, J.: Retrieval effectiveness of machine translated queries (2010) 0.00
    0.0028993662 = product of:
      0.008698098 = sum of:
        0.008698098 = product of:
          0.017396197 = sum of:
            0.017396197 = weight(_text_:of in 4102) [ClassicSimilarity], result of:
              0.017396197 = score(doc=4102,freq=12.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.25392252 = fieldWeight in 4102, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4102)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    This article describes and evaluates various information retrieval models used to search document collections written in English through submitting queries written in various other languages, either members of the Indo-European family (English, French, German, and Spanish) or radically different language groups such as Chinese. This evaluation method involves searching a rather large number of topics (around 300) and using two commercial machine translation systems to translate across the language barriers. In this study, mean average precision is used to measure variances in retrieval effectiveness when a query language differs from the document language. Although performance differences are rather large for certain language pairs, this does not mean that bilingual search methods are not commercially viable. Causes of the difficulties incurred when searching or during translation are analyzed and the results of concrete examples are explained.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.11, S.2266-2273
  2. Savoy, J.: Text representation strategies : an example with the State of the union addresses (2016) 0.00
    0.0027899165 = product of:
      0.008369749 = sum of:
        0.008369749 = product of:
          0.016739499 = sum of:
            0.016739499 = weight(_text_:of in 3042) [ClassicSimilarity], result of:
              0.016739499 = score(doc=3042,freq=16.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.24433708 = fieldWeight in 3042, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3042)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Based on State of the Union addresses from 1790 to 2014 (225 speeches delivered by 42 presidents), this paper describes and evaluates different text representation strategies. To determine the most important words of a given text, the term frequencies (tf) or the tf-idf weighting scheme can be applied. Recently, latent Dirichlet allocation (LDA) has been proposed to define the topics included in a corpus. As another strategy, this study proposes to apply a vocabulary specificity measure (Z-score) to determine the most significantly overused word-types or short sequences of them. Our experiments show that the simple term frequency measure is not able to discriminate between specific terms associated with a document or a set of texts. Using the tf-idf or LDA approach, the selection requires some arbitrary decisions. Based on the term-specific measure (Z-score), the term selection has a clear theoretical basis. Moreover, the most significant sentences for each presidency can be determined. As another facet, we can visualize the dynamic evolution of usage of some terms associated with their specificity measures. Finally, this technique can be employed to define the most important lexical leaders introducing terms overused by the k following presidencies.
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.8, S.1858-1870
  3. Savoy, J.: ¬A stemming procedure and stopword list for general French Corpora (1999) 0.00
    0.0023673228 = product of:
      0.0071019684 = sum of:
        0.0071019684 = product of:
          0.014203937 = sum of:
            0.014203937 = weight(_text_:of in 4314) [ClassicSimilarity], result of:
              0.014203937 = score(doc=4314,freq=2.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.20732689 = fieldWeight in 4314, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4314)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Journal of the American Society for Information Science. 50(1999) no.10, S.944-954
  4. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.00
    0.0011836614 = product of:
      0.0035509842 = sum of:
        0.0035509842 = product of:
          0.0071019684 = sum of:
            0.0071019684 = weight(_text_:of in 2950) [ClassicSimilarity], result of:
              0.0071019684 = score(doc=2950,freq=2.0), product of:
                0.06850986 = queryWeight, product of:
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.043811057 = queryNorm
                0.103663445 = fieldWeight in 2950, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.5637573 = idf(docFreq=25162, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2950)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.8, S.1616-1624
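
The score breakdowns listed under each hit follow the classic Lucene TF-IDF pattern: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), queryWeight is idf times queryNorm, fieldWeight is tf times idf times fieldNorm, and the product is scaled by the coord factors. The sketch below recomputes the final score from the leaf values shown in the listing. The function names (`classic_idf`, `classic_tf`, `explain_score`) are made up for illustration and are not part of Lucene; the formulas are assumed to be the ones ClassicSimilarity applies, which is consistent with the figures printed above.

```python
import math


def classic_idf(doc_freq: float, max_docs: float) -> float:
    """idf as it appears in the listing: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def classic_tf(freq: float) -> float:
    """tf as it appears in the listing: square root of the raw term frequency."""
    return math.sqrt(freq)


def explain_score(freq, doc_freq, max_docs, query_norm, field_norm, coords):
    """Rebuild the top-level score from the leaf values of one explain tree.

    coords is a list of (matched, total) pairs, one per coord(...) line.
    """
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # queryWeight
    field_weight = classic_tf(freq) * idf * field_norm   # fieldWeight
    score = query_weight * field_weight
    for matched, total in coords:
        score *= matched / total                         # coord(matched/total)
    return score


# Leaf values copied from the first hit (doc=4102).
score_4102 = explain_score(
    freq=12.0,
    doc_freq=25162,
    max_docs=44218,
    query_norm=0.043811057,
    field_norm=0.046875,
    coords=[(1, 2), (1, 3)],
)
print(f"{score_4102:.7f}")
```

Lucene does this arithmetic in 32-bit floats, so the recomputed value agrees with the listed 0.0028993662 only up to rounding in the last digits. Because idf, queryNorm and the coord factors are identical across the four hits, the ranking here is driven entirely by tf and fieldNorm.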