Search (5 results, page 1 of 1)

  • author_ss:"Savoy, J."
  • theme_ss:"Computerlinguistik"
  1. Savoy, J.: Text representation strategies : an example with the State of the Union addresses (2016) 0.01
    0.011655438 = product of:
      0.023310876 = sum of:
        0.023310876 = product of:
          0.034966312 = sum of:
            0.031998128 = weight(_text_:k in 3042) [ClassicSimilarity], result of:
              0.031998128 = score(doc=3042,freq=2.0), product of:
                0.16225883 = queryWeight, product of:
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.04545348 = queryNorm
                0.19720423 = fieldWeight in 3042, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.569778 = idf(docFreq=3384, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3042)
            0.0029681858 = weight(_text_:s in 3042) [ClassicSimilarity], result of:
              0.0029681858 = score(doc=3042,freq=2.0), product of:
                0.049418733 = queryWeight, product of:
                  1.0872376 = idf(docFreq=40523, maxDocs=44218)
                  0.04545348 = queryNorm
                0.060061958 = fieldWeight in 3042, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.0872376 = idf(docFreq=40523, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3042)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    Based on State of the Union addresses from 1790 to 2014 (225 speeches delivered by 42 presidents), this paper describes and evaluates different text representation strategies. To determine the most important words of a given text, the term frequencies (tf) or the tf-idf weighting scheme can be applied. Recently, latent Dirichlet allocation (LDA) has been proposed to define the topics included in a corpus. As another strategy, this study proposes to apply a vocabulary specificity measure (Z-score) to determine the most significantly overused word-types or short sequences of them. Our experiments show that the simple term frequency measure is not able to discriminate between specific terms associated with a document or a set of texts. Using the tf-idf or LDA approach, the selection requires some arbitrary decisions. Based on the term-specific measure (Z-score), the term selection has a clear theoretical basis. Moreover, the most significant sentences for each presidency can be determined. As another facet, we can visualize the dynamic evolution of usage of some terms associated with their specificity measures. Finally, this technique can be employed to define the most important lexical leaders introducing terms overused by the k following presidencies.
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.8, S.1858-1870
  2. Savoy, J.: A stemming procedure and stopword list for general French corpora (1999) 0.00
    0.0011872743 = product of:
      0.0023745487 = sum of:
        0.0023745487 = product of:
          0.007123646 = sum of:
            0.007123646 = weight(_text_:s in 4314) [ClassicSimilarity], result of:
              0.007123646 = score(doc=4314,freq=2.0), product of:
                0.049418733 = queryWeight, product of:
                  1.0872376 = idf(docFreq=40523, maxDocs=44218)
                  0.04545348 = queryNorm
                0.14414869 = fieldWeight in 4314, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.0872376 = idf(docFreq=40523, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4314)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Source
    Journal of the American Society for Information Science. 50(1999) no.10, S.944-954
  3. Savoy, J.: Searching strategies for the Hungarian language (2008) 0.00
    5.936372E-4 = product of:
      0.0011872743 = sum of:
        0.0011872743 = product of:
          0.003561823 = sum of:
            0.003561823 = weight(_text_:s in 2037) [ClassicSimilarity], result of:
              0.003561823 = score(doc=2037,freq=2.0), product of:
                0.049418733 = queryWeight, product of:
                  1.0872376 = idf(docFreq=40523, maxDocs=44218)
                  0.04545348 = queryNorm
                0.072074346 = fieldWeight in 2037, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.0872376 = idf(docFreq=40523, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2037)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Source
    Information processing and management. 44(2008) no.1, S.310-324
  4. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.00
    5.936372E-4 = product of:
      0.0011872743 = sum of:
        0.0011872743 = product of:
          0.003561823 = sum of:
            0.003561823 = weight(_text_:s in 2950) [ClassicSimilarity], result of:
              0.003561823 = score(doc=2950,freq=2.0), product of:
                0.049418733 = queryWeight, product of:
                  1.0872376 = idf(docFreq=40523, maxDocs=44218)
                  0.04545348 = queryNorm
                0.072074346 = fieldWeight in 2950, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.0872376 = idf(docFreq=40523, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2950)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.8, S.1616-1624
  5. Dolamic, L.; Savoy, J.: Retrieval effectiveness of machine translated queries (2010) 0.00
    5.936372E-4 = product of:
      0.0011872743 = sum of:
        0.0011872743 = product of:
          0.003561823 = sum of:
            0.003561823 = weight(_text_:s in 4102) [ClassicSimilarity], result of:
              0.003561823 = score(doc=4102,freq=2.0), product of:
                0.049418733 = queryWeight, product of:
                  1.0872376 = idf(docFreq=40523, maxDocs=44218)
                  0.04545348 = queryNorm
                0.072074346 = fieldWeight in 4102, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.0872376 = idf(docFreq=40523, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4102)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.11, S.2266-2273
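
    The score breakdowns above are Lucene ClassicSimilarity explain() trees. As a reading aid, the following Python sketch reassembles the score of the first result (doc 3042) from the quantities they report. It assumes the classic TF-IDF definitions used by that similarity, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the idf values shown; queryNorm, fieldNorm and the two coordination factors are copied from the output rather than recomputed, and all function names are illustrative only.

      from math import sqrt, log

      def classic_tf(freq):
          # ClassicSimilarity: tf = sqrt(term frequency within the field)
          return sqrt(freq)

      def classic_idf(doc_freq, max_docs):
          # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
          return 1.0 + log(max_docs / (doc_freq + 1))

      # Constants copied from the explain output of result 1 (doc 3042).
      QUERY_NORM = 0.04545348   # query-level normalisation, taken as given
      MAX_DOCS = 44218

      def term_weight(freq, doc_freq, field_norm):
          # weight = queryWeight * fieldWeight, as in the breakdowns above
          idf = classic_idf(doc_freq, MAX_DOCS)
          query_weight = idf * QUERY_NORM                      # idf * queryNorm
          field_weight = classic_tf(freq) * idf * field_norm   # tf * idf * fieldNorm
          return query_weight * field_weight

      w_k = term_weight(freq=2.0, doc_freq=3384, field_norm=0.0390625)    # weight(_text_:k)
      w_s = term_weight(freq=2.0, doc_freq=40523, field_norm=0.0390625)   # weight(_text_:s)

      # coord(2/3) and coord(1/2) are the coordination factors shown in the tree.
      score = (w_k + w_s) * (2.0 / 3.0) * 0.5
      print(w_k, w_s, score)   # approx. 0.031998, 0.002968, 0.011655

    The same helper covers the remaining results: there only the weight(_text_:s) clause matches, so the inner sum collapses to a single term and the coordination factor drops to coord(1/3); for example, term_weight(2.0, 40523, 0.09375) * (1.0 / 3.0) * 0.5 gives roughly 0.0011873, the score of result 2.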
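
    The vocabulary specificity measure (Z-score) mentioned in the first abstract can also be illustrated briefly. The sketch below assumes the common formulation of such a score, the observed count of a word in a text part minus its expected count under the corpus-wide word probability, divided by the binomial standard deviation; Savoy's exact definition may differ in detail, and the tokenisation and toy texts here are purely illustrative.

      from math import sqrt
      from collections import Counter

      def z_scores(part_tokens, corpus_tokens):
          # Z-score of each word-type in a text part relative to the whole corpus:
          # (observed - expected) / sqrt(n * p * (1 - p)), with p the word's overall
          # relative frequency and n the number of tokens in the part. Large positive
          # values flag significantly overused word-types.
          corpus_counts = Counter(corpus_tokens)
          part_counts = Counter(part_tokens)
          n = len(part_tokens)
          total = len(corpus_tokens)
          scores = {}
          for word, observed in part_counts.items():
              p = corpus_counts[word] / total
              expected = n * p
              variance = n * p * (1.0 - p)
              if variance > 0:
                  scores[word] = (observed - expected) / sqrt(variance)
          return scores

      # Toy usage: rank the most overused terms of one "speech" against a tiny corpus.
      corpus = ("the union is strong the people are free "
                "the union endures the union grows").split()
      speech = "the union the union is strong".split()
      for word, z in sorted(z_scores(speech, corpus).items(), key=lambda kv: -kv[1])[:3]:
          print(word, round(z, 2))

    Applied per presidency across all 225 addresses, the same ranking yields the "most significantly overused word-types" the abstract refers to.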