Document (#21505)

Author
Frakes, W.B.
Title
Stemming algorithms
Source
Information retrieval: data structures and algorithms. Ed.: W.B. Frakes and R. Baeza-Yates
Imprint
Englewood Cliffs, NJ : Prentice Hall
Year
1992
Pages
pp. 131-160
Abstract
Describes stemming algorithms: programs that relate morphologically similar indexing and search terms. Stemming is used to improve retrieval effectiveness and to reduce the size of indexing files. Several approaches to stemming are described: table lookup, affix removal, successor variety, and n-gram. Empirical studies of stemming are summarized. The Porter stemmer is described in detail, and a full implementation in C is presented.
Theme
Computational linguistics
Retrieval algorithms

Similar documents (content)

  1. Ahmad, F.; Yusoff, M.; Sembok, T.M.T.: Experiments with a stemming algorithm for Malay words (1996) 0.27
    0.269279 = sum of:
      0.269279 = product of:
        1.1219959 = sum of:
          0.030859957 = weight(abstract_txt:improve in 6573) [ClassicSimilarity], result of:
            0.030859957 = score(doc=6573,freq=1.0), product of:
              0.06582094 = queryWeight, product of:
                1.0090814 = boost
                5.0010357 = idf(docFreq=781, maxDocs=42740)
                0.013043013 = queryNorm
              0.4688471 = fieldWeight in 6573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0010357 = idf(docFreq=781, maxDocs=42740)
                0.09375 = fieldNorm(doc=6573)
          0.033067353 = weight(abstract_txt:effectiveness in 6573) [ClassicSimilarity], result of:
            0.033067353 = score(doc=6573,freq=1.0), product of:
              0.06892342 = queryWeight, product of:
                1.0325892 = boost
                5.117541 = idf(docFreq=695, maxDocs=42740)
                0.013043013 = queryNorm
              0.47976947 = fieldWeight in 6573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.117541 = idf(docFreq=695, maxDocs=42740)
                0.09375 = fieldNorm(doc=6573)
          0.061878566 = weight(abstract_txt:reduce in 6573) [ClassicSimilarity], result of:
            0.061878566 = score(doc=6573,freq=1.0), product of:
              0.10466321 = queryWeight, product of:
                1.2724513 = boost
                6.3063045 = idf(docFreq=211, maxDocs=42740)
                0.013043013 = queryNorm
              0.591216 = fieldWeight in 6573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3063045 = idf(docFreq=211, maxDocs=42740)
                0.09375 = fieldNorm(doc=6573)
          0.07467664 = weight(abstract_txt:relate in 6573) [ClassicSimilarity], result of:
            0.07467664 = score(doc=6573,freq=1.0), product of:
              0.11863798 = queryWeight, product of:
                1.3547403 = boost
                6.7141304 = idf(docFreq=140, maxDocs=42740)
                0.013043013 = queryNorm
              0.6294497 = fieldWeight in 6573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7141304 = idf(docFreq=140, maxDocs=42740)
                0.09375 = fieldNorm(doc=6573)
          0.040348627 = weight(abstract_txt:indexing in 6573) [ClassicSimilarity], result of:
            0.040348627 = score(doc=6573,freq=1.0), product of:
              0.09915844 = queryWeight, product of:
                1.7515559 = boost
                4.34038 = idf(docFreq=1513, maxDocs=42740)
                0.013043013 = queryNorm
              0.40691066 = fieldWeight in 6573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.34038 = idf(docFreq=1513, maxDocs=42740)
                0.09375 = fieldNorm(doc=6573)
          0.8811647 = weight(abstract_txt:stemming in 6573) [ClassicSimilarity], result of:
            0.8811647 = score(doc=6573,freq=3.0), product of:
              0.7290459 = queryWeight, product of:
                7.5094237 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.013043013 = queryNorm
              1.2086546 = fieldWeight in 6573, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.09375 = fieldNorm(doc=6573)
        0.24 = coord(6/25)
    
  2. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.24
    0.23572426 = sum of:
      0.23572426 = product of:
        1.1786213 = sum of:
          0.0220449 = weight(abstract_txt:effectiveness in 302) [ClassicSimilarity], result of:
            0.0220449 = score(doc=302,freq=1.0), product of:
              0.06892342 = queryWeight, product of:
                1.0325892 = boost
                5.117541 = idf(docFreq=695, maxDocs=42740)
                0.013043013 = queryNorm
              0.3198463 = fieldWeight in 302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.117541 = idf(docFreq=695, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.11906932 = weight(abstract_txt:gram in 302) [ClassicSimilarity], result of:
            0.11906932 = score(doc=302,freq=2.0), product of:
              0.16840358 = queryWeight, product of:
                1.6140605 = boost
                7.999329 = idf(docFreq=38, maxDocs=42740)
                0.013043013 = queryNorm
              0.70704746 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.999329 = idf(docFreq=38, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.026899084 = weight(abstract_txt:indexing in 302) [ClassicSimilarity], result of:
            0.026899084 = score(doc=302,freq=1.0), product of:
              0.09915844 = queryWeight, product of:
                1.7515559 = boost
                4.34038 = idf(docFreq=1513, maxDocs=42740)
                0.013043013 = queryNorm
              0.27127376 = fieldWeight in 302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.34038 = idf(docFreq=1513, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.17983785 = weight(abstract_txt:stemmer in 302) [ClassicSimilarity], result of:
            0.17983785 = score(doc=302,freq=2.0), product of:
              0.22168627 = queryWeight, product of:
                1.851883 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.013043013 = queryNorm
              0.81122684 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.8307702 = weight(abstract_txt:stemming in 302) [ClassicSimilarity], result of:
            0.8307702 = score(doc=302,freq=6.0), product of:
              0.7290459 = queryWeight, product of:
                7.5094237 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.013043013 = queryNorm
              1.1395307 = fieldWeight in 302, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
        0.2 = coord(5/25)
    
  3. Paice, C.D.: Method for evaluation of stemming algorithms based on error counting (1996) 0.23
    0.2260072 = sum of:
      0.2260072 = product of:
        1.412545 = sum of:
          0.038578577 = weight(abstract_txt:effectiveness in 5868) [ClassicSimilarity], result of:
            0.038578577 = score(doc=5868,freq=1.0), product of:
              0.06892342 = queryWeight, product of:
                1.0325892 = boost
                5.117541 = idf(docFreq=695, maxDocs=42740)
                0.013043013 = queryNorm
              0.559731 = fieldWeight in 5868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.117541 = idf(docFreq=695, maxDocs=42740)
                0.109375 = fieldNorm(doc=5868)
          0.23606542 = weight(abstract_txt:porter in 5868) [ClassicSimilarity], result of:
            0.23606542 = score(doc=5868,freq=1.0), product of:
              0.2305814 = queryWeight, product of:
                1.8886709 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.013043013 = queryNorm
              1.0237834 = fieldWeight in 5868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.109375 = fieldNorm(doc=5868)
          0.109875426 = weight(abstract_txt:algorithms in 5868) [ClassicSimilarity], result of:
            0.109875426 = score(doc=5868,freq=1.0), product of:
              0.1744803 = queryWeight, product of:
                2.3234448 = boost
                5.757529 = idf(docFreq=366, maxDocs=42740)
                0.013043013 = queryNorm
              0.6297297 = fieldWeight in 5868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.757529 = idf(docFreq=366, maxDocs=42740)
                0.109375 = fieldNorm(doc=5868)
          1.0280255 = weight(abstract_txt:stemming in 5868) [ClassicSimilarity], result of:
            1.0280255 = score(doc=5868,freq=3.0), product of:
              0.7290459 = queryWeight, product of:
                7.5094237 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.013043013 = queryNorm
              1.4100971 = fieldWeight in 5868, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.109375 = fieldNorm(doc=5868)
        0.16 = coord(4/25)
    
  4. Kraaij, W.; Pohlmann, R.: Evaluation of a Dutch stemming algorithm (1995) 0.21
    0.20659824 = sum of:
      0.20659824 = product of:
        1.291239 = sum of:
          0.22479732 = weight(abstract_txt:stemmer in 5867) [ClassicSimilarity], result of:
            0.22479732 = score(doc=5867,freq=2.0), product of:
              0.22168627 = queryWeight, product of:
                1.851883 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.013043013 = queryNorm
              1.0140336 = fieldWeight in 5867, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.078125 = fieldNorm(doc=5867)
          0.16351962 = weight(abstract_txt:morphologically in 5867) [ClassicSimilarity], result of:
            0.16351962 = score(doc=5867,freq=1.0), product of:
              0.22590956 = queryWeight, product of:
                1.8694397 = boost
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.013043013 = queryNorm
              0.7238278 = fieldWeight in 5867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.078125 = fieldNorm(doc=5867)
          0.16861816 = weight(abstract_txt:porter in 5867) [ClassicSimilarity], result of:
            0.16861816 = score(doc=5867,freq=1.0), product of:
              0.2305814 = queryWeight, product of:
                1.8886709 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.013043013 = queryNorm
              0.7312739 = fieldWeight in 5867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.078125 = fieldNorm(doc=5867)
          0.734304 = weight(abstract_txt:stemming in 5867) [ClassicSimilarity], result of:
            0.734304 = score(doc=5867,freq=3.0), product of:
              0.7290459 = queryWeight, product of:
                7.5094237 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.013043013 = queryNorm
              1.0072123 = fieldWeight in 5867, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.078125 = fieldNorm(doc=5867)
        0.16 = coord(4/25)
    
  5. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.20
    0.20428604 = sum of:
      0.20428604 = product of:
        1.2767878 = sum of:
          0.075771354 = weight(abstract_txt:files in 3586) [ClassicSimilarity], result of:
            0.075771354 = score(doc=3586,freq=2.0), product of:
              0.08579515 = queryWeight, product of:
                1.1520611 = boost
                5.709647 = idf(docFreq=384, maxDocs=42740)
                0.013043013 = queryNorm
              0.88316596 = fieldWeight in 3586, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.709647 = idf(docFreq=384, maxDocs=42740)
                0.109375 = fieldNorm(doc=3586)
          0.4976101 = weight(abstract_txt:stemmer in 3586) [ClassicSimilarity], result of:
            0.4976101 = score(doc=3586,freq=5.0), product of:
              0.22168627 = queryWeight, product of:
                1.851883 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.013043013 = queryNorm
              2.244659 = fieldWeight in 3586, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.109375 = fieldNorm(doc=3586)
          0.109875426 = weight(abstract_txt:algorithms in 3586) [ClassicSimilarity], result of:
            0.109875426 = score(doc=3586,freq=1.0), product of:
              0.1744803 = queryWeight, product of:
                2.3234448 = boost
                5.757529 = idf(docFreq=366, maxDocs=42740)
                0.013043013 = queryNorm
              0.6297297 = fieldWeight in 3586, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.757529 = idf(docFreq=366, maxDocs=42740)
                0.109375 = fieldNorm(doc=3586)
          0.5935309 = weight(abstract_txt:stemming in 3586) [ClassicSimilarity], result of:
            0.5935309 = score(doc=3586,freq=1.0), product of:
              0.7290459 = queryWeight, product of:
                7.5094237 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.013043013 = queryNorm
              0.81412 = fieldWeight in 3586, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.109375 = fieldNorm(doc=3586)
        0.16 = coord(4/25)
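
Note on the score breakdowns: the arithmetic in the trees above can be re-derived by hand. The following is a minimal sketch, not the catalog's own code, that recomputes the total for the first similar document (doc=6573) from the boost, docFreq, freq, fieldNorm, queryNorm and coord values printed in its tree. It assumes the usual ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the tf and idf values listed. The function names idf and term_score are illustrative only. Lucene computes in 32-bit floats, so the last digits may differ slightly.

```python
import math

# Constants copied from the explanation tree for doc=6573.
MAX_DOCS = 42740
QUERY_NORM = 0.013043013
FIELD_NORM = 0.09375      # fieldNorm(doc=6573)
COORD = 6 / 25            # 6 of the 25 query terms matched


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as in ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    i = idf(doc_freq)
    query_weight = boost * i * QUERY_NORM          # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * i * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# (term, boost, docFreq, freq) as printed in the tree for doc=6573.
terms = [
    ("improve",       1.0090814,  781, 1.0),
    ("effectiveness", 1.0325892,  695, 1.0),
    ("reduce",        1.2724513,  211, 1.0),
    ("relate",        1.3547403,  140, 1.0),
    ("indexing",      1.7515559, 1513, 1.0),
    ("stemming",      7.5094237,   67, 3.0),
]

# Final score: coord factor times the sum of the per-term scores.
total = COORD * sum(
    term_score(boost, doc_freq, freq, FIELD_NORM)
    for _, boost, doc_freq, freq in terms
)

for name, boost, doc_freq, freq in terms:
    print(f"{name:14s} {term_score(boost, doc_freq, freq, FIELD_NORM):.6f}")
print(f"total          {total:.6f}")
```

The total should agree with the 0.269279 shown for that entry to within rounding. The same recipe applies to the other four entries once their fieldNorm, term list and coord factor (5/25 or 4/25) are substituted.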