Document (#21504)

Author
Frakes, W.B.
Title
Stemming algorithms
Source
Information retrieval: data structures and algorithms. Ed.: W.B. Frakes and R. Baeza-Yates
Imprint
Englewood Cliffs, NJ : Prentice Hall
Year
1992
Pages
pp. 131-160
Abstract
Describes stemming algorithms: programs that relate morphologically similar indexing and search terms. Stemming is used to improve retrieval effectiveness and to reduce the size of indexing files. Several approaches to stemming are described: table lookup, affix removal, successor variety, and n-gram. Empirical studies of stemming are summarized. The Porter stemmer is described in detail, and a full implementation in C is presented.
Theme
Computational linguistics (Computerlinguistik)
Retrieval algorithms (Retrievalalgorithmen)

Similar documents (content)

  1. Ahmad, F.; Yusoff, M.; Sembok, T.M.T.: Experiments with a stemming algorithm for Malay words (1996) 0.27
    0.26924822 = sum of:
      0.26924822 = product of:
        1.1218677 = sum of:
          0.030115064 = weight(abstract_txt:improve in 6504) [ClassicSimilarity], result of:
            0.030115064 = score(doc=6504,freq=1.0), product of:
              0.06476462 = queryWeight, product of:
                4.9599204 = idf(docFreq=842, maxDocs=44218)
                0.013057592 = queryNorm
              0.46499252 = fieldWeight in 6504, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9599204 = idf(docFreq=842, maxDocs=44218)
                0.09375 = fieldNorm(doc=6504)
          0.03270814 = weight(abstract_txt:effectiveness in 6504) [ClassicSimilarity], result of:
            0.03270814 = score(doc=6504,freq=1.0), product of:
              0.06843094 = queryWeight, product of:
                1.0279154 = boost
                5.098378 = idf(docFreq=733, maxDocs=44218)
                0.013057592 = queryNorm
              0.47797295 = fieldWeight in 6504, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.098378 = idf(docFreq=733, maxDocs=44218)
                0.09375 = fieldNorm(doc=6504)
          0.061411917 = weight(abstract_txt:reduce in 6504) [ClassicSimilarity], result of:
            0.061411917 = score(doc=6504,freq=1.0), product of:
              0.10414787 = queryWeight, product of:
                1.2681081 = boost
                6.2897153 = idf(docFreq=222, maxDocs=44218)
                0.013057592 = queryNorm
              0.5896608 = fieldWeight in 6504, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2897153 = idf(docFreq=222, maxDocs=44218)
                0.09375 = fieldNorm(doc=6504)
          0.07377478 = weight(abstract_txt:relate in 6504) [ClassicSimilarity], result of:
            0.07377478 = score(doc=6504,freq=1.0), product of:
              0.11769388 = queryWeight, product of:
                1.3480563 = boost
                6.686252 = idf(docFreq=149, maxDocs=44218)
                0.013057592 = queryNorm
              0.6268361 = fieldWeight in 6504, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.686252 = idf(docFreq=149, maxDocs=44218)
                0.09375 = fieldNorm(doc=6504)
          0.040619437 = weight(abstract_txt:indexing in 6504) [ClassicSimilarity], result of:
            0.040619437 = score(doc=6504,freq=1.0), product of:
              0.09961266 = queryWeight, product of:
                1.7538941 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.013057592 = queryNorm
              0.40777382 = fieldWeight in 6504, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.09375 = fieldNorm(doc=6504)
          0.88323826 = weight(abstract_txt:stemming in 6504) [ClassicSimilarity], result of:
            0.88323826 = score(doc=6504,freq=3.0), product of:
              0.73026997 = queryWeight, product of:
                7.5085797 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.013057592 = queryNorm
              1.2094681 = fieldWeight in 6504, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.09375 = fieldNorm(doc=6504)
        0.24 = coord(6/25)
    
  2. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.24
    0.23596056 = sum of:
      0.23596056 = product of:
        1.1798028 = sum of:
          0.021805424 = weight(abstract_txt:effectiveness in 3301) [ClassicSimilarity], result of:
            0.021805424 = score(doc=3301,freq=1.0), product of:
              0.06843094 = queryWeight, product of:
                1.0279154 = boost
                5.098378 = idf(docFreq=733, maxDocs=44218)
                0.013057592 = queryNorm
              0.31864864 = fieldWeight in 3301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.098378 = idf(docFreq=733, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.116288565 = weight(abstract_txt:gram in 3301) [ClassicSimilarity], result of:
            0.116288565 = score(doc=3301,freq=2.0), product of:
              0.16578966 = queryWeight, product of:
                1.5999626 = boost
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.013057592 = queryNorm
              0.7014223 = fieldWeight in 3301, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.027079623 = weight(abstract_txt:indexing in 3301) [ClassicSimilarity], result of:
            0.027079623 = score(doc=3301,freq=1.0), product of:
              0.09961266 = queryWeight, product of:
                1.7538941 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.013057592 = queryNorm
              0.27184922 = fieldWeight in 3301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.18190409 = weight(abstract_txt:stemmer in 3301) [ClassicSimilarity], result of:
            0.18190409 = score(doc=3301,freq=2.0), product of:
              0.2234058 = queryWeight, product of:
                1.857284 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.013057592 = queryNorm
              0.81423175 = fieldWeight in 3301, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.83272505 = weight(abstract_txt:stemming in 3301) [ClassicSimilarity], result of:
            0.83272505 = score(doc=3301,freq=6.0), product of:
              0.73026997 = queryWeight, product of:
                7.5085797 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.013057592 = queryNorm
              1.1402975 = fieldWeight in 3301, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
        0.2 = coord(5/25)
    
  3. Paice, C.D.: Method for evaluation of stemming algorithms based on error counting (1996) 0.23
    0.22630814 = sum of:
      0.22630814 = product of:
        1.4144258 = sum of:
          0.038159493 = weight(abstract_txt:effectiveness in 5799) [ClassicSimilarity], result of:
            0.038159493 = score(doc=5799,freq=1.0), product of:
              0.06843094 = queryWeight, product of:
                1.0279154 = boost
                5.098378 = idf(docFreq=733, maxDocs=44218)
                0.013057592 = queryNorm
              0.5576351 = fieldWeight in 5799, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.098378 = idf(docFreq=733, maxDocs=44218)
                0.109375 = fieldNorm(doc=5799)
          0.2387262 = weight(abstract_txt:porter in 5799) [ClassicSimilarity], result of:
            0.2387262 = score(doc=5799,freq=1.0), product of:
              0.23233652 = queryWeight, product of:
                1.894043 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.013057592 = queryNorm
              1.0275018 = fieldWeight in 5799, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.109375 = fieldNorm(doc=5799)
          0.107095554 = weight(abstract_txt:algorithms in 5799) [ClassicSimilarity], result of:
            0.107095554 = score(doc=5799,freq=1.0), product of:
              0.1715438 = queryWeight, product of:
                2.30162 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.013057592 = queryNorm
              0.6243044 = fieldWeight in 5799, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.109375 = fieldNorm(doc=5799)
          1.0304446 = weight(abstract_txt:stemming in 5799) [ClassicSimilarity], result of:
            1.0304446 = score(doc=5799,freq=3.0), product of:
              0.73026997 = queryWeight, product of:
                7.5085797 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.013057592 = queryNorm
              1.4110461 = fieldWeight in 5799, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.109375 = fieldNorm(doc=5799)
        0.16 = coord(4/25)
    
  4. Kraaij, W.; Pohlmann, R.: Evaluation of a Dutch stemming algorithm (1995) 0.21
    0.2078899 = sum of:
      0.2078899 = product of:
        1.2993119 = sum of:
          0.22738013 = weight(abstract_txt:stemmer in 5798) [ClassicSimilarity], result of:
            0.22738013 = score(doc=5798,freq=2.0), product of:
              0.2234058 = queryWeight, product of:
                1.857284 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.013057592 = queryNorm
              1.0177897 = fieldWeight in 5798, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.078125 = fieldNorm(doc=5798)
          0.16538118 = weight(abstract_txt:morphologically in 5798) [ClassicSimilarity], result of:
            0.16538118 = score(doc=5798,freq=1.0), product of:
              0.22764607 = queryWeight, product of:
                1.8748269 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.013057592 = queryNorm
              0.72648376 = fieldWeight in 5798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.078125 = fieldNorm(doc=5798)
          0.17051871 = weight(abstract_txt:porter in 5798) [ClassicSimilarity], result of:
            0.17051871 = score(doc=5798,freq=1.0), product of:
              0.23233652 = queryWeight, product of:
                1.894043 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.013057592 = queryNorm
              0.7339299 = fieldWeight in 5798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.078125 = fieldNorm(doc=5798)
          0.7360319 = weight(abstract_txt:stemming in 5798) [ClassicSimilarity], result of:
            0.7360319 = score(doc=5798,freq=3.0), product of:
              0.73026997 = queryWeight, product of:
                7.5085797 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.013057592 = queryNorm
              1.0078901 = fieldWeight in 5798, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.078125 = fieldNorm(doc=5798)
        0.16 = coord(4/25)
    
  5. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.21
    0.20505302 = sum of:
      0.20505302 = product of:
        1.2815814 = sum of:
          0.07623101 = weight(abstract_txt:files in 2585) [ClassicSimilarity], result of:
            0.07623101 = score(doc=2585,freq=2.0), product of:
              0.0861513 = queryWeight, product of:
                1.1533524 = boost
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.013057592 = queryNorm
              0.8848503 = fieldWeight in 2585, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
          0.50332737 = weight(abstract_txt:stemmer in 2585) [ClassicSimilarity], result of:
            0.50332737 = score(doc=2585,freq=5.0), product of:
              0.2234058 = queryWeight, product of:
                1.857284 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.013057592 = queryNorm
              2.2529736 = fieldWeight in 2585, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
          0.107095554 = weight(abstract_txt:algorithms in 2585) [ClassicSimilarity], result of:
            0.107095554 = score(doc=2585,freq=1.0), product of:
              0.1715438 = queryWeight, product of:
                2.30162 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.013057592 = queryNorm
              0.6243044 = fieldWeight in 2585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
          0.5949275 = weight(abstract_txt:stemming in 2585) [ClassicSimilarity], result of:
            0.5949275 = score(doc=2585,freq=1.0), product of:
              0.73026997 = queryWeight, product of:
                7.5085797 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.013057592 = queryNorm
              0.8146679 = fieldWeight in 2585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
        0.16 = coord(4/25)
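
The score breakdowns above all follow the same arithmetic, so a small sketch may help in reading them. It recomputes the score of the first entry (doc 6504) from the values printed in its tree. The formulas (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), coord = matched terms / query terms) are those of Lucene's ClassicSimilarity. The function and constant names are hypothetical and chosen only for illustration, and the boost of 1.0 for "improve" is an assumption, since the tree prints no boost line for that term.

```python
import math

# Values copied from the explain tree for doc 6504.
MAX_DOCS = 44218          # maxDocs
QUERY_NORM = 0.013057592  # queryNorm
FIELD_NORM = 0.09375      # fieldNorm(doc=6504)
QUERY_TERMS = 25          # denominator of coord(6/25)

# (term, boost, docFreq, termFreq); boost 1.0 for "improve" is assumed,
# because no boost line is printed for it.
MATCHES = [
    ("improve",       1.0,       842,  1),
    ("effectiveness", 1.0279154, 733,  1),
    ("reduce",        1.2681081, 222,  1),
    ("relate",        1.3480563, 149,  1),
    ("indexing",      1.7538941, 1551, 1),
    ("stemming",      7.5085797, 69,   3),
]


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as ClassicSimilarity defines it."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq):
    """queryWeight * fieldWeight for one matching term."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM
    return query_weight * field_weight


def document_score(matches, query_terms=QUERY_TERMS):
    """Sum of term scores, scaled by the coordination factor."""
    total = sum(term_score(boost, df, freq) for _, boost, df, freq in matches)
    coord = len(matches) / query_terms
    return coord * total


if __name__ == "__main__":
    for term, boost, df, freq in MATCHES:
        print(f"{term:14s} {term_score(boost, df, freq):.6f}")
    print(f"total          {document_score(MATCHES):.6f}")
```

Run as a script, this should reproduce the per-term weights and the total of about 0.2692 listed for doc 6504, up to small differences between Python's double precision and Lucene's single-precision floats. The other four entries can be checked the same way by substituting their boost, docFreq, termFreq and fieldNorm values.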