Document (#21505)

Author
Frakes, W.B.
Title
Stemming algorithms
Source
Information retrieval: data structures and algorithms. Ed.: W.B. Frakes and R. Baeza-Yates
Imprint
Englewood Cliffs, NJ : Prentice Hall
Year
1992
Pages
pp. 131-160
Abstract
Describes stemming algorithms, i.e. programs that relate morphologically similar indexing and search terms. Stemming is used to improve retrieval effectiveness and to reduce the size of indexing files. Several approaches to stemming are described: table lookup, affix removal, successor variety, and n-gram. Empirical studies of stemming are summarized. The Porter stemmer is described in detail, and a full implementation in C is presented.
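The chapter's C implementation of the Porter stemmer is not reproduced here. As a hedged illustration of how Porter's affix-removal rules operate, the Python sketch below implements only Step 1a, the plural-suffix rules from Porter's original description, where the longest matching suffix wins. The later steps, which test the "measure" of the remaining stem before removing a suffix, are omitted, and the names `STEP_1A_RULES` and `porter_step_1a` are hypothetical.

```python
# Step 1a of Porter's algorithm, ordered so the longest suffix is tried first.
STEP_1A_RULES = (
    ("sses", "ss"),  # caresses -> caress
    ("ies", "i"),    # ponies   -> poni
    ("ss", "ss"),    # caress   -> caress (stops the bare "s" rule firing)
    ("s", ""),       # cats     -> cat
)


def porter_step_1a(word):
    """Apply the first matching Step 1a rule to a lowercase word."""
    for suffix, replacement in STEP_1A_RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word  # no rule matched; word is unchanged


for w in ("caresses", "ponies", "ties", "caress", "cats"):
    print(w, "->", porter_step_1a(w))
```

Keeping the rules as an ordered table mirrors how Porter's steps are usually written down; a fuller sketch would attach a condition on the stem to each rule.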
Theme
Computational linguistics
Retrieval algorithms
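Of the stemming approaches the abstract lists, n-gram conflation is the easiest to show in a few lines. A common formulation compares two words by the unique digrams (adjacent letter pairs) they share, using Dice's coefficient S = 2C / (A + B), where A and B are the numbers of unique digrams in each word and C is the number they have in common. The Python sketch below assumes that formulation; `unique_digrams` and `dice_similarity` are hypothetical names, and the subsequent step of clustering terms by these scores is not shown.

```python
def unique_digrams(word):
    """Set of distinct adjacent letter pairs (digrams) in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice_similarity(a, b):
    """Dice coefficient on shared unique digrams: S = 2C / (A + B)."""
    da, db = unique_digrams(a), unique_digrams(b)
    if not da and not db:
        return 0.0  # neither word has a digram (single letters or empty)
    return 2 * len(da & db) / (len(da) + len(db))


print(dice_similarity("statistics", "statistical"))  # 0.8
```

For "statistics" and "statistical" the sets contain 7 and 8 unique digrams with 6 in common (st, ta, at, ti, is, ic), so S = 12 / 15 = 0.8.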

Similar documents (content)

  1. Ahmad, F.; Yusoff, M.; Sembok, T.M.T.: Experiments with a stemming algorithm for Malay words (1996) 0.27
    0.26985943 = sum of:
      0.26985943 = product of:
        1.1244143 = sum of:
          0.030576173 = weight(abstract_txt:improve in 573) [ClassicSimilarity], result of:
            0.030576173 = score(doc=573,freq=1.0), product of:
              0.06538955 = queryWeight, product of:
                1.0052092 = boost
                4.987736 = idf(docFreq=801, maxDocs=43254)
                0.013042127 = queryNorm
              0.4676003 = fieldWeight in 573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.987736 = idf(docFreq=801, maxDocs=43254)
                0.09375 = fieldNorm(doc=573)
          0.032845005 = weight(abstract_txt:effectiveness in 573) [ClassicSimilarity], result of:
            0.032845005 = score(doc=573,freq=1.0), product of:
              0.06858553 = queryWeight, product of:
                1.0294814 = boost
                5.1081724 = idf(docFreq=710, maxDocs=43254)
                0.013042127 = queryNorm
              0.47889116 = fieldWeight in 573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1081724 = idf(docFreq=710, maxDocs=43254)
                0.09375 = fieldNorm(doc=573)
          0.06173963 = weight(abstract_txt:reduce in 573) [ClassicSimilarity], result of:
            0.06173963 = score(doc=573,freq=1.0), product of:
              0.10446295 = queryWeight, product of:
                1.2705256 = boost
                6.304207 = idf(docFreq=214, maxDocs=43254)
                0.013042127 = queryNorm
              0.5910194 = fieldWeight in 573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.304207 = idf(docFreq=214, maxDocs=43254)
                0.09375 = fieldNorm(doc=573)
          0.07451241 = weight(abstract_txt:relate in 573) [ClassicSimilarity], result of:
            0.07451241 = score(doc=573,freq=1.0), product of:
              0.118414626 = queryWeight, product of:
                1.3527107 = boost
                6.7120004 = idf(docFreq=142, maxDocs=43254)
                0.013042127 = queryNorm
              0.62925005 = fieldWeight in 573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7120004 = idf(docFreq=142, maxDocs=43254)
                0.09375 = fieldNorm(doc=573)
          0.040429704 = weight(abstract_txt:indexing in 573) [ClassicSimilarity], result of:
            0.040429704 = score(doc=573,freq=1.0), product of:
              0.09924988 = queryWeight, product of:
                1.7513876 = boost
                4.345095 = idf(docFreq=1524, maxDocs=43254)
                0.013042127 = queryNorm
              0.4073527 = fieldWeight in 573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.345095 = idf(docFreq=1524, maxDocs=43254)
                0.09375 = fieldNorm(doc=573)
          0.8843114 = weight(abstract_txt:stemming in 573) [ClassicSimilarity], result of:
            0.8843114 = score(doc=573,freq=3.0), product of:
              0.73047614 = queryWeight, product of:
                7.5126004 = boost
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.013042127 = queryNorm
              1.2105958 = fieldWeight in 573, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.09375 = fieldNorm(doc=573)
        0.24 = coord(6/25)
    
  2. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.24
    0.23647144 = sum of:
      0.23647144 = product of:
        1.1823572 = sum of:
          0.02189667 = weight(abstract_txt:effectiveness in 302) [ClassicSimilarity], result of:
            0.02189667 = score(doc=302,freq=1.0), product of:
              0.06858553 = queryWeight, product of:
                1.0294814 = boost
                5.1081724 = idf(docFreq=710, maxDocs=43254)
                0.013042127 = queryNorm
              0.31926078 = fieldWeight in 302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1081724 = idf(docFreq=710, maxDocs=43254)
                0.0625 = fieldNorm(doc=302)
          0.11945454 = weight(abstract_txt:gram in 302) [ClassicSimilarity], result of:
            0.11945454 = score(doc=302,freq=2.0), product of:
              0.1686963 = queryWeight, product of:
                1.6145632 = boost
                8.011283 = idf(docFreq=38, maxDocs=43254)
                0.013042127 = queryNorm
              0.7081041 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.011283 = idf(docFreq=38, maxDocs=43254)
                0.0625 = fieldNorm(doc=302)
          0.026953135 = weight(abstract_txt:indexing in 302) [ClassicSimilarity], result of:
            0.026953135 = score(doc=302,freq=1.0), product of:
              0.09924988 = queryWeight, product of:
                1.7513876 = boost
                4.345095 = idf(docFreq=1524, maxDocs=43254)
                0.013042127 = queryNorm
              0.27156845 = fieldWeight in 302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.345095 = idf(docFreq=1524, maxDocs=43254)
                0.0625 = fieldNorm(doc=302)
          0.18031599 = weight(abstract_txt:stemmer in 302) [ClassicSimilarity], result of:
            0.18031599 = score(doc=302,freq=2.0), product of:
              0.22198653 = queryWeight, product of:
                1.8521049 = boost
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.013042127 = queryNorm
              0.81228346 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.0625 = fieldNorm(doc=302)
          0.8337369 = weight(abstract_txt:stemming in 302) [ClassicSimilarity], result of:
            0.8337369 = score(doc=302,freq=6.0), product of:
              0.73047614 = queryWeight, product of:
                7.5126004 = boost
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.013042127 = queryNorm
              1.1413609 = fieldWeight in 302, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.0625 = fieldNorm(doc=302)
        0.2 = coord(5/25)
    
  3. Paice, C.D.: Method for evaluation of stemming algorithms based on error counting (1996) 0.23
    0.22654088 = sum of:
      0.22654088 = product of:
        1.4158806 = sum of:
          0.03831917 = weight(abstract_txt:effectiveness in 6868) [ClassicSimilarity], result of:
            0.03831917 = score(doc=6868,freq=1.0), product of:
              0.06858553 = queryWeight, product of:
                1.0294814 = boost
                5.1081724 = idf(docFreq=710, maxDocs=43254)
                0.013042127 = queryNorm
              0.55870634 = fieldWeight in 6868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1081724 = idf(docFreq=710, maxDocs=43254)
                0.109375 = fieldNorm(doc=6868)
          0.23667505 = weight(abstract_txt:porter in 6868) [ClassicSimilarity], result of:
            0.23667505 = score(doc=6868,freq=1.0), product of:
              0.230882 = queryWeight, product of:
                1.8888493 = boost
                9.37226 = idf(docFreq=9, maxDocs=43254)
                0.013042127 = queryNorm
              1.0250909 = fieldWeight in 6868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.37226 = idf(docFreq=9, maxDocs=43254)
                0.109375 = fieldNorm(doc=6868)
          0.10918964 = weight(abstract_txt:algorithms in 6868) [ClassicSimilarity], result of:
            0.10918964 = score(doc=6868,freq=1.0), product of:
              0.17368115 = queryWeight, product of:
                2.316827 = boost
                5.747919 = idf(docFreq=374, maxDocs=43254)
                0.013042127 = queryNorm
              0.6286787 = fieldWeight in 6868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.747919 = idf(docFreq=374, maxDocs=43254)
                0.109375 = fieldNorm(doc=6868)
          1.0316967 = weight(abstract_txt:stemming in 6868) [ClassicSimilarity], result of:
            1.0316967 = score(doc=6868,freq=3.0), product of:
              0.73047614 = queryWeight, product of:
                7.5126004 = boost
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.013042127 = queryNorm
              1.4123619 = fieldWeight in 6868, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.109375 = fieldNorm(doc=6868)
        0.16 = coord(4/25)
    
  4. Kraaij, W.; Pohlmann, R.: Evaluation of a Dutch stemming algorithm (1995) 0.21
    0.2072517 = sum of:
      0.2072517 = product of:
        1.2953231 = sum of:
          0.22539498 = weight(abstract_txt:stemmer in 6867) [ClassicSimilarity], result of:
            0.22539498 = score(doc=6867,freq=2.0), product of:
              0.22198653 = queryWeight, product of:
                1.8521049 = boost
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.013042127 = queryNorm
              1.0153543 = fieldWeight in 6867, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.078125 = fieldNorm(doc=6867)
          0.16394836 = weight(abstract_txt:morphologically in 6867) [ClassicSimilarity], result of:
            0.16394836 = score(doc=6867,freq=1.0), product of:
              0.22621001 = queryWeight, product of:
                1.8696408 = boost
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.013042127 = queryNorm
              0.7247617 = fieldWeight in 6867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.078125 = fieldNorm(doc=6867)
          0.16905361 = weight(abstract_txt:porter in 6867) [ClassicSimilarity], result of:
            0.16905361 = score(doc=6867,freq=1.0), product of:
              0.230882 = queryWeight, product of:
                1.8888493 = boost
                9.37226 = idf(docFreq=9, maxDocs=43254)
                0.013042127 = queryNorm
              0.73220783 = fieldWeight in 6867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.37226 = idf(docFreq=9, maxDocs=43254)
                0.078125 = fieldNorm(doc=6867)
          0.7369262 = weight(abstract_txt:stemming in 6867) [ClassicSimilarity], result of:
            0.7369262 = score(doc=6867,freq=3.0), product of:
              0.73047614 = queryWeight, product of:
                7.5126004 = boost
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.013042127 = queryNorm
              1.00883 = fieldWeight in 6867, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.078125 = fieldNorm(doc=6867)
        0.16 = coord(4/25)
    
  5. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.20
    0.20472224 = sum of:
      0.20472224 = product of:
        1.2795141 = sum of:
          0.07574108 = weight(abstract_txt:files in 4586) [ClassicSimilarity], result of:
            0.07574108 = score(doc=4586,freq=2.0), product of:
              0.085736565 = queryWeight, product of:
                1.1510265 = boost
                5.7112656 = idf(docFreq=388, maxDocs=43254)
                0.013042127 = queryNorm
              0.8834163 = fieldWeight in 4586, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7112656 = idf(docFreq=388, maxDocs=43254)
                0.109375 = fieldNorm(doc=4586)
          0.49893308 = weight(abstract_txt:stemmer in 4586) [ClassicSimilarity], result of:
            0.49893308 = score(doc=4586,freq=5.0), product of:
              0.22198653 = queryWeight, product of:
                1.8521049 = boost
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.013042127 = queryNorm
              2.2475827 = fieldWeight in 4586, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.109375 = fieldNorm(doc=4586)
          0.10918964 = weight(abstract_txt:algorithms in 4586) [ClassicSimilarity], result of:
            0.10918964 = score(doc=4586,freq=1.0), product of:
              0.17368115 = queryWeight, product of:
                2.316827 = boost
                5.747919 = idf(docFreq=374, maxDocs=43254)
                0.013042127 = queryNorm
              0.6286787 = fieldWeight in 4586, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.747919 = idf(docFreq=374, maxDocs=43254)
                0.109375 = fieldNorm(doc=4586)
          0.5956504 = weight(abstract_txt:stemming in 4586) [ClassicSimilarity], result of:
            0.5956504 = score(doc=4586,freq=1.0), product of:
              0.73047614 = queryWeight, product of:
                7.5126004 = boost
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.013042127 = queryNorm
              0.81542754 = fieldWeight in 4586, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.109375 = fieldNorm(doc=4586)
        0.16 = coord(4/25)
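Each entry above is a Lucene ClassicSimilarity explain tree. Read bottom-up, every matched term contributes queryWeight × fieldWeight, with queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm; the per-term sum is then multiplied by coord, the fraction of the 25 query terms that matched. The Python sketch below recomputes some of these numbers. It assumes the formulas as we understand them for the ClassicSimilarity versions that still print coord: tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)). The boost, queryNorm and fieldNorm values are copied from the output rather than derived, because the full query and the field lengths are not shown; all function names are hypothetical.

```python
import math

MAX_DOCS = 43254           # maxDocs shown on every idf line above
QUERY_NORM = 0.013042127   # queryNorm, copied from the output, not derived


def idf(doc_freq, max_docs=MAX_DOCS):
    """idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    """tf = sqrt(freq), e.g. tf(3.0) = 1.7320508..."""
    return math.sqrt(freq)


def term_score(boost, doc_freq, freq, field_norm, query_norm=QUERY_NORM):
    """One term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm      # queryWeight
    field_weight = tf(freq) * term_idf * field_norm   # fieldWeight
    return query_weight * field_weight


def document_score(term_weights, matched_terms, query_terms):
    """Sum of the per-term weights, scaled by coord(matched/total)."""
    return sum(term_weights) * (matched_terms / query_terms)


# "stemming" in doc 573 (item 1): boost, docFreq, freq, fieldNorm from the tree
print(round(term_score(7.5126004, 67, 3.0, 0.09375), 4))  # 0.8843

# Item 1 (doc 573): the six printed term weights and coord(6/25)
weights_573 = [0.030576173, 0.032845005, 0.06173963,
               0.07451241, 0.040429704, 0.8843114]
print(round(document_score(weights_573, 6, 25), 4))  # 0.2699
```

The recomputed values agree with the printed ones to about six significant digits; the remaining differences come from Lucene computing in 32-bit floats while Python uses 64-bit floats.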