Document (#21505)

Author
Frakes, W.B.
Title
Stemming algorithms
Source
Information retrieval: data structures and algorithms. Ed.: W.B. Frakes and R. Baeza-Yates
Imprint
Englewood Cliffs, NJ : Prentice Hall
Year
1992
Pages
pp. 131-160
Abstract
Describes stemming algorithms: programs that relate morphologically similar indexing and search terms. Stemming is used to improve retrieval effectiveness and to reduce the size of indexing files. Several approaches to stemming are described: table lookup, affix removal, successor variety, and n-gram. Empirical studies of stemming are summarized. The Porter stemmer is described in detail, and a full implementation in C is presented.
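Below is a minimal Python sketch of two of the approaches named in the abstract. It is not the C implementation from the chapter.

- `porter_step1a` covers only the plural-suffix rules of Porter's step 1a, as an illustration of affix removal.
- `dice_similarity` is one common n-gram variant, assumed here for illustration. It compares two words by their sets of unique digrams using Dice's coefficient.

Both function names are hypothetical.

```python
def porter_step1a(word: str) -> str:
    """Porter step 1a only: strip plural suffixes, testing the longest first."""
    if word.endswith("sses"):
        return word[:-2]   # SSES -> SS: caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # IES -> I: ponies -> poni
    if word.endswith("ss"):
        return word        # SS -> SS: caress -> caress
    if word.endswith("s"):
        return word[:-1]   # S -> (empty): cats -> cat
    return word


def digrams(word: str) -> set:
    """Set of unique adjacent letter pairs in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice_similarity(a: str, b: str) -> float:
    """Dice's coefficient 2C / (A + B) over unique digrams (assumed n-gram variant)."""
    da, db = digrams(a), digrams(b)
    if not da and not db:
        return 0.0
    return 2 * len(da & db) / (len(da) + len(db))


if __name__ == "__main__":
    for w in ["caresses", "ponies", "caress", "cats"]:
        print(w, "->", porter_step1a(w))
    # statistics: 7 unique digrams, statistical: 8, shared: 6 -> 12/15
    print(dice_similarity("statistics", "statistical"))  # 0.8
```

With the n-gram approach, terms whose pairwise similarity exceeds some cutoff would then be grouped together. The cutoff is a tuning choice and is not specified here.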
Theme
Computational linguistics
Retrieval algorithms

Similar documents (content)

  1. Ahmad, F.; Yusoff, M.; Sembok, T.M.T.: Experiments with a stemming algorithm for Malay words (1996) 0.27
    0.26996067 = sum of:
      0.26996067 = product of:
        1.1248362 = sum of:
          0.03102239 = weight(abstract_txt:improve in 6573) [ClassicSimilarity], result of:
            0.03102239 = score(doc=6573,freq=1.0), product of:
              0.0660464 = queryWeight, product of:
                1.0107551 = boost
                5.010197 = idf(docFreq=766, maxDocs=42306)
                0.013042127 = queryNorm
              0.469706 = fieldWeight in 6573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.010197 = idf(docFreq=766, maxDocs=42306)
                0.09375 = fieldNorm(doc=6573)
          0.033113852 = weight(abstract_txt:effectiveness in 6573) [ClassicSimilarity], result of:
            0.033113852 = score(doc=6573,freq=1.0), product of:
              0.068982475 = queryWeight, product of:
                1.0329772 = boost
                5.12035 = idf(docFreq=686, maxDocs=42306)
                0.013042127 = queryNorm
              0.4800328 = fieldWeight in 6573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.12035 = idf(docFreq=686, maxDocs=42306)
                0.09375 = fieldNorm(doc=6573)
          0.06255397 = weight(abstract_txt:reduce in 6573) [ClassicSimilarity], result of:
            0.06255397 = score(doc=6573,freq=1.0), product of:
              0.105414964 = queryWeight, product of:
                1.2769458 = boost
                6.3296742 = idf(docFreq=204, maxDocs=42306)
                0.013042127 = queryNorm
              0.593407 = fieldWeight in 6573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3296742 = idf(docFreq=204, maxDocs=42306)
                0.09375 = fieldNorm(doc=6573)
          0.07527993 = weight(abstract_txt:relate in 6573) [ClassicSimilarity], result of:
            0.07527993 = score(doc=6573,freq=1.0), product of:
              0.1192665 = queryWeight, product of:
                1.3582528 = boost
                6.732703 = idf(docFreq=136, maxDocs=42306)
                0.013042127 = queryNorm
              0.6311909 = fieldWeight in 6573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.732703 = idf(docFreq=136, maxDocs=42306)
                0.09375 = fieldNorm(doc=6573)
          0.040276244 = weight(abstract_txt:indexing in 6573) [ClassicSimilarity], result of:
            0.040276244 = score(doc=6573,freq=1.0), product of:
              0.099031866 = queryWeight, product of:
                1.7503457 = boost
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.013042127 = queryNorm
              0.40669984 = fieldWeight in 6573, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.09375 = fieldNorm(doc=6573)
          0.8825898 = weight(abstract_txt:stemming in 6573) [ClassicSimilarity], result of:
            0.8825898 = score(doc=6573,freq=3.0), product of:
              0.7297731 = queryWeight, product of:
                7.5127735 = boost
                7.4479914 = idf(docFreq=66, maxDocs=42306)
                0.013042127 = queryNorm
              1.209403 = fieldWeight in 6573, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4479914 = idf(docFreq=66, maxDocs=42306)
                0.09375 = fieldNorm(doc=6573)
        0.24 = coord(6/25)
    
  2. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.24
    0.23576427 = sum of:
      0.23576427 = product of:
        1.1788213 = sum of:
          0.0220759 = weight(abstract_txt:effectiveness in 302) [ClassicSimilarity], result of:
            0.0220759 = score(doc=302,freq=1.0), product of:
              0.068982475 = queryWeight, product of:
                1.0329772 = boost
                5.12035 = idf(docFreq=686, maxDocs=42306)
                0.013042127 = queryNorm
              0.32002187 = fieldWeight in 302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.12035 = idf(docFreq=686, maxDocs=42306)
                0.0625 = fieldNorm(doc=302)
          0.11858555 = weight(abstract_txt:gram in 302) [ClassicSimilarity], result of:
            0.11858555 = score(doc=302,freq=2.0), product of:
              0.16793364 = queryWeight, product of:
                1.6117222 = boost
                7.9891224 = idf(docFreq=38, maxDocs=42306)
                0.013042127 = queryNorm
              0.7061453 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.9891224 = idf(docFreq=38, maxDocs=42306)
                0.0625 = fieldNorm(doc=302)
          0.026850829 = weight(abstract_txt:indexing in 302) [ClassicSimilarity], result of:
            0.026850829 = score(doc=302,freq=1.0), product of:
              0.099031866 = queryWeight, product of:
                1.7503457 = boost
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.013042127 = queryNorm
              0.2711332 = fieldWeight in 302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.0625 = fieldNorm(doc=302)
          0.17919537 = weight(abstract_txt:stemmer in 302) [ClassicSimilarity], result of:
            0.17919537 = score(doc=302,freq=2.0), product of:
              0.2211402 = queryWeight, product of:
                1.8495038 = boost
                9.167778 = idf(docFreq=11, maxDocs=42306)
                0.013042127 = queryNorm
              0.8103247 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.167778 = idf(docFreq=11, maxDocs=42306)
                0.0625 = fieldNorm(doc=302)
          0.8321137 = weight(abstract_txt:stemming in 302) [ClassicSimilarity], result of:
            0.8321137 = score(doc=302,freq=6.0), product of:
              0.7297731 = queryWeight, product of:
                7.5127735 = boost
                7.4479914 = idf(docFreq=66, maxDocs=42306)
                0.013042127 = queryNorm
              1.1402361 = fieldWeight in 302, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.4479914 = idf(docFreq=66, maxDocs=42306)
                0.0625 = fieldNorm(doc=302)
        0.2 = coord(5/25)
    
  3. Paice, C.D.: Method for evaluation of stemming algorithms based on error counting (1996) 0.23
    0.22615205 = sum of:
      0.22615205 = product of:
        1.4134504 = sum of:
          0.038632825 = weight(abstract_txt:effectiveness in 5868) [ClassicSimilarity], result of:
            0.038632825 = score(doc=5868,freq=1.0), product of:
              0.068982475 = queryWeight, product of:
                1.0329772 = boost
                5.12035 = idf(docFreq=686, maxDocs=42306)
                0.013042127 = queryNorm
              0.56003827 = fieldWeight in 5868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.12035 = idf(docFreq=686, maxDocs=42306)
                0.109375 = fieldNorm(doc=5868)
          0.23523736 = weight(abstract_txt:porter in 5868) [ClassicSimilarity], result of:
            0.23523736 = score(doc=5868,freq=1.0), product of:
              0.23002338 = queryWeight, product of:
                1.8862852 = boost
                9.3501 = idf(docFreq=9, maxDocs=42306)
                0.013042127 = queryNorm
              1.0226672 = fieldWeight in 5868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.3501 = idf(docFreq=9, maxDocs=42306)
                0.109375 = fieldNorm(doc=5868)
          0.10989203 = weight(abstract_txt:algorithms in 5868) [ClassicSimilarity], result of:
            0.10989203 = score(doc=5868,freq=1.0), product of:
              0.17448387 = queryWeight, product of:
                2.3233466 = boost
                5.758281 = idf(docFreq=362, maxDocs=42306)
                0.013042127 = queryNorm
              0.629812 = fieldWeight in 5868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.758281 = idf(docFreq=362, maxDocs=42306)
                0.109375 = fieldNorm(doc=5868)
          1.0296881 = weight(abstract_txt:stemming in 5868) [ClassicSimilarity], result of:
            1.0296881 = score(doc=5868,freq=3.0), product of:
              0.7297731 = queryWeight, product of:
                7.5127735 = boost
                7.4479914 = idf(docFreq=66, maxDocs=42306)
                0.013042127 = queryNorm
              1.4109702 = fieldWeight in 5868, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4479914 = idf(docFreq=66, maxDocs=42306)
                0.109375 = fieldNorm(doc=5868)
        0.16 = coord(4/25)
    
  4. Kraaij, W.; Pohlmann, R.: Evaluation of a Dutch stemming algorithm (1995) 0.21
    0.20647247 = sum of:
      0.20647247 = product of:
        1.290453 = sum of:
          0.22399423 = weight(abstract_txt:stemmer in 5867) [ClassicSimilarity], result of:
            0.22399423 = score(doc=5867,freq=2.0), product of:
              0.2211402 = queryWeight, product of:
                1.8495038 = boost
                9.167778 = idf(docFreq=11, maxDocs=42306)
                0.013042127 = queryNorm
              1.012906 = fieldWeight in 5867, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.167778 = idf(docFreq=11, maxDocs=42306)
                0.078125 = fieldNorm(doc=5867)
          0.16294056 = weight(abstract_txt:morphologically in 5867) [ClassicSimilarity], result of:
            0.16294056 = score(doc=5867,freq=1.0), product of:
              0.22535782 = queryWeight, product of:
                1.8670573 = boost
                9.254789 = idf(docFreq=10, maxDocs=42306)
                0.013042127 = queryNorm
              0.72303045 = fieldWeight in 5867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.254789 = idf(docFreq=10, maxDocs=42306)
                0.078125 = fieldNorm(doc=5867)
          0.16802667 = weight(abstract_txt:porter in 5867) [ClassicSimilarity], result of:
            0.16802667 = score(doc=5867,freq=1.0), product of:
              0.23002338 = queryWeight, product of:
                1.8862852 = boost
                9.3501 = idf(docFreq=9, maxDocs=42306)
                0.013042127 = queryNorm
              0.7304765 = fieldWeight in 5867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.3501 = idf(docFreq=9, maxDocs=42306)
                0.078125 = fieldNorm(doc=5867)
          0.7354915 = weight(abstract_txt:stemming in 5867) [ClassicSimilarity], result of:
            0.7354915 = score(doc=5867,freq=3.0), product of:
              0.7297731 = queryWeight, product of:
                7.5127735 = boost
                7.4479914 = idf(docFreq=66, maxDocs=42306)
                0.013042127 = queryNorm
              1.0078359 = fieldWeight in 5867, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4479914 = idf(docFreq=66, maxDocs=42306)
                0.078125 = fieldNorm(doc=5867)
        0.16 = coord(4/25)
    
  5. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.20
    0.20412311 = sum of:
      0.20412311 = product of:
        1.2757695 = sum of:
          0.075554326 = weight(abstract_txt:files in 3586) [ClassicSimilarity], result of:
            0.075554326 = score(doc=3586,freq=2.0), product of:
              0.08562437 = queryWeight, product of:
                1.1508535 = boost
                5.704649 = idf(docFreq=382, maxDocs=42306)
                0.013042127 = queryNorm
              0.8823928 = fieldWeight in 3586, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.704649 = idf(docFreq=382, maxDocs=42306)
                0.109375 = fieldNorm(doc=3586)
          0.49583238 = weight(abstract_txt:stemmer in 3586) [ClassicSimilarity], result of:
            0.49583238 = score(doc=3586,freq=5.0), product of:
              0.2211402 = queryWeight, product of:
                1.8495038 = boost
                9.167778 = idf(docFreq=11, maxDocs=42306)
                0.013042127 = queryNorm
              2.242163 = fieldWeight in 3586, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.167778 = idf(docFreq=11, maxDocs=42306)
                0.109375 = fieldNorm(doc=3586)
          0.10989203 = weight(abstract_txt:algorithms in 3586) [ClassicSimilarity], result of:
            0.10989203 = score(doc=3586,freq=1.0), product of:
              0.17448387 = queryWeight, product of:
                2.3233466 = boost
                5.758281 = idf(docFreq=362, maxDocs=42306)
                0.013042127 = queryNorm
              0.629812 = fieldWeight in 3586, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.758281 = idf(docFreq=362, maxDocs=42306)
                0.109375 = fieldNorm(doc=3586)
          0.59449077 = weight(abstract_txt:stemming in 3586) [ClassicSimilarity], result of:
            0.59449077 = score(doc=3586,freq=1.0), product of:
              0.7297731 = queryWeight, product of:
                7.5127735 = boost
                7.4479914 = idf(docFreq=66, maxDocs=42306)
                0.013042127 = queryNorm
              0.8146241 = fieldWeight in 3586, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4479914 = idf(docFreq=66, maxDocs=42306)
                0.109375 = fieldNorm(doc=3586)
        0.16 = coord(4/25)
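Each score tree above is a Lucene ClassicSimilarity (TF-IDF) explanation. The Python sketch below recomputes scores from the numbers shown, using the relationships the trees print:

- tf = sqrt(freq)
- idf = 1 + ln(maxDocs / (docFreq + 1))
- queryWeight = boost * idf * queryNorm
- fieldWeight = tf * idf * fieldNorm
- document score = (sum of term weights) * coord(matched/total)

The helper names are hypothetical and are not Lucene API. The boost and queryNorm values are taken as given from the output.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Matches the printed values, e.g. idf(docFreq=766, maxDocs=42306) = 5.010197
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm            # "queryWeight, product of"
    field_weight = math.sqrt(freq) * idf * field_norm  # "fieldWeight, product of"
    return query_weight * field_weight


def document_score(term_weights, matched, total):
    # "sum of" the matching term weights, scaled by coord(matched/total)
    return sum(term_weights) * matched / total


# Term "stemming" in doc 6573 (item 1): freq=3, docFreq=66, fieldNorm=0.09375
w = term_weight(3, 66, 42306, boost=7.5127735,
                query_norm=0.013042127, field_norm=0.09375)
print(round(w, 4))  # 0.8826

# Item 1 total: six matching terms out of 25 query terms
item1 = [0.03102239, 0.033113852, 0.06255397, 0.07527993, 0.040276244, 0.8825898]
print(round(document_score(item1, 6, 25), 4))  # 0.27
```

Every term shares the same queryNorm. As a result, the large boost on `stemming` (7.5127735) is what lets that single term account for most of each document's total.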