Document (#27587)

Author
Fox, B.
Fox, C.J.
Title
Efficient stemmer generation
Source
Information processing and management. 38(2002) no.4, pp.547-558
Year
2002
Abstract
This paper presents an algorithm for generating stemmers from text stemmer specification files. A small study shows that the generated stemmers are computationally efficient, often running faster than stemmers custom written to implement particular stemming algorithms. The stemmer specification files are easily written and modified by non-programmers, making it much easier to create a stemmer, or tune a stemmer's performance, than would be the case with a custom stemmer program. Stemmer generation is thus also human-resource efficient.
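
The idea of generating a stemmer from a text specification can be illustrated with a minimal sketch. It is not the authors' algorithm or file format: the `suffix -> replacement` rule syntax, the `generate_stemmer` function, the `min_stem` parameter, and the sample rules are all assumptions made for illustration. The sketch compiles the rules into a trie over reversed suffixes, so each word is stemmed in one right-to-left pass that applies the longest matching rule.

```python
from typing import Callable, Dict


def generate_stemmer(spec: str, min_stem: int = 2) -> Callable[[str], str]:
    """Compile a rule specification into a stemming function.

    Hypothetical spec format (an assumption, not the paper's format):
    one rule per line, 'suffix -> replacement'; blank lines and text
    after '#' are ignored.
    """
    # Trie keyed by suffix characters in reverse order.
    # The key None stores the replacement for a complete rule.
    root: Dict = {}
    for raw in spec.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        suffix, sep, replacement = line.partition("->")
        suffix = suffix.strip()
        if not sep or not suffix:
            raise ValueError(f"bad rule: {raw!r}")
        node = root
        for ch in reversed(suffix):
            node = node.setdefault(ch, {})
        node[None] = replacement.strip()

    def stem(word: str) -> str:
        node = root
        best = None  # (suffix length, replacement) of the longest match
        for depth, ch in enumerate(reversed(word), start=1):
            node = node.get(ch)
            if node is None:
                break
            # Accept a rule only if enough of the word remains as a stem.
            if None in node and len(word) - depth >= min_stem:
                best = (depth, node[None])
        if best is None:
            return word
        length, replacement = best
        return word[: len(word) - length] + replacement

    return stem


# Illustrative rules only; a non-programmer could edit this text
# without touching the code above.
SPEC = """
# plural and verb endings
sses -> ss
ies  -> i
ing  ->
ed   ->
s    ->
"""

stem = generate_stemmer(SPEC)
print([stem(w) for w in ["caresses", "ponies", "walking", "cats", "is"]])
```

Changing the stemmer's behaviour in this sketch means editing `SPEC` rather than the program, which is the kind of tuning the abstract describes.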
Theme
Computational linguistics

Similar documents (content)

  1. Kraaij, W.; Pohlmann, R.: Evaluation of a Dutch stemming algorithm (1995) 0.24
    0.24343684 = sum of:
      0.24343684 = product of:
        1.2171842 = sum of:
          0.020880995 = weight(abstract_txt:generated in 5867) [ClassicSimilarity], result of:
            0.020880995 = score(doc=5867,freq=1.0), product of:
              0.048241436 = queryWeight, product of:
                1.0353646 = boost
                5.5403976 = idf(docFreq=455, maxDocs=42740)
                0.008409806 = queryNorm
              0.43284357 = fieldWeight in 5867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5403976 = idf(docFreq=455, maxDocs=42740)
                0.078125 = fieldNorm(doc=5867)
          0.032497678 = weight(abstract_txt:algorithm in 5867) [ClassicSimilarity], result of:
            0.032497678 = score(doc=5867,freq=2.0), product of:
              0.05142145 = queryWeight, product of:
                1.068945 = boost
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.008409806 = queryNorm
              0.6319868 = fieldWeight in 5867, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.078125 = fieldNorm(doc=5867)
          0.08770009 = weight(abstract_txt:stemming in 5867) [ClassicSimilarity], result of:
            0.08770009 = score(doc=5867,freq=3.0), product of:
              0.087072104 = queryWeight, product of:
                1.390986 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.008409806 = queryNorm
              1.0072123 = fieldWeight in 5867, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.078125 = fieldNorm(doc=5867)
          0.27065924 = weight(abstract_txt:stemmers in 5867) [ClassicSimilarity], result of:
            0.27065924 = score(doc=5867,freq=1.0), product of:
              0.38392088 = queryWeight, product of:
                5.0590005 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.008409806 = queryNorm
              0.704987 = fieldWeight in 5867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.078125 = fieldNorm(doc=5867)
          0.8054462 = weight(abstract_txt:stemmer in 5867) [ClassicSimilarity], result of:
            0.8054462 = score(doc=5867,freq=2.0), product of:
              0.79429936 = queryWeight, product of:
                10.290843 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.008409806 = queryNorm
              1.0140336 = fieldWeight in 5867, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.078125 = fieldNorm(doc=5867)
        0.2 = coord(5/25)
    
  2. Frakes, W.B.: Stemming algorithms (1992) 0.19
    0.18942584 = sum of:
      0.18942584 = product of:
        1.1839116 = sum of:
          0.037493564 = weight(abstract_txt:algorithms in 4504) [ClassicSimilarity], result of:
            0.037493564 = score(doc=4504,freq=1.0), product of:
              0.052096747 = queryWeight, product of:
                1.0759412 = boost
                5.757529 = idf(docFreq=366, maxDocs=42740)
                0.008409806 = queryNorm
              0.7196911 = fieldWeight in 4504, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.757529 = idf(docFreq=366, maxDocs=42740)
                0.125 = fieldNorm(doc=4504)
          0.16202775 = weight(abstract_txt:stemming in 4504) [ClassicSimilarity], result of:
            0.16202775 = score(doc=4504,freq=4.0), product of:
              0.087072104 = queryWeight, product of:
                1.390986 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.008409806 = queryNorm
              1.8608457 = fieldWeight in 4504, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.125 = fieldNorm(doc=4504)
          0.073131785 = weight(abstract_txt:files in 4504) [ClassicSimilarity], result of:
            0.073131785 = score(doc=4504,freq=1.0), product of:
              0.10246768 = queryWeight, product of:
                2.1339865 = boost
                5.709647 = idf(docFreq=384, maxDocs=42740)
                0.008409806 = queryNorm
              0.7137059 = fieldWeight in 4504, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.709647 = idf(docFreq=384, maxDocs=42740)
                0.125 = fieldNorm(doc=4504)
          0.9112584 = weight(abstract_txt:stemmer in 4504) [ClassicSimilarity], result of:
            0.9112584 = score(doc=4504,freq=1.0), product of:
              0.79429936 = queryWeight, product of:
                10.290843 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.008409806 = queryNorm
              1.147248 = fieldWeight in 4504, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.125 = fieldNorm(doc=4504)
        0.16 = coord(4/25)
    
  3. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.17
    0.17122278 = sum of:
      0.17122278 = product of:
        1.0701424 = sum of:
          0.099221334 = weight(abstract_txt:stemming in 302) [ClassicSimilarity], result of:
            0.099221334 = score(doc=302,freq=6.0), product of:
              0.087072104 = queryWeight, product of:
                1.390986 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.008409806 = queryNorm
              1.1395307 = fieldWeight in 302, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.02034812 = weight(abstract_txt:than in 302) [ClassicSimilarity], result of:
            0.02034812 = score(doc=302,freq=3.0), product of:
              0.0480668 = queryWeight, product of:
                1.4615741 = boost
                3.9105554 = idf(docFreq=2326, maxDocs=42740)
                0.008409806 = queryNorm
              0.42333004 = fieldWeight in 302, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9105554 = idf(docFreq=2326, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.30621594 = weight(abstract_txt:stemmers in 302) [ClassicSimilarity], result of:
            0.30621594 = score(doc=302,freq=2.0), product of:
              0.38392088 = queryWeight, product of:
                5.0590005 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.008409806 = queryNorm
              0.7976017 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.64435697 = weight(abstract_txt:stemmer in 302) [ClassicSimilarity], result of:
            0.64435697 = score(doc=302,freq=2.0), product of:
              0.79429936 = queryWeight, product of:
                10.290843 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.008409806 = queryNorm
              0.81122684 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
        0.16 = coord(4/25)
    
  4. Kettunen, K.; Kunttu, T.; Järvelin, K.: To stem or lemmatize a highly inflectional language in a probabilistic IR environment? (2005) 0.14
    0.13624232 = sum of:
      0.13624232 = product of:
        0.6812116 = sum of:
          0.013169499 = weight(abstract_txt:small in 396) [ClassicSimilarity], result of:
            0.013169499 = score(doc=396,freq=1.0), product of:
              0.04500218 = queryWeight, product of:
                5.3511558 = idf(docFreq=550, maxDocs=42740)
                0.008409806 = queryNorm
              0.29264134 = fieldWeight in 396, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3511558 = idf(docFreq=550, maxDocs=42740)
                0.0546875 = fieldNorm(doc=396)
          0.050124772 = weight(abstract_txt:stemming in 396) [ClassicSimilarity], result of:
            0.050124772 = score(doc=396,freq=2.0), product of:
              0.087072104 = queryWeight, product of:
                1.390986 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.008409806 = queryNorm
              0.5756697 = fieldWeight in 396, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.0546875 = fieldNorm(doc=396)
          0.010279493 = weight(abstract_txt:than in 396) [ClassicSimilarity], result of:
            0.010279493 = score(doc=396,freq=1.0), product of:
              0.0480668 = queryWeight, product of:
                1.4615741 = boost
                3.9105554 = idf(docFreq=2326, maxDocs=42740)
                0.008409806 = queryNorm
              0.2138585 = fieldWeight in 396, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9105554 = idf(docFreq=2326, maxDocs=42740)
                0.0546875 = fieldNorm(doc=396)
          0.04382547 = weight(abstract_txt:generation in 396) [ClassicSimilarity], result of:
            0.04382547 = score(doc=396,freq=2.0), product of:
              0.10030867 = queryWeight, product of:
                2.111385 = boost
                5.649175 = idf(docFreq=408, maxDocs=42740)
                0.008409806 = queryNorm
              0.4369061 = fieldWeight in 396, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.649175 = idf(docFreq=408, maxDocs=42740)
                0.0546875 = fieldNorm(doc=396)
          0.5638124 = weight(abstract_txt:stemmer in 396) [ClassicSimilarity], result of:
            0.5638124 = score(doc=396,freq=2.0), product of:
              0.79429936 = queryWeight, product of:
                10.290843 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.008409806 = queryNorm
              0.7098235 = fieldWeight in 396, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.0546875 = fieldNorm(doc=396)
        0.2 = coord(5/25)
    
  5. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.12
    0.12286959 = sum of:
      0.12286959 = product of:
        1.0239133 = sum of:
          0.07160682 = weight(abstract_txt:stemming in 4951) [ClassicSimilarity], result of:
            0.07160682 = score(doc=4951,freq=2.0), product of:
              0.087072104 = queryWeight, product of:
                1.390986 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.008409806 = queryNorm
              0.8223853 = fieldWeight in 4951, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.078125 = fieldNorm(doc=4951)
          0.38276994 = weight(abstract_txt:stemmers in 4951) [ClassicSimilarity], result of:
            0.38276994 = score(doc=4951,freq=2.0), product of:
              0.38392088 = queryWeight, product of:
                5.0590005 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.008409806 = queryNorm
              0.9970021 = fieldWeight in 4951, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.078125 = fieldNorm(doc=4951)
          0.5695365 = weight(abstract_txt:stemmer in 4951) [ClassicSimilarity], result of:
            0.5695365 = score(doc=4951,freq=1.0), product of:
              0.79429936 = queryWeight, product of:
                10.290843 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.008409806 = queryNorm
              0.71703005 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.078125 = fieldNorm(doc=4951)
        0.12 = coord(3/25)
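
The score explanations above all follow the same arithmetic, which can be reproduced with a short sketch. It recomputes the first entry (doc 5867) from the values shown in its listing: each term contributes queryWeight (boost × idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), with tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the sum is then multiplied by the coord factor. The function and variable names are chosen here for illustration, and the result is expected to agree with the listing only up to floating-point rounding.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as used in the listing above."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float,
               field_norm: float, query_norm: float, max_docs: int) -> float:
    """Score of one query term in one document: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


# Values copied from the explanation for doc 5867.
QUERY_NORM = 0.008409806
MAX_DOCS = 42740
FIELD_NORM = 0.078125
TERMS = [
    # (term, freq, docFreq, boost)
    ("generated", 1.0, 455, 1.0353646),
    ("algorithm", 2.0, 380, 1.068945),
    ("stemming", 3.0, 67, 1.390986),
    ("stemmers", 1.0, 13, 5.0590005),
    ("stemmer", 2.0, 11, 10.290843),
]

total = sum(
    term_score(freq, doc_freq, boost, FIELD_NORM, QUERY_NORM, MAX_DOCS)
    for _, freq, doc_freq, boost in TERMS
)
coord = 5 / 25  # 5 of the 25 query terms matched this document
score = total * coord

# Compare with the 0.24343684 reported for this entry in the listing.
print(round(score, 6))
```

Where a term's listing shows no boost line, as for "small" in entry 4, its queryWeight equals idf × queryNorm, so a boost of 1.0 reproduces that case.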