Document (#27586)

Author
Fox, B.
Fox, C.J.
Title
Efficient stemmer generation
Source
Information processing and management. 38(2002) no.4, pp.547-558
Year
2002
Abstract
This paper presents an algorithm for generating stemmers from text stemmer specification files. A small study shows that the generated stemmers are computationally efficient, often running faster than stemmers custom written to implement particular stemming algorithms. The stemmer specification files are easily written and modified by non-programmers, making it much easier to create a stemmer, or tune a stemmer's performance, than would be the case with a custom stemmer program. Stemmer generation is thus also human-resource efficient.
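The abstract does not describe the specification language or the generation algorithm. The following is therefore only a minimal sketch of the general idea, and it rests on assumed conventions. A hypothetical plain-text rule format (`suffix replacement min_stem_len`, where `-` means an empty replacement) is parsed once into a lookup table. The generated function then strips the longest matching suffix whose remaining stem is long enough. The names `EXAMPLE_SPEC`, `parse_spec` and `generate_stemmer` are illustrative and do not come from the paper, whose specification files and generator may work quite differently.

```python
from typing import Callable, Dict, Tuple

# Hypothetical spec format (an assumption, NOT the paper's format):
# one rule per line, "suffix replacement min_stem_len"; "-" = empty replacement.
EXAMPLE_SPEC = """
# suffix  replacement  min_stem_len
sses      ss           1
ies       i            1
ing       -            2
ed        -            2
s         -            2
"""

def parse_spec(text: str) -> Dict[str, Tuple[str, int]]:
    """Parse rule lines into {suffix: (replacement, min_stem_len)}."""
    rules: Dict[str, Tuple[str, int]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        suffix, replacement, min_len = line.split()
        rules[suffix] = ("" if replacement == "-" else replacement, int(min_len))
    return rules

def generate_stemmer(spec: str) -> Callable[[str], str]:
    """Build a stemming function from a spec; all parsing happens once, up front."""
    rules = parse_spec(spec)
    # Try suffix lengths longest-first so the longest matching suffix wins.
    lengths = sorted({len(s) for s in rules}, reverse=True)

    def stem(word: str) -> str:
        for n in lengths:
            if n >= len(word):  # never strip the whole word
                continue
            rule = rules.get(word[-n:])
            if rule is None:
                continue
            replacement, min_len = rule
            base = word[:-n]
            if len(base) >= min_len:  # enforce the minimum stem length
                return base + replacement
        return word  # no applicable rule: leave the word unchanged

    return stem

stem = generate_stemmer(EXAMPLE_SPEC)
print(stem("caresses"), stem("ponies"), stem("jumping"))  # caress poni jump
```

In this toy version, tuning the stemmer means editing the rule text rather than the Python code. That mirrors the non-programmer workflow the abstract describes.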
Theme
Computational linguistics

Similar documents (content)

  1. Kraaij, W.; Pohlmann, R.: Evaluation of a Dutch stemming algorithm (1995) 0.24
    0.2444028 = sum of:
      0.2444028 = product of:
        1.222014 = sum of:
          0.02054608 = weight(abstract_txt:generated in 5798) [ClassicSimilarity], result of:
            0.02054608 = score(doc=5798,freq=1.0), product of:
              0.047634285 = queryWeight, product of:
                1.0350893 = boost
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.008335325 = queryNorm
              0.43132967 = fieldWeight in 5798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.078125 = fieldNorm(doc=5798)
          0.032066353 = weight(abstract_txt:algorithm in 5798) [ClassicSimilarity], result of:
            0.032066353 = score(doc=5798,freq=2.0), product of:
              0.05086941 = queryWeight, product of:
                1.0696614 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.008335325 = queryNorm
              0.63036615 = fieldWeight in 5798, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.078125 = fieldNorm(doc=5798)
          0.087381445 = weight(abstract_txt:stemming in 5798) [ClassicSimilarity], result of:
            0.087381445 = score(doc=5798,freq=3.0), product of:
              0.08669739 = queryWeight, product of:
                1.3964359 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.008335325 = queryNorm
              1.0078901 = fieldWeight in 5798, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.078125 = fieldNorm(doc=5798)
          0.2721854 = weight(abstract_txt:stemmers in 5798) [ClassicSimilarity], result of:
            0.2721854 = score(doc=5798,freq=1.0), product of:
              0.38463658 = queryWeight, product of:
                5.0945272 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.008335325 = queryNorm
              0.707643 = fieldWeight in 5798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.078125 = fieldNorm(doc=5798)
          0.80983466 = weight(abstract_txt:stemmer in 5798) [ClassicSimilarity], result of:
            0.80983466 = score(doc=5798,freq=2.0), product of:
              0.79567975 = queryWeight, product of:
                10.362457 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.008335325 = queryNorm
              1.0177897 = fieldWeight in 5798, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.078125 = fieldNorm(doc=5798)
        0.2 = coord(5/25)
    
  2. Frakes, W.B.: Stemming algorithms (1992) 0.19
    0.18994 = sum of:
      0.18994 = product of:
        1.1871251 = sum of:
          0.0363267 = weight(abstract_txt:algorithms in 3503) [ClassicSimilarity], result of:
            0.0363267 = score(doc=3503,freq=1.0), product of:
              0.050914045 = queryWeight, product of:
                1.0701306 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.008335325 = queryNorm
              0.7134907 = fieldWeight in 3503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.125 = fieldNorm(doc=3503)
          0.16143903 = weight(abstract_txt:stemming in 3503) [ClassicSimilarity], result of:
            0.16143903 = score(doc=3503,freq=4.0), product of:
              0.08669739 = queryWeight, product of:
                1.3964359 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.008335325 = queryNorm
              1.862098 = fieldWeight in 3503, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.125 = fieldNorm(doc=3503)
          0.07313601 = weight(abstract_txt:files in 3503) [ClassicSimilarity], result of:
            0.07313601 = score(doc=3503,freq=1.0), product of:
              0.10227854 = queryWeight, product of:
                2.1449897 = boost
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.008335325 = queryNorm
              0.715067 = fieldWeight in 3503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.125 = fieldNorm(doc=3503)
          0.91622335 = weight(abstract_txt:stemmer in 3503) [ClassicSimilarity], result of:
            0.91622335 = score(doc=3503,freq=1.0), product of:
              0.79567975 = queryWeight, product of:
                10.362457 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.008335325 = queryNorm
              1.1514976 = fieldWeight in 3503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.125 = fieldNorm(doc=3503)
        0.16 = coord(4/25)
    
  3. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.17
    0.17194645 = sum of:
      0.17194645 = product of:
        1.0746653 = sum of:
          0.09886082 = weight(abstract_txt:stemming in 3301) [ClassicSimilarity], result of:
            0.09886082 = score(doc=3301,freq=6.0), product of:
              0.08669739 = queryWeight, product of:
                1.3964359 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.008335325 = queryNorm
              1.1402975 = fieldWeight in 3301, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.019994155 = weight(abstract_txt:than in 3301) [ClassicSimilarity], result of:
            0.019994155 = score(doc=3301,freq=3.0), product of:
              0.04741822 = queryWeight, product of:
                1.4605136 = boost
                3.8950868 = idf(docFreq=2444, maxDocs=44218)
                0.008335325 = queryNorm
              0.4216555 = fieldWeight in 3301, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.8950868 = idf(docFreq=2444, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.3079426 = weight(abstract_txt:stemmers in 3301) [ClassicSimilarity], result of:
            0.3079426 = score(doc=3301,freq=2.0), product of:
              0.38463658 = queryWeight, product of:
                5.0945272 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.008335325 = queryNorm
              0.8006066 = fieldWeight in 3301, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.64786774 = weight(abstract_txt:stemmer in 3301) [ClassicSimilarity], result of:
            0.64786774 = score(doc=3301,freq=2.0), product of:
              0.79567975 = queryWeight, product of:
                10.362457 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.008335325 = queryNorm
              0.81423175 = fieldWeight in 3301, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
        0.16 = coord(4/25)
    
  4. Kettunen, K.; Kunttu, T.; Järvelin, K.: To stem or lemmatize a highly inflectional language in a probabilistic IR environment? (2005) 0.14
    0.13658874 = sum of:
      0.13658874 = product of:
        0.68294364 = sum of:
          0.012968617 = weight(abstract_txt:small in 4395) [ClassicSimilarity], result of:
            0.012968617 = score(doc=4395,freq=1.0), product of:
              0.044459447 = queryWeight, product of:
                5.333859 = idf(docFreq=579, maxDocs=44218)
                0.008335325 = queryNorm
              0.29169542 = fieldWeight in 4395, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.333859 = idf(docFreq=579, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4395)
          0.049942657 = weight(abstract_txt:stemming in 4395) [ClassicSimilarity], result of:
            0.049942657 = score(doc=4395,freq=2.0), product of:
              0.08669739 = queryWeight, product of:
                1.3964359 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.008335325 = queryNorm
              0.5760572 = fieldWeight in 4395, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4395)
          0.010100677 = weight(abstract_txt:than in 4395) [ClassicSimilarity], result of:
            0.010100677 = score(doc=4395,freq=1.0), product of:
              0.04741822 = queryWeight, product of:
                1.4605136 = boost
                3.8950868 = idf(docFreq=2444, maxDocs=44218)
                0.008335325 = queryNorm
              0.21301256 = fieldWeight in 4395, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8950868 = idf(docFreq=2444, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4395)
          0.043047495 = weight(abstract_txt:generation in 4395) [ClassicSimilarity], result of:
            0.043047495 = score(doc=4395,freq=2.0), product of:
              0.09893126 = queryWeight, product of:
                2.1095982 = boost
                5.6261497 = idf(docFreq=432, maxDocs=44218)
                0.008335325 = queryNorm
              0.4351253 = fieldWeight in 4395, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6261497 = idf(docFreq=432, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4395)
          0.5668842 = weight(abstract_txt:stemmer in 4395) [ClassicSimilarity], result of:
            0.5668842 = score(doc=4395,freq=2.0), product of:
              0.79567975 = queryWeight, product of:
                10.362457 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.008335325 = queryNorm
              0.71245277 = fieldWeight in 4395, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4395)
        0.2 = coord(5/25)
    
  5. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.12
    0.12346973 = sum of:
      0.12346973 = product of:
        1.0289145 = sum of:
          0.07134665 = weight(abstract_txt:stemming in 2950) [ClassicSimilarity], result of:
            0.07134665 = score(doc=2950,freq=2.0), product of:
              0.08669739 = queryWeight, product of:
                1.3964359 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.008335325 = queryNorm
              0.8229388 = fieldWeight in 2950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.078125 = fieldNorm(doc=2950)
          0.38492826 = weight(abstract_txt:stemmers in 2950) [ClassicSimilarity], result of:
            0.38492826 = score(doc=2950,freq=2.0), product of:
              0.38463658 = queryWeight, product of:
                5.0945272 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.008335325 = queryNorm
              1.0007583 = fieldWeight in 2950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.078125 = fieldNorm(doc=2950)
          0.5726396 = weight(abstract_txt:stemmer in 2950) [ClassicSimilarity], result of:
            0.5726396 = score(doc=2950,freq=1.0), product of:
              0.79567975 = queryWeight, product of:
                10.362457 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.008335325 = queryNorm
              0.71968603 = fieldWeight in 2950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.078125 = fieldNorm(doc=2950)
        0.12 = coord(3/25)
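The score breakdowns above use Lucene's ClassicSimilarity (TF-IDF) explain format, and their arithmetic can be checked directly. Each term weight is queryWeight × fieldWeight, where:

- queryWeight = boost × idf × queryNorm
- fieldWeight = tf × idf × fieldNorm
- tf = √freq
- idf = 1 + ln(maxDocs / (docFreq + 1)); this reproduces the printed values, e.g. 1 + ln(44218 / 481) ≈ 5.52102.

A document's score is the sum of its term weights multiplied by coord, the fraction of query terms matched. The sketch below copies queryNorm and fieldNorm from the explanations rather than recomputing them. The function names are illustrative only and are not Lucene API. A term printed without a boost line is treated as boost 1.0, which fits item 4's "small" term: 5.333859 × 0.008335325 ≈ 0.044459447.

```python
import math
from typing import List, Tuple

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf as printed in the explanations: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_weight(freq: float, doc_freq: int, boost: float, *,
                max_docs: int, query_norm: float, field_norm: float) -> float:
    """One 'weight(abstract_txt:...)' line: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm             # "queryWeight, product of"
    field_weight = math.sqrt(freq) * idf * field_norm   # "fieldWeight"; tf = sqrt(freq)
    return query_weight * field_weight

def doc_score(terms: List[Tuple[float, int, float]], matched: int, query_terms: int, *,
              max_docs: int, query_norm: float, field_norm: float) -> float:
    """Sum of term weights scaled by coord(matched/query_terms)."""
    total = sum(term_weight(f, df, b, max_docs=max_docs, query_norm=query_norm,
                            field_norm=field_norm) for f, df, b in terms)
    return total * matched / query_terms

# Item 1 (doc 5798): (freq, docFreq, boost) for generated, algorithm,
# stemming, stemmers, stemmer, all copied from the explanation above.
DOC_5798 = [(1.0, 480, 1.0350893), (2.0, 399, 1.0696614), (3.0, 69, 1.3964359),
            (1.0, 13, 5.0945272), (2.0, 11, 10.362457)]
score = doc_score(DOC_5798, 5, 25, max_docs=44218,
                  query_norm=0.008335325, field_norm=0.078125)
print(round(score, 4))  # 0.2444
```

The small remaining differences from the printed figures come from Lucene computing in single precision while this sketch uses double precision.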