Document (#14868)

Author
Kraaij, W.
Pohlmann, R.
Title
Evaluation of a Dutch stemming algorithm
Source
New review of document and text management. 1995, no.1, pp.25-43
Year
1995
Abstract
A stemming algorithm can improve the recall of text retrieval systems. The paper describes the development of a Dutch version of the Porter stemming algorithm. The stemmer was evaluated using a method drawn from Paice, which is based on a list of groups of morphologically related words. Ideally, all words in a group are stemmed to the same root. The result of applying the stemmer to these groups is used to calculate an understemming index and an overstemming index. These indices, together with the diversity of stem group categories that could be generated from the CELEX database, enabled a careful analysis of the effects of each stemming rule. The test suite is well suited to qualitative comparison of different versions of stemmers.
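
The understemming and overstemming indices named in the abstract can be illustrated with a small sketch. It assumes the pair-counting formulation usually associated with Paice's evaluation method: understemming is the share of same-group word pairs the stemmer fails to merge, and overstemming is the share of different-group word pairs it merges wrongly. The function name `paice_indices`, the word groups, and the `TOY_STEMS` mapping are hypothetical and are not taken from the paper or from the CELEX database; the mapping is not the output of the stemmer the paper describes.

```python
from collections import Counter, defaultdict

def paice_indices(groups, stem):
    """Return (understemming index, overstemming index) for a stemmer.

    groups: list of word lists; each list holds morphologically related words.
    stem:   function mapping a word to its stem.
    Assumes every word appears in exactly one group.
    """
    total_words = sum(len(g) for g in groups)
    desired_merges = 0.0      # pairs that should share a stem
    unachieved_merges = 0.0   # such pairs that ended up with different stems
    desired_nonmerges = 0.0   # pairs that should NOT share a stem
    stem_members = defaultdict(Counter)  # stem -> {group index: word count}

    for gi, group in enumerate(groups):
        n = len(group)
        desired_merges += 0.5 * n * (n - 1)
        desired_nonmerges += 0.5 * n * (total_words - n)
        counts = Counter(stem(w) for w in group)
        unachieved_merges += 0.5 * sum(u * (n - u) for u in counts.values())
        for s, c in counts.items():
            stem_members[s][gi] += c

    wrong_merges = 0.0        # pairs from different groups sharing a stem
    for per_group in stem_members.values():
        ns = sum(per_group.values())
        wrong_merges += 0.5 * sum(v * (ns - v) for v in per_group.values())

    ui = unachieved_merges / desired_merges if desired_merges else 0.0
    oi = wrong_merges / desired_nonmerges if desired_nonmerges else 0.0
    return ui, oi

# Hypothetical groups and a hypothetical stem table, for illustration only.
groups = [
    ["werk", "werken", "werkte"],
    ["wet", "wetten"],
    ["weten", "wetend"],
]
TOY_STEMS = {
    "werk": "werk", "werken": "werk", "werkte": "werkt",
    "wet": "wet", "wetten": "wet",
    "weten": "wet", "wetend": "wetend",
}

def toy_stem(word):
    return TOY_STEMS[word]

ui, oi = paice_indices(groups, toy_stem)
print(f"understemming index = {ui:.3f}, overstemming index = {oi:.3f}")
```

In this toy data three of the five desired merges are missed (`werkte` is split from its group, and `weten` from `wetend`), and two of the sixteen possible cross-group pairs are merged wrongly (`weten` with `wet` and `wetten`), so the sketch reports 0.600 and 0.125.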
Theme
Computational linguistics

Similar documents (author)

  1. Pohlmann, T.: Vermittlung von Informationskompetenz an Master-Studierende und Doktoranden : Themen und Konzepte (2012) 2.11
    2.1131148 = sum of:
      2.1131148 = product of:
        4.2262297 = sum of:
          4.2262297 = weight(author_txt:pohlmann in 4448) [ClassicSimilarity], result of:
            4.2262297 = score(doc=4448,freq=1.0), product of:
              0.6958919 = queryWeight, product of:
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.07161607 = queryNorm
              6.0731125 = fieldWeight in 4448, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.625 = fieldNorm(doc=4448)
        0.5 = coord(1/2)
    
  2. Hiemstra, D.; Kraaij, W.: A language-modeling approach to TREC (2005) 1.77
    1.7722294 = sum of:
      1.7722294 = product of:
        3.5444589 = sum of:
          3.5444589 = weight(author_txt:kraaij in 92) [ClassicSimilarity], result of:
            3.5444589 = score(doc=92,freq=1.0), product of:
              0.71814644 = queryWeight, product of:
                1.0158641 = boost
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.07161607 = queryNorm
              4.9355655 = fieldWeight in 92, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.5 = fieldNorm(doc=92)
        0.5 = coord(1/2)
    
  3. Friedrich, H.; Pohlmann, J.M.: Aufbau und Betrieb von Navigationssystemen als zentrale Aufgabe moderner Fachinformationssysteme (1996) 1.69
    1.6904919 = sum of:
      1.6904919 = product of:
        3.3809838 = sum of:
          3.3809838 = weight(author_txt:pohlmann in 5334) [ClassicSimilarity], result of:
            3.3809838 = score(doc=5334,freq=1.0), product of:
              0.6958919 = queryWeight, product of:
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.07161607 = queryNorm
              4.85849 = fieldWeight in 5334, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.5 = fieldNorm(doc=5334)
        0.5 = coord(1/2)
    
  4. Pohlmann, J.M.; König, E.: Landwirtschaft vernetzt : das deutsche Agrarinformationsnetz DAINet (1996) 1.69
    1.6904919 = sum of:
      1.6904919 = product of:
        3.3809838 = sum of:
          3.3809838 = weight(author_txt:pohlmann in 6002) [ClassicSimilarity], result of:
            3.3809838 = score(doc=6002,freq=1.0), product of:
              0.6958919 = queryWeight, product of:
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.07161607 = queryNorm
              4.85849 = fieldWeight in 6002, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.5 = fieldNorm(doc=6002)
        0.5 = coord(1/2)
    
  5. Diebel, C.; Pohlmann, C.: Retrokonversion des alten alphabetischen Kataloges der Deutschen Bücherei Leipzig (2001) 1.69
    1.6904919 = sum of:
      1.6904919 = product of:
        3.3809838 = sum of:
          3.3809838 = weight(author_txt:pohlmann in 589) [ClassicSimilarity], result of:
            3.3809838 = score(doc=589,freq=1.0), product of:
              0.6958919 = queryWeight, product of:
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.07161607 = queryNorm
              4.85849 = fieldWeight in 589, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.5 = fieldNorm(doc=589)
        0.5 = coord(1/2)
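
The score trees above follow the classic Lucene TF-IDF scheme (`ClassicSimilarity`). As a reading aid, the relations between the listed quantities can be written out as follows; the symbols mirror the labels in the listing, and `queryNorm` is taken from the listing as given because it depends on the whole query. The worked line uses the figures of entry 1 (doc 4448).

```latex
\begin{align*}
\mathrm{score}(q,d) &= \mathrm{coord}(q,d)\cdot\sum_{t\in q}
      \mathrm{queryWeight}(t)\cdot\mathrm{fieldWeight}(t,d)\\
\mathrm{queryWeight}(t) &= \mathrm{boost}(t)\cdot\mathrm{idf}(t)\cdot\mathrm{queryNorm}\\
\mathrm{fieldWeight}(t,d) &= \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)\cdot\mathrm{fieldNorm}(d)\\
\mathrm{tf}(t,d) &= \sqrt{\mathrm{freq}(t,d)}\\
\mathrm{idf}(t) &= 1+\ln\frac{\mathrm{maxDocs}}{\mathrm{docFreq}(t)+1}\\[6pt]
\text{Entry 1:}\quad \mathrm{idf} &= 1+\ln\frac{42740}{6+1}\approx 9.717\\
\mathrm{score} &\approx 0.5\cdot(9.717\cdot 0.07161607)\cdot(1.0\cdot 9.717\cdot 0.625)\approx 2.113
\end{align*}
```

Where no `boost` line appears in a tree, the boost is 1. The factor `coord(1/2)` reflects that only one of the two author names in the query matched the document.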
    

Similar documents (content)

  1. Brychcín, T.; Konopík, M.: HPS: High precision stemmer (2015) 0.36
    0.35547376 = sum of:
      0.35547376 = product of:
        1.2695491 = sum of:
          0.042086173 = weight(abstract_txt:rule in 4687) [ClassicSimilarity], result of:
            0.042086173 = score(doc=4687,freq=1.0), product of:
              0.10182598 = queryWeight, product of:
                6.6130347 = idf(docFreq=155, maxDocs=42740)
                0.015397769 = queryNorm
              0.41331467 = fieldWeight in 4687, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6130347 = idf(docFreq=155, maxDocs=42740)
                0.0625 = fieldNorm(doc=4687)
          0.08015423 = weight(abstract_txt:stem in 4687) [ClassicSimilarity], result of:
            0.08015423 = score(doc=4687,freq=1.0), product of:
              0.15645279 = queryWeight, product of:
                1.2395451 = boost
                8.197155 = idf(docFreq=31, maxDocs=42740)
                0.015397769 = queryNorm
              0.5123222 = fieldWeight in 4687, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.197155 = idf(docFreq=31, maxDocs=42740)
                0.0625 = fieldNorm(doc=4687)
          0.026906459 = weight(abstract_txt:method in 4687) [ClassicSimilarity], result of:
            0.026906459 = score(doc=4687,freq=1.0), product of:
              0.09520944 = queryWeight, product of:
                1.3674948 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.015397769 = queryNorm
              0.28260285 = fieldWeight in 4687, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.0625 = fieldNorm(doc=4687)
          0.07756096 = weight(abstract_txt:words in 4687) [ClassicSimilarity], result of:
            0.07756096 = score(doc=4687,freq=3.0), product of:
              0.13371004 = queryWeight, product of:
                1.6205697 = boost
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.015397769 = queryNorm
              0.58006835 = fieldWeight in 4687, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.0625 = fieldNorm(doc=4687)
          0.18270575 = weight(abstract_txt:algorithm in 4687) [ClassicSimilarity], result of:
            0.18270575 = score(doc=4687,freq=5.0), product of:
              0.22855158 = queryWeight, product of:
                2.5949168 = boost
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.015397769 = queryNorm
              0.79940706 = fieldWeight in 4687, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.0625 = fieldNorm(doc=4687)
          0.22501367 = weight(abstract_txt:stemmer in 4687) [ClassicSimilarity], result of:
            0.22501367 = score(doc=4687,freq=1.0), product of:
              0.39226684 = queryWeight, product of:
                2.7757254 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.015397769 = queryNorm
              0.573624 = fieldWeight in 4687, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.0625 = fieldNorm(doc=4687)
          0.6351218 = weight(abstract_txt:stemming in 4687) [ClassicSimilarity], result of:
            0.6351218 = score(doc=4687,freq=7.0), product of:
              0.51600945 = queryWeight, product of:
                4.5022492 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.015397769 = queryNorm
              1.2308336 = fieldWeight in 4687, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.0625 = fieldNorm(doc=4687)
        0.28 = coord(7/25)
    
  2. Kettunen, K.; Kunttu, T.; Järvelin, K.: To stem or lemmatize a highly inflectional language in a probabilistic IR environment? (2005) 0.33
    0.32635733 = sum of:
      0.32635733 = product of:
        1.0198667 = sum of:
          0.15682653 = weight(abstract_txt:stem in 396) [ClassicSimilarity], result of:
            0.15682653 = score(doc=396,freq=5.0), product of:
              0.15645279 = queryWeight, product of:
                1.2395451 = boost
                8.197155 = idf(docFreq=31, maxDocs=42740)
                0.015397769 = queryNorm
              1.0023888 = fieldWeight in 396, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.197155 = idf(docFreq=31, maxDocs=42740)
                0.0546875 = fieldNorm(doc=396)
          0.017978547 = weight(abstract_txt:each in 396) [ClassicSimilarity], result of:
            0.017978547 = score(doc=396,freq=1.0), product of:
              0.07954386 = queryWeight, product of:
                1.2499396 = boost
                4.132947 = idf(docFreq=1862, maxDocs=42740)
                0.015397769 = queryNorm
              0.22602054 = fieldWeight in 396, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.132947 = idf(docFreq=1862, maxDocs=42740)
                0.0546875 = fieldNorm(doc=396)
          0.023094999 = weight(abstract_txt:evaluation in 396) [ClassicSimilarity], result of:
            0.023094999 = score(doc=396,freq=1.0), product of:
              0.093997344 = queryWeight, product of:
                1.3587623 = boost
                4.492771 = idf(docFreq=1299, maxDocs=42740)
                0.015397769 = queryNorm
              0.24569842 = fieldWeight in 396, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.492771 = idf(docFreq=1299, maxDocs=42740)
                0.0546875 = fieldNorm(doc=396)
          0.040777937 = weight(abstract_txt:method in 396) [ClassicSimilarity], result of:
            0.040777937 = score(doc=396,freq=3.0), product of:
              0.09520944 = queryWeight, product of:
                1.3674948 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.015397769 = queryNorm
              0.4282972 = fieldWeight in 396, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.0546875 = fieldNorm(doc=396)
          0.10126998 = weight(abstract_txt:morphologically in 396) [ClassicSimilarity], result of:
            0.10126998 = score(doc=396,freq=1.0), product of:
              0.19986993 = queryWeight, product of:
                1.4010203 = boost
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.015397769 = queryNorm
              0.5066794 = fieldWeight in 396, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.0546875 = fieldNorm(doc=396)
          0.10442757 = weight(abstract_txt:porter in 396) [ClassicSimilarity], result of:
            0.10442757 = score(doc=396,freq=1.0), product of:
              0.20400324 = queryWeight, product of:
                1.4154327 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.015397769 = queryNorm
              0.5118917 = fieldWeight in 396, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.0546875 = fieldNorm(doc=396)
          0.2784402 = weight(abstract_txt:stemmer in 396) [ClassicSimilarity], result of:
            0.2784402 = score(doc=396,freq=2.0), product of:
              0.39226684 = queryWeight, product of:
                2.7757254 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.015397769 = queryNorm
              0.7098235 = fieldWeight in 396, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.0546875 = fieldNorm(doc=396)
          0.297051 = weight(abstract_txt:stemming in 396) [ClassicSimilarity], result of:
            0.297051 = score(doc=396,freq=2.0), product of:
              0.51600945 = queryWeight, product of:
                4.5022492 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.015397769 = queryNorm
              0.5756697 = fieldWeight in 396, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.0546875 = fieldNorm(doc=396)
        0.32 = coord(8/25)
    
  3. Frakes, W.B.: Stemming algorithms (1992) 0.30
    0.3008651 = sum of:
      0.3008651 = product of:
        1.8804071 = sum of:
          0.23147425 = weight(abstract_txt:morphologically in 4504) [ClassicSimilarity], result of:
            0.23147425 = score(doc=4504,freq=1.0), product of:
              0.19986993 = queryWeight, product of:
                1.4010203 = boost
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.015397769 = queryNorm
              1.1581244 = fieldWeight in 4504, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.125 = fieldNorm(doc=4504)
          0.2386916 = weight(abstract_txt:porter in 4504) [ClassicSimilarity], result of:
            0.2386916 = score(doc=4504,freq=1.0), product of:
              0.20400324 = queryWeight, product of:
                1.4154327 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.015397769 = queryNorm
              1.1700382 = fieldWeight in 4504, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.125 = fieldNorm(doc=4504)
          0.45002735 = weight(abstract_txt:stemmer in 4504) [ClassicSimilarity], result of:
            0.45002735 = score(doc=4504,freq=1.0), product of:
              0.39226684 = queryWeight, product of:
                2.7757254 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.015397769 = queryNorm
              1.147248 = fieldWeight in 4504, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.125 = fieldNorm(doc=4504)
          0.96021396 = weight(abstract_txt:stemming in 4504) [ClassicSimilarity], result of:
            0.96021396 = score(doc=4504,freq=4.0), product of:
              0.51600945 = queryWeight, product of:
                4.5022492 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.015397769 = queryNorm
              1.8608457 = fieldWeight in 4504, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.125 = fieldNorm(doc=4504)
        0.16 = coord(4/25)
    
  4. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.28
    0.28283376 = sum of:
      0.28283376 = product of:
        1.7677109 = sum of:
          0.32412228 = weight(abstract_txt:stemmers in 3586) [ClassicSimilarity], result of:
            0.32412228 = score(doc=3586,freq=3.0), product of:
              0.18960035 = queryWeight, product of:
                1.3645525 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.015397769 = queryNorm
              1.7095026 = fieldWeight in 3586, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.109375 = fieldNorm(doc=3586)
          0.14298986 = weight(abstract_txt:algorithm in 3586) [ClassicSimilarity], result of:
            0.14298986 = score(doc=3586,freq=1.0), product of:
              0.22855158 = queryWeight, product of:
                2.5949168 = boost
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.015397769 = queryNorm
              0.62563497 = fieldWeight in 3586, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.109375 = fieldNorm(doc=3586)
          0.88050526 = weight(abstract_txt:stemmer in 3586) [ClassicSimilarity], result of:
            0.88050526 = score(doc=3586,freq=5.0), product of:
              0.39226684 = queryWeight, product of:
                2.7757254 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.015397769 = queryNorm
              2.244659 = fieldWeight in 3586, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.109375 = fieldNorm(doc=3586)
          0.4200936 = weight(abstract_txt:stemming in 3586) [ClassicSimilarity], result of:
            0.4200936 = score(doc=3586,freq=1.0), product of:
              0.51600945 = queryWeight, product of:
                4.5022492 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.015397769 = queryNorm
              0.81412 = fieldWeight in 3586, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.109375 = fieldNorm(doc=3586)
        0.16 = coord(4/25)
    
  5. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.28
    0.2814035 = sum of:
      0.2814035 = product of:
        1.1725147 = sum of:
          0.10019279 = weight(abstract_txt:stem in 4951) [ClassicSimilarity], result of:
            0.10019279 = score(doc=4951,freq=1.0), product of:
              0.15645279 = queryWeight, product of:
                1.2395451 = boost
                8.197155 = idf(docFreq=31, maxDocs=42740)
                0.015397769 = queryNorm
              0.64040273 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.197155 = idf(docFreq=31, maxDocs=42740)
                0.078125 = fieldNorm(doc=4951)
          0.032992855 = weight(abstract_txt:evaluation in 4951) [ClassicSimilarity], result of:
            0.032992855 = score(doc=4951,freq=1.0), product of:
              0.093997344 = queryWeight, product of:
                1.3587623 = boost
                4.492771 = idf(docFreq=1299, maxDocs=42740)
                0.015397769 = queryNorm
              0.35099775 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.492771 = idf(docFreq=1299, maxDocs=42740)
                0.078125 = fieldNorm(doc=4951)
          0.18903194 = weight(abstract_txt:stemmers in 4951) [ClassicSimilarity], result of:
            0.18903194 = score(doc=4951,freq=2.0), product of:
              0.18960035 = queryWeight, product of:
                1.3645525 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.015397769 = queryNorm
              0.9970021 = fieldWeight in 4951, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.078125 = fieldNorm(doc=4951)
          0.14467141 = weight(abstract_txt:morphologically in 4951) [ClassicSimilarity], result of:
            0.14467141 = score(doc=4951,freq=1.0), product of:
              0.19986993 = queryWeight, product of:
                1.4010203 = boost
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.015397769 = queryNorm
              0.7238278 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.078125 = fieldNorm(doc=4951)
          0.2812671 = weight(abstract_txt:stemmer in 4951) [ClassicSimilarity], result of:
            0.2812671 = score(doc=4951,freq=1.0), product of:
              0.39226684 = queryWeight, product of:
                2.7757254 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.015397769 = queryNorm
              0.71703005 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.078125 = fieldNorm(doc=4951)
          0.4243586 = weight(abstract_txt:stemming in 4951) [ClassicSimilarity], result of:
            0.4243586 = score(doc=4951,freq=2.0), product of:
              0.51600945 = queryWeight, product of:
                4.5022492 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.015397769 = queryNorm
              0.8223853 = fieldWeight in 4951, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.078125 = fieldNorm(doc=4951)
        0.24 = coord(6/25)
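
The content-based scores above combine several matching terms, each with its own boost, and scale the sum by the fraction of query terms that matched. A minimal sketch of that arithmetic, assuming the classic Lucene TF-IDF formulas (`tf = sqrt(freq)`, `idf = 1 + ln(maxDocs / (docFreq + 1))`), is shown below for entry 3 (doc 4504). All numeric inputs are copied from the listing; the function and variable names are illustrative, and `QUERY_NORM` is taken as given rather than recomputed.

```python
import math

MAX_DOCS = 42740            # maxDocs, copied from the listing
QUERY_NORM = 0.015397769    # queryNorm, copied from the listing

def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as used by the classic TF-IDF similarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost, doc_freq, freq, field_norm):
    """Contribution of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight

# term -> (boost, docFreq, freq), as listed for doc 4504
frakes_terms = {
    "morphologically": (1.4010203, 10, 1.0),
    "porter":          (1.4154327, 9, 1.0),
    "stemmer":         (2.7757254, 11, 1.0),
    "stemming":        (4.5022492, 67, 4.0),
}
FIELD_NORM = 0.125          # fieldNorm(doc=4504)
COORD = 4 / 25              # 4 of the 25 query terms matched

frakes_score = COORD * sum(
    term_score(boost, doc_freq, freq, FIELD_NORM)
    for boost, doc_freq, freq in frakes_terms.values()
)
print(round(frakes_score, 4))
```

The result should land close to the 0.3008651 shown for that entry; small differences in the last digits are expected because Lucene computes in single precision and stores `fieldNorm` in a lossy encoding.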