Document (#14868)

Author
Kraaij, W.
Pohlmann, R.
Title
Evaluation of a Dutch stemming algorithm
Source
New review of document and text management. 1995, no.1, pp.25-43
Year
1995
Abstract
A stemming algorithm can improve the recall of text retrieval systems. Describes the development of a Dutch version of the Porter stemming algorithm. The stemmer was evaluated using a method taken from Paice, which is based on a list of groups of morphologically related words; ideally, all words in a group should be stemmed to the same root. The result of applying the stemmer to these groups is used to calculate the understemming and overstemming indices. These indices, together with the diversity of stem-group categories that could be generated from the CELEX database, enabled a careful analysis of the effect of each stemming rule. The test suite is well suited to qualitative comparison of different versions of stemmers.
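The two indices named in the abstract can be sketched as pair counts. This is a minimal Python illustration assuming Paice's pair-counting definitions as we understand them: the understemming index UI is the number of related word pairs left with different stems divided by the number of related pairs, and the (global) overstemming index OI is the number of unrelated pairs merged to one stem divided by the number of unrelated pairs. The exact variant used by Kraaij and Pohlmann may differ. The function name `paice_indices`, the 4-character truncation "stemmer" and the Dutch word groups are hypothetical stand-ins, not the Dutch Porter stemmer or CELEX data.

```python
from collections import Counter
from typing import Callable, Dict, List


def paice_indices(groups: List[List[str]], stem: Callable[[str], str]) -> Dict[str, float]:
    """Sketch of understemming (UI) and overstemming (OI) indices, Paice-style.

    groups: concept groups; all words in a group are assumed to be
    morphologically related, and each word is assumed to occur only once.
    stem: any stemming function (hypothetical stand-in here).
    """
    total_words = sum(len(g) for g in groups)

    gdmt = 0.0  # desired merges: related pairs that should share a stem
    gumt = 0.0  # unachieved merges: related pairs that got different stems
    gdnt = 0.0  # desired non-merges: pairs of words from different groups
    stem_groups: Dict[str, Counter] = {}  # stem -> words per concept group

    for gid, group in enumerate(groups):
        n = len(group)
        gdmt += 0.5 * n * (n - 1)
        gdnt += 0.5 * n * (total_words - n)
        stems = Counter(stem(w) for w in group)
        # Each unordered pair split across two different stems counts once.
        gumt += 0.5 * sum(u * (n - u) for u in stems.values())
        for s, count in stems.items():
            stem_groups.setdefault(s, Counter())[gid] += count

    # Wrongly merged pairs: same stem, but different concept groups.
    gwmt = 0.0
    for members in stem_groups.values():
        n_s = sum(members.values())
        gwmt += 0.5 * sum(v * (n_s - v) for v in members.values())

    ui = gumt / gdmt if gdmt else 0.0
    oi = gwmt / gdnt if gdnt else 0.0
    return {"UI": ui, "OI": oi}


# Hypothetical example: crude truncation to 4 characters as the "stemmer".
groups = [
    ["werken", "werkt", "werkte"],
    ["werkplaats", "werkplaatsen"],
    ["boek", "boeken"],
    ["lopen", "liep"],
]
r = paice_indices(groups, lambda w: w[:4])
print(round(r["UI"], 3), round(r["OI"], 3))  # prints: 0.167 0.2
```

In this toy run the truncation merges *werkplaats* with *werken* (overstemming) and fails to merge the irregular *lopen*/*liep* (understemming), which is the kind of per-rule effect such a test suite is meant to expose. Dividing the wrong-merge count by the number of pairs actually merged, instead of by the desired non-merges, would give a local variant of the overstemming index.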
Theme
Computational linguistics

Similar documents (author)

  1. Pohlmann, T.: Vermittlung von Informationskompetenz an Master-Studierende und Doktoranden : Themen und Konzepte (2012)
  2. Hiemstra, D.; Kraaij, W.: A language-modeling approach to TREC (2005)
  3. Friedrich, H.; Pohlmann, J.M.: Aufbau und Betrieb von Navigationssystemen als zentrale Aufgabe moderner Fachinformationssysteme (1996)
  4. Pohlmann, J.M.; König, E.: Landwirtschaft vernetzt : das deutsche Agrarinformationsnetz DAINet (1996)
  5. Diebel, C.; Pohlmann, C.: Retrokonversion des alten alphabetischen Kataloges der Deutschen Bücherei Leipzig (2001)

Similar documents (content)

  1. Brychcín, T.; Konopík, M.: HPS: High precision stemmer (2015)
  2. Kettunen, K.; Kunttu, T.; Järvelin, K.: To stem or lemmatize a highly inflectional language in a probabilistic IR environment? (2005)
  3. Frakes, W.B.: Stemming algorithms (1992)
  4. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002)
  5. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009)