Document (#40188)

Author
Flores, F.N.
Moreira, V.P.
Title
Assessing the impact of stemming accuracy on information retrieval : a multilingual perspective
Source
Information processing and management. 52(2016) no.5, p.840-854
Year
2016
Abstract
The quality of stemming algorithms is typically measured in two different ways: (i) how accurately they map the variant forms of a word to the same stem; or (ii) how much improvement they bring to Information Retrieval systems. In this article, we evaluate various stemming algorithms, in four languages, in terms of accuracy and in terms of their aid to Information Retrieval. The aim is to assess whether the most accurate stemmers are also the ones that bring the biggest gain in Information Retrieval. Experiments in English, French, Portuguese, and Spanish show that this is not always the case, as stemmers with higher error rates yield better retrieval quality. As a byproduct, we also identified the most accurate stemmers and the best for Information Retrieval purposes.
Content
Cf.: http://www.sciencedirect.com/science/article/pii/S0306457316300358.
Theme
Automatic indexing
Multilingual problems

Similar documents (author)

  1. Flores, F.; Spinosa, C.: Information technology and the institution of identity : reflections since 'Understanding computers and cognition' (1998) 1.85
    1.8456286 = sum of:
      1.8456286 = product of:
        3.6912572 = sum of:
          3.6912572 = weight(author_txt:flores in 3833) [ClassicSimilarity], result of:
            3.6912572 = score(doc=3833,freq=1.0), product of:
              0.75710505 = queryWeight, product of:
                1.0765247 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.07212469 = queryNorm
              4.8754888 = fieldWeight in 3833, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.5 = fieldNorm(doc=3833)
        0.5 = coord(1/2)
    
  2. Winograd, T.; Flores, F.: Erkenntnis, Maschinen, Verstehen : zur Neugestaltung von Computersystemen (1992) 1.85
    1.8456286 = sum of:
      1.8456286 = product of:
        3.6912572 = sum of:
          3.6912572 = weight(author_txt:flores in 1524) [ClassicSimilarity], result of:
            3.6912572 = score(doc=1524,freq=1.0), product of:
              0.75710505 = queryWeight, product of:
                1.0765247 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.07212469 = queryNorm
              4.8754888 = fieldWeight in 1524, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.5 = fieldNorm(doc=1524)
        0.5 = coord(1/2)
    
  3. Moreira, F. Mosso => Mosso Moreira, F.: 1.57
    1.5690925 = sum of:
      1.5690925 = product of:
        3.138185 = sum of:
          3.138185 = weight(author_txt:moreira in 4730) [ClassicSimilarity], result of:
            3.138185 = score(doc=4730,freq=2.0), product of:
              0.6532932 = queryWeight, product of:
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.07212469 = queryNorm
              4.8036394 = fieldWeight in 4730, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.375 = fieldNorm(doc=4730)
        0.5 = coord(1/2)
    
  4. Medina-Mora, R.; Winograd, T.; Flores, R.: The ActionWorkflow approach to workflow management technology (1993) 1.38
    1.3842214 = sum of:
      1.3842214 = product of:
        2.7684429 = sum of:
          2.7684429 = weight(author_txt:flores in 7056) [ClassicSimilarity], result of:
            2.7684429 = score(doc=7056,freq=1.0), product of:
              0.75710505 = queryWeight, product of:
                1.0765247 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.07212469 = queryNorm
              3.6566167 = fieldWeight in 7056, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.375 = fieldNorm(doc=7056)
        0.5 = coord(1/2)
    
  5. Flores-Herr, N.; Sack, H.; Bossert, K.: Suche in Multimediaarchiven von Kultureinrichtungen (2011) 1.38
    1.3842214 = sum of:
      1.3842214 = product of:
        2.7684429 = sum of:
          2.7684429 = weight(author_txt:flores in 346) [ClassicSimilarity], result of:
            2.7684429 = score(doc=346,freq=1.0), product of:
              0.75710505 = queryWeight, product of:
                1.0765247 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.07212469 = queryNorm
              3.6566167 = fieldWeight in 346, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.375 = fieldNorm(doc=346)
        0.5 = coord(1/2)
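
The author-similarity scores above each come from a single matching term, so the arithmetic in the explain trees can be checked by hand. The following is a minimal sketch in Python, assuming the standard ClassicSimilarity definitions (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq)). The function names `idf` and `term_score` are hypothetical helpers, not part of Lucene, and `queryNorm` is copied from the listing rather than derived. Lucene computes in 32-bit floats, so the last digits may differ slightly.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as shown in the listing: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, field_norm, query_norm, boost=1.0):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm            # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Entry 1 (doc 3833): one of two query terms matched, so coord = 1/2.
coord = 1 / 2
score = coord * term_score(
    freq=1.0,
    doc_freq=6,
    max_docs=44218,
    field_norm=0.5,
    query_norm=0.07212469,  # taken from the listing, not recomputed
    boost=1.0765247,
)
print(round(score, 4))  # approximately 1.8456, as in the listing
```

The same function applied to entry 3 (docFreq=13, freq=2.0, fieldNorm=0.375, no boost) gives roughly 1.569, which is why a document with two occurrences of the author term can still rank below one with a single occurrence: the rarer term and the larger fieldNorm outweigh the higher term frequency.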
    

Similar documents (content)

  1. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.28
    0.28062293 = sum of:
      0.28062293 = product of:
        1.1692622 = sum of:
          0.098625034 = weight(abstract_txt:stem in 2950) [ClassicSimilarity], result of:
            0.098625034 = score(doc=2950,freq=1.0), product of:
              0.15764226 = queryWeight, product of:
                1.2172272 = boost
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.016172476 = queryNorm
              0.6256256 = fieldWeight in 2950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.078125 = fieldNorm(doc=2950)
          0.02540003 = weight(abstract_txt:terms in 2950) [ClassicSimilarity], result of:
            0.02540003 = score(doc=2950,freq=1.0), product of:
              0.08039839 = queryWeight, product of:
                1.2293456 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.016172476 = queryNorm
              0.3159271 = fieldWeight in 2950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=2950)
          0.019268777 = weight(abstract_txt:information in 2950) [ClassicSimilarity], result of:
            0.019268777 = score(doc=2950,freq=2.0), product of:
              0.07203839 = queryWeight, product of:
                1.8399343 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.016172476 = queryNorm
              0.2674793 = fieldWeight in 2950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.078125 = fieldNorm(doc=2950)
          0.08376108 = weight(abstract_txt:retrieval in 2950) [ClassicSimilarity], result of:
            0.08376108 = score(doc=2950,freq=3.0), product of:
              0.17812276 = queryWeight, product of:
                3.1693532 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.016172476 = queryNorm
              0.47024357 = fieldWeight in 2950, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.078125 = fieldNorm(doc=2950)
          0.33669567 = weight(abstract_txt:stemming in 2950) [ClassicSimilarity], result of:
            0.33669567 = score(doc=2950,freq=2.0), product of:
              0.40913817 = queryWeight, product of:
                3.3964949 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.016172476 = queryNorm
              0.8229388 = fieldWeight in 2950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.078125 = fieldNorm(doc=2950)
          0.6055116 = weight(abstract_txt:stemmers in 2950) [ClassicSimilarity], result of:
            0.6055116 = score(doc=2950,freq=2.0), product of:
              0.6050528 = queryWeight, product of:
                4.1304045 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.016172476 = queryNorm
              1.0007583 = fieldWeight in 2950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.078125 = fieldNorm(doc=2950)
        0.24 = coord(6/25)
    
  2. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.24
    0.23882464 = sum of:
      0.23882464 = product of:
        1.492654 = sum of:
          0.021104503 = weight(abstract_txt:also in 2585) [ClassicSimilarity], result of:
            0.021104503 = score(doc=2585,freq=1.0), product of:
              0.056779202 = queryWeight, product of:
                1.0331062 = boost
                3.3983476 = idf(docFreq=4017, maxDocs=44218)
                0.016172476 = queryNorm
              0.37169427 = fieldWeight in 2585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3983476 = idf(docFreq=4017, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
          0.10000155 = weight(abstract_txt:algorithms in 2585) [ClassicSimilarity], result of:
            0.10000155 = score(doc=2585,freq=1.0), product of:
              0.16018075 = queryWeight, product of:
                1.7352238 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.016172476 = queryNorm
              0.6243044 = fieldWeight in 2585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
          0.33331174 = weight(abstract_txt:stemming in 2585) [ClassicSimilarity], result of:
            0.33331174 = score(doc=2585,freq=1.0), product of:
              0.40913817 = queryWeight, product of:
                3.3964949 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.016172476 = queryNorm
              0.8146679 = fieldWeight in 2585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
          1.0382361 = weight(abstract_txt:stemmers in 2585) [ClassicSimilarity], result of:
            1.0382361 = score(doc=2585,freq=3.0), product of:
              0.6050528 = queryWeight, product of:
                4.1304045 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.016172476 = queryNorm
              1.715943 = fieldWeight in 2585, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
        0.16 = coord(4/25)
    
  3. Brychcín, T.; Konopík, M.: HPS: High precision stemmer (2015) 0.24
    0.2373063 = sum of:
      0.2373063 = product of:
        0.8475225 = sum of:
          0.051815573 = weight(abstract_txt:spanish in 2686) [ClassicSimilarity], result of:
            0.051815573 = score(doc=2686,freq=1.0), product of:
              0.119104475 = queryWeight, product of:
                1.0580333 = boost
                6.9606886 = idf(docFreq=113, maxDocs=44218)
                0.016172476 = queryNorm
              0.43504304 = fieldWeight in 2686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9606886 = idf(docFreq=113, maxDocs=44218)
                0.0625 = fieldNorm(doc=2686)
          0.07890003 = weight(abstract_txt:stem in 2686) [ClassicSimilarity], result of:
            0.07890003 = score(doc=2686,freq=1.0), product of:
              0.15764226 = queryWeight, product of:
                1.2172272 = boost
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.016172476 = queryNorm
              0.5005005 = fieldWeight in 2686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.0625 = fieldNorm(doc=2686)
          0.080813445 = weight(abstract_txt:algorithms in 2686) [ClassicSimilarity], result of:
            0.080813445 = score(doc=2686,freq=2.0), product of:
              0.16018075 = queryWeight, product of:
                1.7352238 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.016172476 = queryNorm
              0.5045141 = fieldWeight in 2686, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.0625 = fieldNorm(doc=2686)
          0.015415023 = weight(abstract_txt:information in 2686) [ClassicSimilarity], result of:
            0.015415023 = score(doc=2686,freq=2.0), product of:
              0.07203839 = queryWeight, product of:
                1.8399343 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.016172476 = queryNorm
              0.21398345 = fieldWeight in 2686, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.0625 = fieldNorm(doc=2686)
          0.07797087 = weight(abstract_txt:accurate in 2686) [ClassicSimilarity], result of:
            0.07797087 = score(doc=2686,freq=1.0), product of:
              0.1970544 = queryWeight, product of:
                1.9246129 = boost
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.016172476 = queryNorm
              0.39568195 = fieldWeight in 2686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.0625 = fieldNorm(doc=2686)
          0.038687587 = weight(abstract_txt:retrieval in 2686) [ClassicSimilarity], result of:
            0.038687587 = score(doc=2686,freq=1.0), product of:
              0.17812276 = queryWeight, product of:
                3.1693532 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.016172476 = queryNorm
              0.21719621 = fieldWeight in 2686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=2686)
          0.50391996 = weight(abstract_txt:stemming in 2686) [ClassicSimilarity], result of:
            0.50391996 = score(doc=2686,freq=7.0), product of:
              0.40913817 = queryWeight, product of:
                3.3964949 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.016172476 = queryNorm
              1.231662 = fieldWeight in 2686, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.0625 = fieldNorm(doc=2686)
        0.28 = coord(7/25)
    
  4. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.21
    0.20572416 = sum of:
      0.20572416 = product of:
        1.0286208 = sum of:
          0.0120597165 = weight(abstract_txt:also in 3301) [ClassicSimilarity], result of:
            0.0120597165 = score(doc=3301,freq=1.0), product of:
              0.056779202 = queryWeight, product of:
                1.0331062 = boost
                3.3983476 = idf(docFreq=4017, maxDocs=44218)
                0.016172476 = queryNorm
              0.21239673 = fieldWeight in 3301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3983476 = idf(docFreq=4017, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.010900067 = weight(abstract_txt:information in 3301) [ClassicSimilarity], result of:
            0.010900067 = score(doc=3301,freq=1.0), product of:
              0.07203839 = queryWeight, product of:
                1.8399343 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.016172476 = queryNorm
              0.15130915 = fieldWeight in 3301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.054712508 = weight(abstract_txt:retrieval in 3301) [ClassicSimilarity], result of:
            0.054712508 = score(doc=3301,freq=2.0), product of:
              0.17812276 = queryWeight, product of:
                3.1693532 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.016172476 = queryNorm
              0.3071618 = fieldWeight in 3301, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.46653923 = weight(abstract_txt:stemming in 3301) [ClassicSimilarity], result of:
            0.46653923 = score(doc=3301,freq=6.0), product of:
              0.40913817 = queryWeight, product of:
                3.3964949 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.016172476 = queryNorm
              1.1402975 = fieldWeight in 3301, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.4844093 = weight(abstract_txt:stemmers in 3301) [ClassicSimilarity], result of:
            0.4844093 = score(doc=3301,freq=2.0), product of:
              0.6050528 = queryWeight, product of:
                4.1304045 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.016172476 = queryNorm
              0.8006066 = fieldWeight in 3301, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
        0.2 = coord(5/25)
    
  5. Greengrass, M.: Conflation methods for searching databases of Latin text (1996) 0.21
    0.20563574 = sum of:
      0.20563574 = product of:
        1.0281787 = sum of:
          0.028270546 = weight(abstract_txt:most in 6987) [ClassicSimilarity], result of:
            0.028270546 = score(doc=6987,freq=1.0), product of:
              0.0764645 = queryWeight, product of:
                1.1988925 = boost
                3.943693 = idf(docFreq=2328, maxDocs=44218)
                0.016172476 = queryNorm
              0.3697212 = fieldWeight in 6987, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.943693 = idf(docFreq=2328, maxDocs=44218)
                0.09375 = fieldNorm(doc=6987)
          0.11835005 = weight(abstract_txt:stem in 6987) [ClassicSimilarity], result of:
            0.11835005 = score(doc=6987,freq=1.0), product of:
              0.15764226 = queryWeight, product of:
                1.2172272 = boost
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.016172476 = queryNorm
              0.7507508 = fieldWeight in 6987, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.09375 = fieldNorm(doc=6987)
          0.08206876 = weight(abstract_txt:retrieval in 6987) [ClassicSimilarity], result of:
            0.08206876 = score(doc=6987,freq=2.0), product of:
              0.17812276 = queryWeight, product of:
                3.1693532 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.016172476 = queryNorm
              0.4607427 = fieldWeight in 6987, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.09375 = fieldNorm(doc=6987)
          0.28569576 = weight(abstract_txt:stemming in 6987) [ClassicSimilarity], result of:
            0.28569576 = score(doc=6987,freq=1.0), product of:
              0.40913817 = queryWeight, product of:
                3.3964949 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.016172476 = queryNorm
              0.6982868 = fieldWeight in 6987, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.09375 = fieldNorm(doc=6987)
          0.51379365 = weight(abstract_txt:stemmers in 6987) [ClassicSimilarity], result of:
            0.51379365 = score(doc=6987,freq=1.0), product of:
              0.6050528 = queryWeight, product of:
                4.1304045 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.016172476 = queryNorm
              0.8491715 = fieldWeight in 6987, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.09375 = fieldNorm(doc=6987)
        0.2 = coord(5/25)
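
For the content-similarity scores, several abstract terms match at once: the per-term scores are summed and then scaled by the coordination factor (matched terms divided by the 25 query terms). The sketch below reproduces the total for entry 1 (doc 2950) from the boosts, document frequencies, and term frequencies shown in its explain tree. It assumes the same ClassicSimilarity formulas as the listing; the names `MATCHED_TERMS`, `term_score`, and `document_score` are hypothetical, and `queryNorm` is copied from the listing because it depends on the full query, which is not shown here.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.016172476  # copied from the listing
FIELD_NORM = 0.078125     # fieldNorm(doc=2950)

# (term, boost, docFreq, freq in doc 2950), all values read off the explain tree
MATCHED_TERMS = [
    ("stem", 1.2172272, 39, 1.0),
    ("terms", 1.2293456, 2106, 1.0),
    ("information", 1.8399343, 10677, 2.0),
    ("retrieval", 3.1693532, 3720, 3.0),
    ("stemming", 3.3964949, 69, 2.0),
    ("stemmers", 4.1304045, 13, 2.0),
]


def term_score(boost, doc_freq, freq):
    """queryWeight * fieldWeight for one term."""
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


def document_score(terms, total_query_terms):
    """Sum of the term scores, scaled by coord = matched / total query terms."""
    raw = sum(term_score(boost, df, freq) for _, boost, df, freq in terms)
    coord = len(terms) / total_query_terms
    return raw * coord


score = document_score(MATCHED_TERMS, total_query_terms=25)
print(round(score, 2))  # 0.28, as shown for entry 1
```

The coordination factor explains why entry 3 (7 of 25 terms matched, coord 0.28) can rank close to entry 2 (4 of 25, coord 0.16) despite a much smaller raw sum: matching more distinct query terms is rewarded independently of how strongly each one matches.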