Document (#40189)

Author
Flores, F.N.
Moreira, V.P.
Title
Assessing the impact of stemming accuracy on information retrieval : a multilingual perspective
Source
Information processing and management. 52(2016) no.5, pp.840-854
Year
2016
Abstract
The quality of stemming algorithms is typically measured in two different ways: (i) how accurately they map the variant forms of a word to the same stem; or (ii) how much improvement they bring to Information Retrieval systems. In this article, we evaluate various stemming algorithms, in four languages, in terms of accuracy and in terms of their aid to Information Retrieval. The aim is to assess whether the most accurate stemmers are also the ones that bring the biggest gain in Information Retrieval. Experiments in English, French, Portuguese, and Spanish show that this is not always the case, as stemmers with higher error rates yield better retrieval quality. As a byproduct, we also identified the most accurate stemmers and the best for Information Retrieval purposes.
Content
Cf.: http://www.sciencedirect.com/science/article/pii/S0306457316300358.
Theme
Automatic indexing
Multilingual problems

Similar documents (author)

  1. Flores, F.; Spinosa, C.: Information technology and the institution of identity : reflections since 'Understanding computers and cognition' (1998) 1.81
    1.8148271 = sum of:
      1.8148271 = product of:
        3.6296542 = sum of:
          3.6296542 = weight(author_txt:flores in 5834) [ClassicSimilarity], result of:
            3.6296542 = score(doc=5834,freq=1.0), product of:
              0.7461565 = queryWeight, product of:
                1.0586507 = boost
                9.728935 = idf(docFreq=6, maxDocs=43254)
                0.07244559 = queryNorm
              4.8644676 = fieldWeight in 5834, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.728935 = idf(docFreq=6, maxDocs=43254)
                0.5 = fieldNorm(doc=5834)
        0.5 = coord(1/2)
    
  2. Winograd, T.; Flores, F.: Erkenntnis, Maschinen, Verstehen : zur Neugestaltung von Computersystemen (1992) 1.81
    1.8148271 = sum of:
      1.8148271 = product of:
        3.6296542 = sum of:
          3.6296542 = weight(author_txt:flores in 3525) [ClassicSimilarity], result of:
            3.6296542 = score(doc=3525,freq=1.0), product of:
              0.7461565 = queryWeight, product of:
                1.0586507 = boost
                9.728935 = idf(docFreq=6, maxDocs=43254)
                0.07244559 = queryNorm
              4.8644676 = fieldWeight in 3525, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.728935 = idf(docFreq=6, maxDocs=43254)
                0.5 = fieldNorm(doc=3525)
        0.5 = coord(1/2)
    
  3. Moreira, F. Mosso => Mosso Moreira, F.: 1.62
    1.622383 = sum of:
      1.622383 = product of:
        3.244766 = sum of:
          3.244766 = weight(author_txt:moreira in 731) [ClassicSimilarity], result of:
            3.244766 = score(doc=731,freq=2.0), product of:
              0.6657705 = queryWeight, product of:
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.07244559 = queryNorm
              4.8737006 = fieldWeight in 731, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.189939 = idf(docFreq=11, maxDocs=43254)
                0.375 = fieldNorm(doc=731)
        0.5 = coord(1/2)
    
  4. Medina-Mora, R.; Winograd, T.; Flores, R.: The ActionWorkflow approach to workflow management technology (1993) 1.36
    1.3611203 = sum of:
      1.3611203 = product of:
        2.7222407 = sum of:
          2.7222407 = weight(author_txt:flores in 56) [ClassicSimilarity], result of:
            2.7222407 = score(doc=56,freq=1.0), product of:
              0.7461565 = queryWeight, product of:
                1.0586507 = boost
                9.728935 = idf(docFreq=6, maxDocs=43254)
                0.07244559 = queryNorm
              3.6483507 = fieldWeight in 56, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.728935 = idf(docFreq=6, maxDocs=43254)
                0.375 = fieldNorm(doc=56)
        0.5 = coord(1/2)
    
  5. Flores-Herr, N.; Sack, H.; Bossert, K.: Suche in Multimediaarchiven von Kultureinrichtungen (2011) 1.36
    1.3611203 = sum of:
      1.3611203 = product of:
        2.7222407 = sum of:
          2.7222407 = weight(author_txt:flores in 1811) [ClassicSimilarity], result of:
            2.7222407 = score(doc=1811,freq=1.0), product of:
              0.7461565 = queryWeight, product of:
                1.0586507 = boost
                9.728935 = idf(docFreq=6, maxDocs=43254)
                0.07244559 = queryNorm
              3.6483507 = fieldWeight in 1811, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.728935 = idf(docFreq=6, maxDocs=43254)
                0.375 = fieldNorm(doc=1811)
        0.5 = coord(1/2)
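
The single-term breakdowns above appear to follow Lucene's ClassicSimilarity (TF-IDF) scoring, where tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The following is a minimal sketch of that arithmetic, not the catalog's actual code; the function names (`idf`, `tf`, `term_score`) are hypothetical, and the inputs are copied from entries 1 and 3 of the listing.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # ClassicSimilarity term frequency: square root of the raw count
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, field_norm, query_norm, boost=1.0):
    # score = queryWeight * fieldWeight, as in the explain tree
    i = idf(doc_freq, max_docs)
    query_weight = boost * i * query_norm
    field_weight = tf(freq) * i * field_norm
    return query_weight * field_weight

MAX_DOCS = 43254
QUERY_NORM = 0.07244559  # queryNorm reported in the author listing

# Entry 1: author_txt:flores in doc 5834, then coord(1/2)
flores_5834 = 0.5 * term_score(1.0, 6, MAX_DOCS, 0.5, QUERY_NORM, boost=1.0586507)

# Entry 3: author_txt:moreira in doc 731 (no boost line shown), then coord(1/2)
moreira_731 = 0.5 * term_score(2.0, 11, MAX_DOCS, 0.375, QUERY_NORM)

print(flores_5834, moreira_731)
```

The two results can be compared with the totals 1.8148271 and 1.622383 shown in the listing; small differences in the last digits are expected because Lucene computes in single precision.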
    

Similar documents (content)

  1. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.28
    0.28135127 = sum of:
      0.28135127 = product of:
        1.172297 = sum of:
          0.025643447 = weight(abstract_txt:terms in 4951) [ClassicSimilarity], result of:
            0.025643447 = score(doc=4951,freq=1.0), product of:
              0.0808848 = queryWeight, product of:
                1.2343978 = boost
                4.058069 = idf(docFreq=2031, maxDocs=43254)
                0.016147017 = queryNorm
              0.31703666 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.058069 = idf(docFreq=2031, maxDocs=43254)
                0.078125 = fieldNorm(doc=4951)
          0.10613928 = weight(abstract_txt:stem in 4951) [ClassicSimilarity], result of:
            0.10613928 = score(doc=4951,freq=1.0), product of:
              0.16549698 = queryWeight, product of:
                1.2485379 = boost
                8.209109 = idf(docFreq=31, maxDocs=43254)
                0.016147017 = queryNorm
              0.6413367 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.209109 = idf(docFreq=31, maxDocs=43254)
                0.078125 = fieldNorm(doc=4951)
          0.01939275 = weight(abstract_txt:information in 4951) [ClassicSimilarity], result of:
            0.01939275 = score(doc=4951,freq=2.0), product of:
              0.07232341 = queryWeight, product of:
                1.8455726 = boost
                2.42692 = idf(docFreq=10382, maxDocs=43254)
                0.016147017 = queryNorm
              0.2681393 = fieldWeight in 4951, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.42692 = idf(docFreq=10382, maxDocs=43254)
                0.078125 = fieldNorm(doc=4951)
          0.08330109 = weight(abstract_txt:retrieval in 4951) [ClassicSimilarity], result of:
            0.08330109 = score(doc=4951,freq=3.0), product of:
              0.17741205 = queryWeight, product of:
                3.1664588 = boost
                3.4699 = idf(docFreq=3658, maxDocs=43254)
                0.016147017 = queryNorm
              0.46953458 = fieldWeight in 4951, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4699 = idf(docFreq=3658, maxDocs=43254)
                0.078125 = fieldNorm(doc=4951)
          0.33730763 = weight(abstract_txt:stemming in 4951) [ClassicSimilarity], result of:
            0.33730763 = score(doc=4951,freq=2.0), product of:
              0.40949994 = queryWeight, product of:
                3.4016862 = boost
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.016147017 = queryNorm
              0.82370615 = fieldWeight in 4951, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.078125 = fieldNorm(doc=4951)
          0.60051286 = weight(abstract_txt:stemmers in 4951) [ClassicSimilarity], result of:
            0.60051286 = score(doc=4951,freq=2.0), product of:
              0.6015217 = queryWeight, product of:
                4.1228065 = boost
                9.035788 = idf(docFreq=13, maxDocs=43254)
                0.016147017 = queryNorm
              0.9983229 = fieldWeight in 4951, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.035788 = idf(docFreq=13, maxDocs=43254)
                0.078125 = fieldNorm(doc=4951)
        0.24 = coord(6/25)
    
  2. Brychcín, T.; Konopík, M.: HPS: High precision stemmer (2015) 0.24
    0.23978673 = sum of:
      0.23978673 = product of:
        0.8563812 = sum of:
          0.052070525 = weight(abstract_txt:spanish in 4151) [ClassicSimilarity], result of:
            0.052070525 = score(doc=4151,freq=1.0), product of:
              0.11945581 = queryWeight, product of:
                1.0607433 = boost
                6.9743648 = idf(docFreq=109, maxDocs=43254)
                0.016147017 = queryNorm
              0.4358978 = fieldWeight in 4151, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9743648 = idf(docFreq=109, maxDocs=43254)
                0.0625 = fieldNorm(doc=4151)
          0.08491142 = weight(abstract_txt:stem in 4151) [ClassicSimilarity], result of:
            0.08491142 = score(doc=4151,freq=1.0), product of:
              0.16549698 = queryWeight, product of:
                1.2485379 = boost
                8.209109 = idf(docFreq=31, maxDocs=43254)
                0.016147017 = queryNorm
              0.51306933 = fieldWeight in 4151, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.209109 = idf(docFreq=31, maxDocs=43254)
                0.0625 = fieldNorm(doc=4151)
          0.082443215 = weight(abstract_txt:algorithms in 4151) [ClassicSimilarity], result of:
            0.082443215 = score(doc=4151,freq=2.0), product of:
              0.16227412 = queryWeight, product of:
                1.7484223 = boost
                5.747919 = idf(docFreq=374, maxDocs=43254)
                0.016147017 = queryNorm
              0.5080491 = fieldWeight in 4151, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.747919 = idf(docFreq=374, maxDocs=43254)
                0.0625 = fieldNorm(doc=4151)
          0.015514199 = weight(abstract_txt:information in 4151) [ClassicSimilarity], result of:
            0.015514199 = score(doc=4151,freq=2.0), product of:
              0.07232341 = queryWeight, product of:
                1.8455726 = boost
                2.42692 = idf(docFreq=10382, maxDocs=43254)
                0.016147017 = queryNorm
              0.21451144 = fieldWeight in 4151, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.42692 = idf(docFreq=10382, maxDocs=43254)
                0.0625 = fieldNorm(doc=4151)
          0.0781308 = weight(abstract_txt:accurate in 4151) [ClassicSimilarity], result of:
            0.0781308 = score(doc=4151,freq=1.0), product of:
              0.1972593 = queryWeight, product of:
                1.9277043 = boost
                6.337307 = idf(docFreq=207, maxDocs=43254)
                0.016147017 = queryNorm
              0.3960817 = fieldWeight in 4151, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.337307 = idf(docFreq=207, maxDocs=43254)
                0.0625 = fieldNorm(doc=4151)
          0.03847513 = weight(abstract_txt:retrieval in 4151) [ClassicSimilarity], result of:
            0.03847513 = score(doc=4151,freq=1.0), product of:
              0.17741205 = queryWeight, product of:
                3.1664588 = boost
                3.4699 = idf(docFreq=3658, maxDocs=43254)
                0.016147017 = queryNorm
              0.21686874 = fieldWeight in 4151, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4699 = idf(docFreq=3658, maxDocs=43254)
                0.0625 = fieldNorm(doc=4151)
          0.50483584 = weight(abstract_txt:stemming in 4151) [ClassicSimilarity], result of:
            0.50483584 = score(doc=4151,freq=7.0), product of:
              0.40949994 = queryWeight, product of:
                3.4016862 = boost
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.016147017 = queryNorm
              1.2328105 = fieldWeight in 4151, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.0625 = fieldNorm(doc=4151)
        0.28 = coord(7/25)
    
  3. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.24
    0.2379307 = sum of:
      0.2379307 = product of:
        1.4870669 = sum of:
          0.02146598 = weight(abstract_txt:also in 4586) [ClassicSimilarity], result of:
            0.02146598 = score(doc=4586,freq=1.0), product of:
              0.057406943 = queryWeight, product of:
                1.0399295 = boost
                3.418757 = idf(docFreq=3850, maxDocs=43254)
                0.016147017 = queryNorm
              0.37392655 = fieldWeight in 4586, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.418757 = idf(docFreq=3850, maxDocs=43254)
                0.109375 = fieldNorm(doc=4586)
          0.10201828 = weight(abstract_txt:algorithms in 4586) [ClassicSimilarity], result of:
            0.10201828 = score(doc=4586,freq=1.0), product of:
              0.16227412 = queryWeight, product of:
                1.7484223 = boost
                5.747919 = idf(docFreq=374, maxDocs=43254)
                0.016147017 = queryNorm
              0.6286787 = fieldWeight in 4586, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.747919 = idf(docFreq=374, maxDocs=43254)
                0.109375 = fieldNorm(doc=4586)
          0.33391753 = weight(abstract_txt:stemming in 4586) [ClassicSimilarity], result of:
            0.33391753 = score(doc=4586,freq=1.0), product of:
              0.40949994 = queryWeight, product of:
                3.4016862 = boost
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.016147017 = queryNorm
              0.81542754 = fieldWeight in 4586, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.109375 = fieldNorm(doc=4586)
          1.0296651 = weight(abstract_txt:stemmers in 4586) [ClassicSimilarity], result of:
            1.0296651 = score(doc=4586,freq=3.0), product of:
              0.6015217 = queryWeight, product of:
                4.1228065 = boost
                9.035788 = idf(docFreq=13, maxDocs=43254)
                0.016147017 = queryNorm
              1.7117672 = fieldWeight in 4586, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.035788 = idf(docFreq=13, maxDocs=43254)
                0.109375 = fieldNorm(doc=4586)
        0.16 = coord(4/25)
    
  4. Greengrass, M.: Conflation methods for searching databases of Latin text (1996) 0.21
    0.20664512 = sum of:
      0.20664512 = product of:
        1.0332255 = sum of:
          0.028473154 = weight(abstract_txt:most in 57) [ClassicSimilarity], result of:
            0.028473154 = score(doc=57,freq=1.0), product of:
              0.0768043 = queryWeight, product of:
                1.2028582 = boost
                3.9543834 = idf(docFreq=2253, maxDocs=43254)
                0.016147017 = queryNorm
              0.37072343 = fieldWeight in 57, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9543834 = idf(docFreq=2253, maxDocs=43254)
                0.09375 = fieldNorm(doc=57)
          0.12736712 = weight(abstract_txt:stem in 57) [ClassicSimilarity], result of:
            0.12736712 = score(doc=57,freq=1.0), product of:
              0.16549698 = queryWeight, product of:
                1.2485379 = boost
                8.209109 = idf(docFreq=31, maxDocs=43254)
                0.016147017 = queryNorm
              0.76960397 = fieldWeight in 57, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.209109 = idf(docFreq=31, maxDocs=43254)
                0.09375 = fieldNorm(doc=57)
          0.08161807 = weight(abstract_txt:retrieval in 57) [ClassicSimilarity], result of:
            0.08161807 = score(doc=57,freq=2.0), product of:
              0.17741205 = queryWeight, product of:
                3.1664588 = boost
                3.4699 = idf(docFreq=3658, maxDocs=43254)
                0.016147017 = queryNorm
              0.46004808 = fieldWeight in 57, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4699 = idf(docFreq=3658, maxDocs=43254)
                0.09375 = fieldNorm(doc=57)
          0.28621504 = weight(abstract_txt:stemming in 57) [ClassicSimilarity], result of:
            0.28621504 = score(doc=57,freq=1.0), product of:
              0.40949994 = queryWeight, product of:
                3.4016862 = boost
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.016147017 = queryNorm
              0.6989379 = fieldWeight in 57, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.09375 = fieldNorm(doc=57)
          0.50955206 = weight(abstract_txt:stemmers in 57) [ClassicSimilarity], result of:
            0.50955206 = score(doc=57,freq=1.0), product of:
              0.6015217 = queryWeight, product of:
                4.1228065 = boost
                9.035788 = idf(docFreq=13, maxDocs=43254)
                0.016147017 = queryNorm
              0.8471051 = fieldWeight in 57, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.035788 = idf(docFreq=13, maxDocs=43254)
                0.09375 = fieldNorm(doc=57)
        0.2 = coord(5/25)
    
  5. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.21
    0.2050892 = sum of:
      0.2050892 = product of:
        1.0254459 = sum of:
          0.012266275 = weight(abstract_txt:also in 302) [ClassicSimilarity], result of:
            0.012266275 = score(doc=302,freq=1.0), product of:
              0.057406943 = queryWeight, product of:
                1.0399295 = boost
                3.418757 = idf(docFreq=3850, maxDocs=43254)
                0.016147017 = queryNorm
              0.21367231 = fieldWeight in 302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.418757 = idf(docFreq=3850, maxDocs=43254)
                0.0625 = fieldNorm(doc=302)
          0.010970196 = weight(abstract_txt:information in 302) [ClassicSimilarity], result of:
            0.010970196 = score(doc=302,freq=1.0), product of:
              0.07232341 = queryWeight, product of:
                1.8455726 = boost
                2.42692 = idf(docFreq=10382, maxDocs=43254)
                0.016147017 = queryNorm
              0.1516825 = fieldWeight in 302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.42692 = idf(docFreq=10382, maxDocs=43254)
                0.0625 = fieldNorm(doc=302)
          0.054412045 = weight(abstract_txt:retrieval in 302) [ClassicSimilarity], result of:
            0.054412045 = score(doc=302,freq=2.0), product of:
              0.17741205 = queryWeight, product of:
                3.1664588 = boost
                3.4699 = idf(docFreq=3658, maxDocs=43254)
                0.016147017 = queryNorm
              0.3066987 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4699 = idf(docFreq=3658, maxDocs=43254)
                0.0625 = fieldNorm(doc=302)
          0.46738723 = weight(abstract_txt:stemming in 302) [ClassicSimilarity], result of:
            0.46738723 = score(doc=302,freq=6.0), product of:
              0.40949994 = queryWeight, product of:
                3.4016862 = boost
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.016147017 = queryNorm
              1.1413609 = fieldWeight in 302, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.4553375 = idf(docFreq=67, maxDocs=43254)
                0.0625 = fieldNorm(doc=302)
          0.48041028 = weight(abstract_txt:stemmers in 302) [ClassicSimilarity], result of:
            0.48041028 = score(doc=302,freq=2.0), product of:
              0.6015217 = queryWeight, product of:
                4.1228065 = boost
                9.035788 = idf(docFreq=13, maxDocs=43254)
                0.016147017 = queryNorm
              0.7986583 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.035788 = idf(docFreq=13, maxDocs=43254)
                0.0625 = fieldNorm(doc=302)
        0.2 = coord(5/25)
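
For the content-based matches, each breakdown sums several per-term scores and then multiplies by a coordination factor (matching terms / query terms). The sketch below reproduces that arithmetic for entry 3 (doc 4586), assuming Lucene's ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the helper names and the tuple layout are hypothetical, and all numbers are copied from the listing.

```python
import math

MAX_DOCS = 43254
QUERY_NORM = 0.016147017  # queryNorm reported in the content listing

def term_score(freq, doc_freq, boost, field_norm):
    # queryWeight * fieldWeight for one matching term
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

# (term, freq, docFreq, boost) for doc 4586, taken from the explain tree
terms_4586 = [
    ("also",       1.0, 3850, 1.0399295),
    ("algorithms", 1.0,  374, 1.7484223),
    ("stemming",   1.0,   67, 3.4016862),
    ("stemmers",   3.0,   13, 4.1228065),
]
FIELD_NORM_4586 = 0.109375

# Sum the per-term scores, then apply coord(4/25):
# 4 of the 25 query terms matched this document.
raw_sum = sum(term_score(f, df, b, FIELD_NORM_4586) for _, f, df, b in terms_4586)
coord = len(terms_4586) / 25
doc_score = raw_sum * coord

print(raw_sum, doc_score)
```

The printed values can be compared with 1.4870669 and 0.2379307 in the listing. The coord factor explains why a document matching few query terms strongly (entry 3) can rank alongside one matching more terms weakly.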