Document (#12069)

Author
Hull, D.A.
Title
Stemming algorithms : a case study for detailed evaluation
Source
Journal of the American Society for Information Science. 47(1996) no.1, pp.70-84
Year
1996
Abstract
The majority of information retrieval experiments are evaluated by measures such as average precision and average recall. Fundamental decisions about the superiority of one retrieval technique over another are made solely on the basis of these measures. We claim that average performance figures need to be validated with a careful statistical analysis and that there is a great deal of additional information that can be uncovered by looking closely at the results of individual queries. This article is a case study of stemming algorithms which describes a number of novel approaches to evaluation and demonstrates their value.
Theme
Retrieval studies

Similar documents (author)

  1. Hull, P.: Videotex: a new tool for librarians (1994) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:hull in 7836) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 7836, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=7836)
    
  2. Hull, T.J.: Reference services and electronic records : the impact of changing methods of communication and access (1995) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:hull in 1743) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 1743, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=1743)
    
  3. Hull, T.J.: Reference services and electronic records : the impact of changing methods of communication and access (1995) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:hull in 1744) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 1744, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=1744)
    
  4. Hull, T.J.: Reference services for electronic records in archives (1997) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:hull in 481) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 481, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=481)
    
  5. Hull, D.: ¬A weighted Boolean model for cross-language text retrieval (1998) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:hull in 6307) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 6307, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=6307)
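
The author-match scores above all reduce to the same arithmetic, which can be reproduced by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and are not part of any library. The fieldNorm value is copied from the listing rather than recomputed.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assuming the classic TF-IDF formula."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq: float) -> float:
    """Term frequency factor, assuming tf = sqrt(freq)."""
    return math.sqrt(freq)

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as shown in the explain tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values copied from the author_txt:hull entries above.
author_idf = classic_idf(doc_freq=11, max_docs=44218)
author_score = field_weight(freq=1.0, doc_freq=11, max_docs=44218, field_norm=0.625)

print(round(author_idf, 4))    # close to the listed idf of 9.211981
print(round(author_score, 4))  # close to the listed score of 5.7574883
```

Because every author entry has the same term frequency, document frequency, and fieldNorm, all five receive the same score. Small differences in the last digits are expected, since the listing was produced with single-precision floats.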
    

Similar documents (content)

  1. Alemayehu, N.: Analysis of performance variation using query expansion (2003) 0.18
    0.17612104 = sum of:
      0.17612104 = product of:
        0.55037826 = sum of:
          0.026019288 = weight(abstract_txt:study in 1454) [ClassicSimilarity], result of:
            0.026019288 = score(doc=1454,freq=2.0), product of:
              0.085978776 = queryWeight, product of:
                1.1905854 = boost
                3.423806 = idf(docFreq=3916, maxDocs=44218)
                0.021092184 = queryNorm
              0.30262455 = fieldWeight in 1454, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.423806 = idf(docFreq=3916, maxDocs=44218)
                0.0625 = fieldNorm(doc=1454)
          0.047124345 = weight(abstract_txt:retrieval in 1454) [ClassicSimilarity], result of:
            0.047124345 = score(doc=1454,freq=6.0), product of:
              0.08857628 = queryWeight, product of:
                1.2084359 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.021092184 = queryNorm
              0.5320199 = fieldWeight in 1454, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=1454)
          0.01584387 = weight(abstract_txt:that in 1454) [ClassicSimilarity], result of:
            0.01584387 = score(doc=1454,freq=3.0), product of:
              0.061768707 = queryWeight, product of:
                1.2359326 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.021092184 = queryNorm
              0.2565032 = fieldWeight in 1454, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=1454)
          0.13243064 = weight(abstract_txt:figures in 1454) [ClassicSimilarity], result of:
            0.13243064 = score(doc=1454,freq=2.0), product of:
              0.20191875 = queryWeight, product of:
                1.2901442 = boost
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.021092184 = queryNorm
              0.6558611 = fieldWeight in 1454, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.0625 = fieldNorm(doc=1454)
          0.04138574 = weight(abstract_txt:evaluation in 1454) [ClassicSimilarity], result of:
            0.04138574 = score(doc=1454,freq=1.0), product of:
              0.14760627 = queryWeight, product of:
                1.5599738 = boost
                4.4860687 = idf(docFreq=1353, maxDocs=44218)
                0.021092184 = queryNorm
              0.2803793 = fieldWeight in 1454, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4860687 = idf(docFreq=1353, maxDocs=44218)
                0.0625 = fieldNorm(doc=1454)
          0.072022885 = weight(abstract_txt:case in 1454) [ClassicSimilarity], result of:
            0.072022885 = score(doc=1454,freq=2.0), product of:
              0.16950193 = queryWeight, product of:
                1.6716765 = boost
                4.807296 = idf(docFreq=981, maxDocs=44218)
                0.021092184 = queryNorm
              0.42490894 = fieldWeight in 1454, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.807296 = idf(docFreq=981, maxDocs=44218)
                0.0625 = fieldNorm(doc=1454)
          0.073611625 = weight(abstract_txt:measures in 1454) [ClassicSimilarity], result of:
            0.073611625 = score(doc=1454,freq=1.0), product of:
              0.21668819 = queryWeight, product of:
                1.8900902 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.021092184 = queryNorm
              0.33971223 = fieldWeight in 1454, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0625 = fieldNorm(doc=1454)
          0.14193986 = weight(abstract_txt:average in 1454) [ClassicSimilarity], result of:
            0.14193986 = score(doc=1454,freq=1.0), product of:
              0.38427103 = queryWeight, product of:
                3.0826864 = boost
                5.90999 = idf(docFreq=325, maxDocs=44218)
                0.021092184 = queryNorm
              0.36937436 = fieldWeight in 1454, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.90999 = idf(docFreq=325, maxDocs=44218)
                0.0625 = fieldNorm(doc=1454)
        0.32 = coord(8/25)
    
  2. Kekäläinen, J.; Järvelin, K.: Using graded relevance assessments in IR evaluation (2002) 0.16
    0.16233948 = sum of:
      0.16233948 = product of:
        0.57978386 = sum of:
          0.019238433 = weight(abstract_txt:retrieval in 5225) [ClassicSimilarity], result of:
            0.019238433 = score(doc=5225,freq=1.0), product of:
              0.08857628 = queryWeight, product of:
                1.2084359 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.021092184 = queryNorm
              0.21719621 = fieldWeight in 5225, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=5225)
          0.009147463 = weight(abstract_txt:that in 5225) [ClassicSimilarity], result of:
            0.009147463 = score(doc=5225,freq=1.0), product of:
              0.061768707 = queryWeight, product of:
                1.2359326 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.021092184 = queryNorm
              0.1480922 = fieldWeight in 5225, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=5225)
          0.13085201 = weight(abstract_txt:superiority in 5225) [ClassicSimilarity], result of:
            0.13085201 = score(doc=5225,freq=1.0), product of:
              0.2523759 = queryWeight, product of:
                1.4423606 = boost
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.021092184 = queryNorm
              0.5184806 = fieldWeight in 5225, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.0625 = fieldNorm(doc=5225)
          0.04138574 = weight(abstract_txt:evaluation in 5225) [ClassicSimilarity], result of:
            0.04138574 = score(doc=5225,freq=1.0), product of:
              0.14760627 = queryWeight, product of:
                1.5599738 = boost
                4.4860687 = idf(docFreq=1353, maxDocs=44218)
                0.021092184 = queryNorm
              0.2803793 = fieldWeight in 5225, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4860687 = idf(docFreq=1353, maxDocs=44218)
                0.0625 = fieldNorm(doc=5225)
          0.05092787 = weight(abstract_txt:case in 5225) [ClassicSimilarity], result of:
            0.05092787 = score(doc=5225,freq=1.0), product of:
              0.16950193 = queryWeight, product of:
                1.6716765 = boost
                4.807296 = idf(docFreq=981, maxDocs=44218)
                0.021092184 = queryNorm
              0.300456 = fieldWeight in 5225, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.807296 = idf(docFreq=981, maxDocs=44218)
                0.0625 = fieldNorm(doc=5225)
          0.12749907 = weight(abstract_txt:measures in 5225) [ClassicSimilarity], result of:
            0.12749907 = score(doc=5225,freq=3.0), product of:
              0.21668819 = queryWeight, product of:
                1.8900902 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.021092184 = queryNorm
              0.5883988 = fieldWeight in 5225, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0625 = fieldNorm(doc=5225)
          0.20073327 = weight(abstract_txt:average in 5225) [ClassicSimilarity], result of:
            0.20073327 = score(doc=5225,freq=2.0), product of:
              0.38427103 = queryWeight, product of:
                3.0826864 = boost
                5.90999 = idf(docFreq=325, maxDocs=44218)
                0.021092184 = queryNorm
              0.5223742 = fieldWeight in 5225, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.90999 = idf(docFreq=325, maxDocs=44218)
                0.0625 = fieldNorm(doc=5225)
        0.28 = coord(7/25)
    
  3. Frakes, W.B.: Stemming algorithms (1992) 0.16
    0.15759598 = sum of:
      0.15759598 = product of:
        0.98497486 = sum of:
          0.038476866 = weight(abstract_txt:retrieval in 3503) [ClassicSimilarity], result of:
            0.038476866 = score(doc=3503,freq=1.0), product of:
              0.08857628 = queryWeight, product of:
                1.2084359 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.021092184 = queryNorm
              0.43439242 = fieldWeight in 3503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.125 = fieldNorm(doc=3503)
          0.018294927 = weight(abstract_txt:that in 3503) [ClassicSimilarity], result of:
            0.018294927 = score(doc=3503,freq=1.0), product of:
              0.061768707 = queryWeight, product of:
                1.2359326 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.021092184 = queryNorm
              0.2961844 = fieldWeight in 3503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.125 = fieldNorm(doc=3503)
          0.17049746 = weight(abstract_txt:algorithms in 3503) [ClassicSimilarity], result of:
            0.17049746 = score(doc=3503,freq=1.0), product of:
              0.23896241 = queryWeight, product of:
                1.9848592 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.021092184 = queryNorm
              0.7134907 = fieldWeight in 3503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.125 = fieldNorm(doc=3503)
          0.7577056 = weight(abstract_txt:stemming in 3503) [ClassicSimilarity], result of:
            0.7577056 = score(doc=3503,freq=4.0), product of:
              0.40690964 = queryWeight, product of:
                2.5900843 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.021092184 = queryNorm
              1.862098 = fieldWeight in 3503, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.125 = fieldNorm(doc=3503)
        0.16 = coord(4/25)
    
  4. Smithson, S.: Information retrieval evaluation in practice : a case study approach (1994) 0.15
    0.14665836 = sum of:
      0.14665836 = product of:
        0.52377987 = sum of:
          0.062832505 = weight(abstract_txt:demonstrates in 7302) [ClassicSimilarity], result of:
            0.062832505 = score(doc=7302,freq=1.0), product of:
              0.13336562 = queryWeight, product of:
                1.0485083 = boost
                6.0304604 = idf(docFreq=288, maxDocs=44218)
                0.021092184 = queryNorm
              0.47112972 = fieldWeight in 7302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0304604 = idf(docFreq=288, maxDocs=44218)
                0.078125 = fieldNorm(doc=7302)
          0.03983374 = weight(abstract_txt:study in 7302) [ClassicSimilarity], result of:
            0.03983374 = score(doc=7302,freq=3.0), product of:
              0.085978776 = queryWeight, product of:
                1.1905854 = boost
                3.423806 = idf(docFreq=3916, maxDocs=44218)
                0.021092184 = queryNorm
              0.46329734 = fieldWeight in 7302, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.423806 = idf(docFreq=3916, maxDocs=44218)
                0.078125 = fieldNorm(doc=7302)
          0.02404804 = weight(abstract_txt:retrieval in 7302) [ClassicSimilarity], result of:
            0.02404804 = score(doc=7302,freq=1.0), product of:
              0.08857628 = queryWeight, product of:
                1.2084359 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.021092184 = queryNorm
              0.27149525 = fieldWeight in 7302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.078125 = fieldNorm(doc=7302)
          0.019804839 = weight(abstract_txt:that in 7302) [ClassicSimilarity], result of:
            0.019804839 = score(doc=7302,freq=3.0), product of:
              0.061768707 = queryWeight, product of:
                1.2359326 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.021092184 = queryNorm
              0.320629 = fieldWeight in 7302, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=7302)
          0.13687047 = weight(abstract_txt:evaluation in 7302) [ClassicSimilarity], result of:
            0.13687047 = score(doc=7302,freq=7.0), product of:
              0.14760627 = queryWeight, product of:
                1.5599738 = boost
                4.4860687 = idf(docFreq=1353, maxDocs=44218)
                0.021092184 = queryNorm
              0.9272673 = fieldWeight in 7302, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.4860687 = idf(docFreq=1353, maxDocs=44218)
                0.078125 = fieldNorm(doc=7302)
          0.11026207 = weight(abstract_txt:case in 7302) [ClassicSimilarity], result of:
            0.11026207 = score(doc=7302,freq=3.0), product of:
              0.16950193 = queryWeight, product of:
                1.6716765 = boost
                4.807296 = idf(docFreq=981, maxDocs=44218)
                0.021092184 = queryNorm
              0.6505063 = fieldWeight in 7302, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.807296 = idf(docFreq=981, maxDocs=44218)
                0.078125 = fieldNorm(doc=7302)
          0.1301282 = weight(abstract_txt:measures in 7302) [ClassicSimilarity], result of:
            0.1301282 = score(doc=7302,freq=2.0), product of:
              0.21668819 = queryWeight, product of:
                1.8900902 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.021092184 = queryNorm
              0.60053205 = fieldWeight in 7302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.078125 = fieldNorm(doc=7302)
        0.28 = coord(7/25)
    
  5. Ekmekcioglu, F.C.; Lynch, M.F.; Willett, P.: Development and evaluation of conflation techniques for the implementation of a document retrieval system for Turkish text databases (1995) 0.13
    0.12852457 = sum of:
      0.12852457 = product of:
        0.6426228 = sum of:
          0.028857648 = weight(abstract_txt:retrieval in 5797) [ClassicSimilarity], result of:
            0.028857648 = score(doc=5797,freq=1.0), product of:
              0.08857628 = queryWeight, product of:
                1.2084359 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.021092184 = queryNorm
              0.3257943 = fieldWeight in 5797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.09375 = fieldNorm(doc=5797)
          0.013721195 = weight(abstract_txt:that in 5797) [ClassicSimilarity], result of:
            0.013721195 = score(doc=5797,freq=1.0), product of:
              0.061768707 = queryWeight, product of:
                1.2359326 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.021092184 = queryNorm
              0.22213829 = fieldWeight in 5797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.09375 = fieldNorm(doc=5797)
          0.08779242 = weight(abstract_txt:evaluation in 5797) [ClassicSimilarity], result of:
            0.08779242 = score(doc=5797,freq=2.0), product of:
              0.14760627 = queryWeight, product of:
                1.5599738 = boost
                4.4860687 = idf(docFreq=1353, maxDocs=44218)
                0.021092184 = queryNorm
              0.5947743 = fieldWeight in 5797, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4860687 = idf(docFreq=1353, maxDocs=44218)
                0.09375 = fieldNorm(doc=5797)
          0.11041744 = weight(abstract_txt:measures in 5797) [ClassicSimilarity], result of:
            0.11041744 = score(doc=5797,freq=1.0), product of:
              0.21668819 = queryWeight, product of:
                1.8900902 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.021092184 = queryNorm
              0.50956833 = fieldWeight in 5797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.09375 = fieldNorm(doc=5797)
          0.4018341 = weight(abstract_txt:stemming in 5797) [ClassicSimilarity], result of:
            0.4018341 = score(doc=5797,freq=2.0), product of:
              0.40690964 = queryWeight, product of:
                2.5900843 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.021092184 = queryNorm
              0.9875266 = fieldWeight in 5797, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.09375 = fieldNorm(doc=5797)
        0.2 = coord(5/25)
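
The content-similarity scores combine a per-term product (queryWeight times fieldWeight) with a coordination factor for the share of query terms that matched. The sketch below recomputes the score of entry 3 (Frakes, doc 3503) from the numbers in its explain tree. It assumes the same classic TF-IDF formulas as the listing appears to use; queryNorm, the boosts, and fieldNorm are copied from the listing rather than derived, and all function and variable names are illustrative.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.021092184  # copied from the explain output, not recomputed here

def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Assumed classic formula: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, boost: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for one matching term."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

# (termFreq, docFreq, boost) for the four terms matched in doc 3503.
matched_terms = {
    "retrieval": (1.0, 3720, 1.2084359),
    "that": (1.0, 11241, 1.2359326),
    "algorithms": (1.0, 398, 1.9848592),
    "stemming": (4.0, 69, 2.5900843),
}
FIELD_NORM = 0.125   # fieldNorm(doc=3503)
QUERY_TERMS = 25     # denominator of coord(4/25)

partial = {
    term: term_score(freq, doc_freq, boost, FIELD_NORM)
    for term, (freq, doc_freq, boost) in matched_terms.items()
}
coord = len(matched_terms) / QUERY_TERMS
total = sum(partial.values()) * coord

for term, score in partial.items():
    print(f"{term:<12}{score:.6f}")
print(f"coord       {coord:.2f}")
print(f"total       {total:.6f}")  # close to the listed 0.15759598
```

The breakdown shows why this entry ranks third despite matching only four of 25 query terms: the rare, boosted term "stemming" contributes about 0.76 of the 0.98 pre-coord sum, while the coord factor of 0.16 scales the total down.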