Document (#35320)

Author
Dolamic, L.
Savoy, J.
Title
When stopword lists make the difference
Source
Journal of the American Society for Information Science and Technology. 61(2010) no.1, pp. 200-203
Year
2009
Series
Brief communication
Abstract
In this brief communication, we evaluate the use of two stopword lists for the English language (one comprising 571 words and another with 9) and compare them with a search approach accounting for all word forms. We show that through implementing the original Okapi form or certain ones derived from the Divergence from Randomness (DFR) paradigm, significantly lower performance levels may result when using short or no stopword lists. For other DFR models and a revised Okapi implementation, performance differences between approaches using short or long stopword lists or no list at all are usually not statistically significant. Similar conclusions can be drawn when using other natural languages such as French, Hindi, or Persian.
Theme
Automatic indexing

Similar documents (author)

  1. Savoy, J.: Stemming of French words based on grammatical categories (1993) 5.21
    5.2059946 = sum of:
      5.2059946 = weight(author_txt:savoy in 4650) [ClassicSimilarity], result of:
        5.2059946 = fieldWeight in 4650, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.625 = fieldNorm(doc=4650)
    
  2. Savoy, J.: Effectiveness of information retrieval systems used in a hypertext environment (1993) 5.21
    5.2059946 = sum of:
      5.2059946 = weight(author_txt:savoy in 6511) [ClassicSimilarity], result of:
        5.2059946 = fieldWeight in 6511, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.625 = fieldNorm(doc=6511)
    
  3. Savoy, J.: A learning scheme for information retrieval in hypertext (1994) 5.21
    5.2059946 = sum of:
      5.2059946 = weight(author_txt:savoy in 7292) [ClassicSimilarity], result of:
        5.2059946 = fieldWeight in 7292, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.625 = fieldNorm(doc=7292)
    
  4. Savoy, J.: Bayesian inference networks and spreading activation in hypertext systems (1992) 5.21
    5.2059946 = sum of:
      5.2059946 = weight(author_txt:savoy in 192) [ClassicSimilarity], result of:
        5.2059946 = fieldWeight in 192, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.625 = fieldNorm(doc=192)
    
  5. Savoy, J.: Searching information in legal hypertext systems (1993/94) 5.21
    5.2059946 = sum of:
      5.2059946 = weight(author_txt:savoy in 757) [ClassicSimilarity], result of:
        5.2059946 = fieldWeight in 757, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.625 = fieldNorm(doc=757)
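
The author matches above all score 5.2059946 because each tree multiplies the same three factors. The sketch below reproduces that number, assuming the usual ClassicSimilarity definitions (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))). The function names are illustrative and not part of the record; results can differ from the listing in the last digits because the listing uses 32-bit floats.

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency as assumed for ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)."""
    tf = math.sqrt(freq)
    return tf * idf(doc_freq, max_docs) * field_norm


# Values copied from the author_txt:savoy trees in the listing.
author_idf = idf(doc_freq=28, max_docs=44218)
author_score = field_weight(freq=1.0, doc_freq=28, max_docs=44218, field_norm=0.625)

print(round(author_idf, 4), round(author_score, 4))
```

Since freq, docFreq, and fieldNorm are identical for all five author matches, the scores tie and the ranking among them is not determined by this weight.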
    

Similar documents (content)

  1. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.18
    0.17660542 = sum of:
      0.17660542 = product of:
        0.5518919 = sum of:
          0.029172672 = weight(abstract_txt:usually in 3301) [ClassicSimilarity], result of:
            0.029172672 = score(doc=3301,freq=1.0), product of:
              0.07630386 = queryWeight, product of:
                1.0036734 = boost
                6.1171575 = idf(docFreq=264, maxDocs=44218)
                0.012428091 = queryNorm
              0.38232234 = fieldWeight in 3301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1171575 = idf(docFreq=264, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.03570891 = weight(abstract_txt:lower in 3301) [ClassicSimilarity], result of:
            0.03570891 = score(doc=3301,freq=1.0), product of:
              0.08731325 = queryWeight, product of:
                1.0736414 = boost
                6.543596 = idf(docFreq=172, maxDocs=44218)
                0.012428091 = queryNorm
              0.40897474 = fieldWeight in 3301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.543596 = idf(docFreq=172, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.057216085 = weight(abstract_txt:statistically in 3301) [ClassicSimilarity], result of:
            0.057216085 = score(doc=3301,freq=2.0), product of:
              0.09489234 = queryWeight, product of:
                1.1192697 = boost
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.012428091 = queryNorm
              0.6029579 = fieldWeight in 3301, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.07107372 = weight(abstract_txt:divergence in 3301) [ClassicSimilarity], result of:
            0.07107372 = score(doc=3301,freq=1.0), product of:
              0.13815558 = queryWeight, product of:
                1.3505274 = boost
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.012428091 = queryNorm
              0.514447 = fieldWeight in 3301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.04383076 = weight(abstract_txt:performance in 3301) [ClassicSimilarity], result of:
            0.04383076 = score(doc=3301,freq=3.0), product of:
              0.08744158 = queryWeight, product of:
                1.5194737 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.012428091 = queryNorm
              0.50125766 = fieldWeight in 3301, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.102478996 = weight(abstract_txt:randomness in 3301) [ClassicSimilarity], result of:
            0.102478996 = score(doc=3301,freq=1.0), product of:
              0.17632706 = queryWeight, product of:
                1.5257335 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.012428091 = queryNorm
              0.581187 = fieldWeight in 3301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.027294097 = weight(abstract_txt:when in 3301) [ClassicSimilarity], result of:
            0.027294097 = score(doc=3301,freq=1.0), product of:
              0.10527259 = queryWeight, product of:
                2.0419142 = boost
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.012428091 = queryNorm
              0.2592707 = fieldWeight in 3301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.18511674 = weight(abstract_txt:okapi in 3301) [ClassicSimilarity], result of:
            0.18511674 = score(doc=3301,freq=2.0), product of:
              0.2615328 = queryWeight, product of:
                2.62783 = boost
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.012428091 = queryNorm
              0.7078146 = fieldWeight in 3301, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
        0.32 = coord(8/25)
    
  2. Johnson, B.; Peterson, E.: Reviewing initial stopword selection (1992) 0.15
    0.15228087 = sum of:
      0.15228087 = product of:
        1.9035109 = sum of:
          0.054009266 = weight(abstract_txt:drawn in 3629) [ClassicSimilarity], result of:
            0.054009266 = score(doc=3629,freq=1.0), product of:
              0.07922262 = queryWeight, product of:
                1.0226895 = boost
                6.2330556 = idf(docFreq=235, maxDocs=44218)
                0.012428091 = queryNorm
              0.68174046 = fieldWeight in 3629, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2330556 = idf(docFreq=235, maxDocs=44218)
                0.109375 = fieldNorm(doc=3629)
          1.8495016 = weight(abstract_txt:stopword in 3629) [ClassicSimilarity], result of:
            1.8495016 = score(doc=3629,freq=5.0), product of:
              0.77553874 = queryWeight, product of:
                6.399572 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.012428091 = queryNorm
              2.384796 = fieldWeight in 3629, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.109375 = fieldNorm(doc=3629)
        0.08 = coord(2/25)
    
  3. Can, F.; Kocberber, S.; Balcik, E.; Kaynak, C.; Ocalan, H.C.: Information retrieval on Turkish texts (2008) 0.09
    0.09437562 = sum of:
      0.09437562 = product of:
        0.7864635 = sum of:
          0.0536815 = weight(abstract_txt:performance in 1373) [ClassicSimilarity], result of:
            0.0536815 = score(doc=1373,freq=2.0), product of:
              0.08744158 = queryWeight, product of:
                1.5194737 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.012428091 = queryNorm
              0.61391276 = fieldWeight in 1373, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.09375 = fieldNorm(doc=1373)
          0.023820003 = weight(abstract_txt:using in 1373) [ClassicSimilarity], result of:
            0.023820003 = score(doc=1373,freq=1.0), product of:
              0.07336741 = queryWeight, product of:
                1.704635 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.012428091 = queryNorm
              0.32466736 = fieldWeight in 1373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.09375 = fieldNorm(doc=1373)
          0.70896196 = weight(abstract_txt:stopword in 1373) [ClassicSimilarity], result of:
            0.70896196 = score(doc=1373,freq=1.0), product of:
              0.77553874 = queryWeight, product of:
                6.399572 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.012428091 = queryNorm
              0.9141542 = fieldWeight in 1373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.09375 = fieldNorm(doc=1373)
        0.12 = coord(3/25)
    
  4. Dadashkarimia, J.; Shakery, A.; Failia, H.; Zamani, H.: An expectation-maximization algorithm for query translation based on pseudo-relevant documents (2017) 0.09
    0.09006254 = sum of:
      0.09006254 = product of:
        0.28144544 = sum of:
          0.025526088 = weight(abstract_txt:usually in 3296) [ClassicSimilarity], result of:
            0.025526088 = score(doc=3296,freq=1.0), product of:
              0.07630386 = queryWeight, product of:
                1.0036734 = boost
                6.1171575 = idf(docFreq=264, maxDocs=44218)
                0.012428091 = queryNorm
              0.33453205 = fieldWeight in 3296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1171575 = idf(docFreq=264, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3296)
          0.02829658 = weight(abstract_txt:ones in 3296) [ClassicSimilarity], result of:
            0.02829658 = score(doc=3296,freq=1.0), product of:
              0.08172965 = queryWeight, product of:
                1.0387452 = boost
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.012428091 = queryNorm
              0.34622172 = fieldWeight in 3296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3296)
          0.029995238 = weight(abstract_txt:french in 3296) [ClassicSimilarity], result of:
            0.029995238 = score(doc=3296,freq=1.0), product of:
              0.08496862 = queryWeight, product of:
                1.059128 = boost
                6.45514 = idf(docFreq=188, maxDocs=44218)
                0.012428091 = queryNorm
              0.35301548 = fieldWeight in 3296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.45514 = idf(docFreq=188, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3296)
          0.00973143 = weight(abstract_txt:other in 3296) [ClassicSimilarity], result of:
            0.00973143 = score(doc=3296,freq=1.0), product of:
              0.050545767 = queryWeight, product of:
                1.1552516 = boost
                3.5204957 = idf(docFreq=3555, maxDocs=44218)
                0.012428091 = queryNorm
              0.1925271 = fieldWeight in 3296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5204957 = idf(docFreq=3555, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3296)
          0.062189505 = weight(abstract_txt:divergence in 3296) [ClassicSimilarity], result of:
            0.062189505 = score(doc=3296,freq=1.0), product of:
              0.13815558 = queryWeight, product of:
                1.3505274 = boost
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.012428091 = queryNorm
              0.4501411 = fieldWeight in 3296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3296)
          0.02214249 = weight(abstract_txt:performance in 3296) [ClassicSimilarity], result of:
            0.02214249 = score(doc=3296,freq=1.0), product of:
              0.08744158 = queryWeight, product of:
                1.5194737 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.012428091 = queryNorm
              0.2532261 = fieldWeight in 3296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3296)
          0.089669116 = weight(abstract_txt:persian in 3296) [ClassicSimilarity], result of:
            0.089669116 = score(doc=3296,freq=1.0), product of:
              0.17632706 = queryWeight, product of:
                1.5257335 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.012428091 = queryNorm
              0.5085386 = fieldWeight in 3296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3296)
          0.013895001 = weight(abstract_txt:using in 3296) [ClassicSimilarity], result of:
            0.013895001 = score(doc=3296,freq=1.0), product of:
              0.07336741 = queryWeight, product of:
                1.704635 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.012428091 = queryNorm
              0.18938929 = fieldWeight in 3296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3296)
        0.32 = coord(8/25)
    
  5. Kang, I.-H.; Kim, G.C.: Integration of multiple evidences based on a query type for web search (2004) 0.09
    0.08940914 = sum of:
      0.08940914 = product of:
        0.31931835 = sum of:
          0.028853524 = weight(abstract_txt:difference in 2568) [ClassicSimilarity], result of:
            0.028853524 = score(doc=2568,freq=1.0), product of:
              0.075746335 = queryWeight, product of:
                6.0947685 = idf(docFreq=270, maxDocs=44218)
                0.012428091 = queryNorm
              0.38092303 = fieldWeight in 2568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0947685 = idf(docFreq=270, maxDocs=44218)
                0.0625 = fieldNorm(doc=2568)
          0.03570891 = weight(abstract_txt:lower in 2568) [ClassicSimilarity], result of:
            0.03570891 = score(doc=2568,freq=1.0), product of:
              0.08731325 = queryWeight, product of:
                1.0736414 = boost
                6.543596 = idf(docFreq=172, maxDocs=44218)
                0.012428091 = queryNorm
              0.40897474 = fieldWeight in 2568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.543596 = idf(docFreq=172, maxDocs=44218)
                0.0625 = fieldNorm(doc=2568)
          0.011121634 = weight(abstract_txt:other in 2568) [ClassicSimilarity], result of:
            0.011121634 = score(doc=2568,freq=1.0), product of:
              0.050545767 = queryWeight, product of:
                1.1552516 = boost
                3.5204957 = idf(docFreq=3555, maxDocs=44218)
                0.012428091 = queryNorm
              0.22003098 = fieldWeight in 2568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5204957 = idf(docFreq=3555, maxDocs=44218)
                0.0625 = fieldNorm(doc=2568)
          0.035787668 = weight(abstract_txt:performance in 2568) [ClassicSimilarity], result of:
            0.035787668 = score(doc=2568,freq=2.0), product of:
              0.08744158 = queryWeight, product of:
                1.5194737 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.012428091 = queryNorm
              0.40927517 = fieldWeight in 2568, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.0625 = fieldNorm(doc=2568)
          0.04965521 = weight(abstract_txt:short in 2568) [ClassicSimilarity], result of:
            0.04965521 = score(doc=2568,freq=1.0), product of:
              0.13705102 = queryWeight, product of:
                1.9022839 = boost
                5.79699 = idf(docFreq=364, maxDocs=44218)
                0.012428091 = queryNorm
              0.36231187 = fieldWeight in 2568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.79699 = idf(docFreq=364, maxDocs=44218)
                0.0625 = fieldNorm(doc=2568)
          0.027294097 = weight(abstract_txt:when in 2568) [ClassicSimilarity], result of:
            0.027294097 = score(doc=2568,freq=1.0), product of:
              0.10527259 = queryWeight, product of:
                2.0419142 = boost
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.012428091 = queryNorm
              0.2592707 = fieldWeight in 2568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.0625 = fieldNorm(doc=2568)
          0.1308973 = weight(abstract_txt:okapi in 2568) [ClassicSimilarity], result of:
            0.1308973 = score(doc=2568,freq=1.0), product of:
              0.2615328 = queryWeight, product of:
                2.62783 = boost
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.012428091 = queryNorm
              0.5005005 = fieldWeight in 2568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.0625 = fieldNorm(doc=2568)
        0.28 = coord(7/25)
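
The content-similarity trees above combine a per-term score (queryWeight times fieldWeight) with a coordination factor, coord(matched/total). The following sketch recomputes entry 2 (Johnson and Peterson) from the docFreq, boost, and fieldNorm values shown in its tree. It assumes the same ClassicSimilarity formulas as the listing appears to use; the helper names are hypothetical, and a term whose tree shows no boost line is assumed to have a boost of 1.0.

```python
import math

# Constants copied from the listing.
QUERY_NORM = 0.012428091
MAX_DOCS = 44218


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as assumed for ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def term_score(freq, doc_freq, boost, field_norm):
    """score = queryWeight * fieldWeight for one matching query term."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


def document_score(terms, matched, total):
    """Sum the term scores, then scale by coord(matched/total)."""
    return sum(term_score(*term) for term in terms) * matched / total


# (freq, docFreq, boost, fieldNorm) for the two matching terms of doc 3629.
johnson_peterson = [
    (1.0, 235, 1.0226895, 0.109375),  # abstract_txt:drawn
    (5.0, 6, 6.399572, 0.109375),     # abstract_txt:stopword
]

score = document_score(johnson_peterson, matched=2, total=25)
print(round(score, 4))
```

The coord factor explains why entry 2 ranks below entry 1 despite a much larger raw sum (1.9035109 versus 0.5518919): it matches only 2 of the 25 query terms, so its sum is scaled by 0.08 rather than 0.32.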