Document (#35321)

Author
Dolamic, L.
Savoy, J.
Title
When stopword lists make the difference
Source
Journal of the American Society for Information Science and Technology. 61(2010) no.1, pp. 200-203
Year
2009
Series
Brief communication
Abstract
In this brief communication, we evaluate the use of two stopword lists for the English language (one comprising 571 words and another with 9) and compare them with a search approach accounting for all word forms. We show that through implementing the original Okapi form or certain ones derived from the Divergence from Randomness (DFR) paradigm, significantly lower performance levels may result when using short or no stopword lists. For other DFR models and a revised Okapi implementation, performance differences between approaches using short or long stopword lists or no list at all are usually not statistically significant. Similar conclusions can be drawn when using other natural languages such as French, Hindi, or Persian.
Theme
Automatic indexing

Similar documents (author)

  1. Savoy, J.: Stemming of French words based on grammatical categories (1993) 5.21
    5.2066784 = sum of:
      5.2066784 = weight(author_txt:savoy in 4650) [ClassicSimilarity], result of:
        5.2066784 = fieldWeight in 4650, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.330686 = idf(docFreq=27, maxDocs=42740)
          0.625 = fieldNorm(doc=4650)
    
  2. Savoy, J.: Effectiveness of information retrieval systems used in a hypertext environment (1993) 5.21
    5.2066784 = sum of:
      5.2066784 = weight(author_txt:savoy in 6511) [ClassicSimilarity], result of:
        5.2066784 = fieldWeight in 6511, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.330686 = idf(docFreq=27, maxDocs=42740)
          0.625 = fieldNorm(doc=6511)
    
  3. Savoy, J.: A learning scheme for information retrieval in hypertext (1994) 5.21
    5.2066784 = sum of:
      5.2066784 = weight(author_txt:savoy in 7292) [ClassicSimilarity], result of:
        5.2066784 = fieldWeight in 7292, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.330686 = idf(docFreq=27, maxDocs=42740)
          0.625 = fieldNorm(doc=7292)
    
  4. Savoy, J.: Bayesian inference networks and spreading activation in hypertext systems (1992) 5.21
    5.2066784 = sum of:
      5.2066784 = weight(author_txt:savoy in 261) [ClassicSimilarity], result of:
        5.2066784 = fieldWeight in 261, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.330686 = idf(docFreq=27, maxDocs=42740)
          0.625 = fieldNorm(doc=261)
    
  5. Savoy, J.: Searching information in legal hypertext systems (1993/94) 5.21
    5.2066784 = sum of:
      5.2066784 = weight(author_txt:savoy in 826) [ClassicSimilarity], result of:
        5.2066784 = fieldWeight in 826, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.330686 = idf(docFreq=27, maxDocs=42740)
          0.625 = fieldNorm(doc=826)
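
The author scores above can be reproduced by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF definitions (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))), which the listed figures are consistent with. The function names are illustrative and are not part of any library.

```python
import math


def tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain trees above."""
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Values taken from the author_txt:savoy entries above.
author_idf = idf(doc_freq=27, max_docs=42740)
author_score = field_weight(freq=1.0, doc_freq=27, max_docs=42740, field_norm=0.625)
print(author_idf, author_score)
```

All five author matches share the same tf, idf, and fieldNorm, which is why they receive identical scores; the values differ from the listing only by single-precision rounding.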
    

Similar documents (content)

  1. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.18
    0.17649354 = sum of:
      0.17649354 = product of:
        0.55154234 = sum of:
          0.029389964 = weight(abstract_txt:usually in 302) [ClassicSimilarity], result of:
            0.029389964 = score(doc=302,freq=1.0), product of:
              0.07676681 = queryWeight, product of:
                1.0025574 = boost
                6.1255565 = idf(docFreq=253, maxDocs=42740)
                0.01250025 = queryNorm
              0.38284728 = fieldWeight in 302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1255565 = idf(docFreq=253, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.03697982 = weight(abstract_txt:lower in 302) [ClassicSimilarity], result of:
            0.03697982 = score(doc=302,freq=1.0), product of:
              0.08947135 = queryWeight, product of:
                1.0823419 = boost
                6.6130347 = idf(docFreq=155, maxDocs=42740)
                0.01250025 = queryNorm
              0.41331467 = fieldWeight in 302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6130347 = idf(docFreq=155, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.05752974 = weight(abstract_txt:statistically in 302) [ClassicSimilarity], result of:
            0.05752974 = score(doc=302,freq=2.0), product of:
              0.09534378 = queryWeight, product of:
                1.1172972 = boost
                6.8266087 = idf(docFreq=125, maxDocs=42740)
                0.01250025 = queryNorm
              0.60339266 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8266087 = idf(docFreq=125, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.07042906 = weight(abstract_txt:divergence in 302) [ClassicSimilarity], result of:
            0.07042906 = score(doc=302,freq=1.0), product of:
              0.13747025 = queryWeight, product of:
                1.3416117 = boost
                8.197155 = idf(docFreq=31, maxDocs=42740)
                0.01250025 = queryNorm
              0.5123222 = fieldWeight in 302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.197155 = idf(docFreq=31, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.10169464 = weight(abstract_txt:randomness in 302) [ClassicSimilarity], result of:
            0.10169464 = score(doc=302,freq=1.0), product of:
              0.17561954 = queryWeight, product of:
                1.516383 = boost
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.01250025 = queryNorm
              0.5790622 = fieldWeight in 302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.044326637 = weight(abstract_txt:performance in 302) [ClassicSimilarity], result of:
            0.044326637 = score(doc=302,freq=3.0), product of:
              0.08819695 = queryWeight, product of:
                1.5197225 = boost
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.01250025 = queryNorm
              0.50258696 = fieldWeight in 302, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.027818426 = weight(abstract_txt:when in 302) [ClassicSimilarity], result of:
            0.027818426 = score(doc=302,freq=1.0), product of:
              0.10673404 = queryWeight, product of:
                2.0475504 = boost
                4.1701303 = idf(docFreq=1794, maxDocs=42740)
                0.01250025 = queryNorm
              0.26063314 = fieldWeight in 302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1701303 = idf(docFreq=1794, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.18337406 = weight(abstract_txt:okapi in 302) [ClassicSimilarity], result of:
            0.18337406 = score(doc=302,freq=2.0), product of:
              0.2601753 = queryWeight, product of:
                2.6101804 = boost
                7.974011 = idf(docFreq=39, maxDocs=42740)
                0.01250025 = queryNorm
              0.70480967 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.974011 = idf(docFreq=39, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
        0.32 = coord(8/25)
    
  2. Johnson, B.; Peterson, E.: Reviewing initial stopword selection (1992) 0.15
    0.15122949 = sum of:
      0.15122949 = product of:
        1.8903687 = sum of:
          0.054086726 = weight(abstract_txt:drawn in 3629) [ClassicSimilarity], result of:
            0.054086726 = score(doc=3629,freq=1.0), product of:
              0.07938575 = queryWeight, product of:
                1.0195154 = boost
                6.2291684 = idf(docFreq=228, maxDocs=42740)
                0.01250025 = queryNorm
              0.6813153 = fieldWeight in 3629, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2291684 = idf(docFreq=228, maxDocs=42740)
                0.109375 = fieldNorm(doc=3629)
          1.836282 = weight(abstract_txt:stopword in 3629) [ClassicSimilarity], result of:
            1.836282 = score(doc=3629,freq=5.0), product of:
              0.77268946 = queryWeight, product of:
                6.3614335 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.01250025 = queryNorm
              2.3764813 = fieldWeight in 3629, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.109375 = fieldNorm(doc=3629)
        0.08 = coord(2/25)
    
  3. Can, F.; Kocberber, S.; Balcik, E.; Kaynak, C.; Ocalan, H.C.: Information retrieval on Turkish texts (2008) 0.09
    0.093890734 = sum of:
      0.093890734 = product of:
        0.7824228 = sum of:
          0.05428882 = weight(abstract_txt:performance in 3374) [ClassicSimilarity], result of:
            0.05428882 = score(doc=3374,freq=2.0), product of:
              0.08819695 = queryWeight, product of:
                1.5197225 = boost
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.01250025 = queryNorm
              0.6155408 = fieldWeight in 3374, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.09375 = fieldNorm(doc=3374)
          0.024239456 = weight(abstract_txt:using in 3374) [ClassicSimilarity], result of:
            0.024239456 = score(doc=3374,freq=1.0), product of:
              0.07430801 = queryWeight, product of:
                1.7084447 = boost
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.01250025 = queryNorm
              0.32620248 = fieldWeight in 3374, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.09375 = fieldNorm(doc=3374)
          0.7038945 = weight(abstract_txt:stopword in 3374) [ClassicSimilarity], result of:
            0.7038945 = score(doc=3374,freq=1.0), product of:
              0.77268946 = queryWeight, product of:
                6.3614335 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.01250025 = queryNorm
              0.9109669 = fieldWeight in 3374, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.09375 = fieldNorm(doc=3374)
        0.12 = coord(3/25)
    
  4. Dadashkarimia, J.; Shakery, A.; Failia, H.; Zamani, H.: An expectation-maximization algorithm for query translation based on pseudo-relevant documents (2017) 0.09
    0.091133185 = sum of:
      0.091133185 = product of:
        0.2847912 = sum of:
          0.02571622 = weight(abstract_txt:usually in 5297) [ClassicSimilarity], result of:
            0.02571622 = score(doc=5297,freq=1.0), product of:
              0.07676681 = queryWeight, product of:
                1.0025574 = boost
                6.1255565 = idf(docFreq=253, maxDocs=42740)
                0.01250025 = queryNorm
              0.33499137 = fieldWeight in 5297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1255565 = idf(docFreq=253, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5297)
          0.029051498 = weight(abstract_txt:ones in 5297) [ClassicSimilarity], result of:
            0.029051498 = score(doc=5297,freq=1.0), product of:
              0.08326857 = queryWeight, product of:
                1.0441504 = boost
                6.379687 = idf(docFreq=196, maxDocs=42740)
                0.01250025 = queryNorm
              0.3488891 = fieldWeight in 5297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.379687 = idf(docFreq=196, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5297)
          0.030224131 = weight(abstract_txt:french in 5297) [ClassicSimilarity], result of:
            0.030224131 = score(doc=5297,freq=1.0), product of:
              0.08549446 = queryWeight, product of:
                1.0580142 = boost
                6.4643936 = idf(docFreq=180, maxDocs=42740)
                0.01250025 = queryNorm
              0.35352153 = fieldWeight in 5297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4643936 = idf(docFreq=180, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5297)
          0.009883941 = weight(abstract_txt:other in 5297) [ClassicSimilarity], result of:
            0.009883941 = score(doc=5297,freq=1.0), product of:
              0.05112879 = queryWeight, product of:
                1.1570983 = boost
                3.5348954 = idf(docFreq=3387, maxDocs=42740)
                0.01250025 = queryNorm
              0.1933146 = fieldWeight in 5297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5348954 = idf(docFreq=3387, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5297)
          0.061625425 = weight(abstract_txt:divergence in 5297) [ClassicSimilarity], result of:
            0.061625425 = score(doc=5297,freq=1.0), product of:
              0.13747025 = queryWeight, product of:
                1.3416117 = boost
                8.197155 = idf(docFreq=31, maxDocs=42740)
                0.01250025 = queryNorm
              0.4482819 = fieldWeight in 5297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.197155 = idf(docFreq=31, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5297)
          0.022392998 = weight(abstract_txt:performance in 5297) [ClassicSimilarity], result of:
            0.022392998 = score(doc=5297,freq=1.0), product of:
              0.08819695 = queryWeight, product of:
                1.5197225 = boost
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.01250025 = queryNorm
              0.25389764 = fieldWeight in 5297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5297)
          0.09175729 = weight(abstract_txt:persian in 5297) [ClassicSimilarity], result of:
            0.09175729 = score(doc=5297,freq=1.0), product of:
              0.17925136 = queryWeight, product of:
                1.5319822 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.01250025 = queryNorm
              0.5118917 = fieldWeight in 5297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5297)
          0.014139684 = weight(abstract_txt:using in 5297) [ClassicSimilarity], result of:
            0.014139684 = score(doc=5297,freq=1.0), product of:
              0.07430801 = queryWeight, product of:
                1.7084447 = boost
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.01250025 = queryNorm
              0.19028479 = fieldWeight in 5297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5297)
        0.32 = coord(8/25)
    
  5. Kang, I.-H.; Kim, G.C.: Integration of multiple evidences based on a query type for web search (2004) 0.09
    0.08998235 = sum of:
      0.08998235 = product of:
        0.32136554 = sum of:
          0.029165627 = weight(abstract_txt:difference in 3569) [ClassicSimilarity], result of:
            0.029165627 = score(doc=3569,freq=1.0), product of:
              0.07637566 = queryWeight, product of:
                6.109931 = idf(docFreq=257, maxDocs=42740)
                0.01250025 = queryNorm
              0.3818707 = fieldWeight in 3569, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.109931 = idf(docFreq=257, maxDocs=42740)
                0.0625 = fieldNorm(doc=3569)
          0.03697982 = weight(abstract_txt:lower in 3569) [ClassicSimilarity], result of:
            0.03697982 = score(doc=3569,freq=1.0), product of:
              0.08947135 = queryWeight, product of:
                1.0823419 = boost
                6.6130347 = idf(docFreq=155, maxDocs=42740)
                0.01250025 = queryNorm
              0.41331467 = fieldWeight in 3569, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6130347 = idf(docFreq=155, maxDocs=42740)
                0.0625 = fieldNorm(doc=3569)
          0.011295932 = weight(abstract_txt:other in 3569) [ClassicSimilarity], result of:
            0.011295932 = score(doc=3569,freq=1.0), product of:
              0.05112879 = queryWeight, product of:
                1.1570983 = boost
                3.5348954 = idf(docFreq=3387, maxDocs=42740)
                0.01250025 = queryNorm
              0.22093096 = fieldWeight in 3569, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5348954 = idf(docFreq=3387, maxDocs=42740)
                0.0625 = fieldNorm(doc=3569)
          0.036192548 = weight(abstract_txt:performance in 3569) [ClassicSimilarity], result of:
            0.036192548 = score(doc=3569,freq=2.0), product of:
              0.08819695 = queryWeight, product of:
                1.5197225 = boost
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.01250025 = queryNorm
              0.41036054 = fieldWeight in 3569, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.0625 = fieldNorm(doc=3569)
          0.050248157 = weight(abstract_txt:short in 3569) [ClassicSimilarity], result of:
            0.050248157 = score(doc=3569,freq=1.0), product of:
              0.13829215 = queryWeight, product of:
                1.902989 = boost
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.01250025 = queryNorm
              0.36334786 = fieldWeight in 3569, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.0625 = fieldNorm(doc=3569)
          0.027818426 = weight(abstract_txt:when in 3569) [ClassicSimilarity], result of:
            0.027818426 = score(doc=3569,freq=1.0), product of:
              0.10673404 = queryWeight, product of:
                2.0475504 = boost
                4.1701303 = idf(docFreq=1794, maxDocs=42740)
                0.01250025 = queryNorm
              0.26063314 = fieldWeight in 3569, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1701303 = idf(docFreq=1794, maxDocs=42740)
                0.0625 = fieldNorm(doc=3569)
          0.12966503 = weight(abstract_txt:okapi in 3569) [ClassicSimilarity], result of:
            0.12966503 = score(doc=3569,freq=1.0), product of:
              0.2601753 = queryWeight, product of:
                2.6101804 = boost
                7.974011 = idf(docFreq=39, maxDocs=42740)
                0.01250025 = queryNorm
              0.49837568 = fieldWeight in 3569, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.974011 = idf(docFreq=39, maxDocs=42740)
                0.0625 = fieldNorm(doc=3569)
        0.28 = coord(7/25)
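
The content scores combine three pieces per matching term (queryWeight, fieldWeight, and a final coord factor). The sketch below recomputes entry 2 (doc 3629) from the numbers printed in its explain tree, assuming each term score is queryWeight * fieldWeight and the total is the sum of term scores multiplied by coord. The helper name `term_score` and the constants are illustrative only.

```python
import math

QUERY_NORM = 0.01250025   # queryNorm shown in every explain tree above
QUERY_TERMS = 25          # denominator of the coord(n/25) factor


def term_score(freq: float, idf: float, boost: float, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


# (freq, idf, boost, fieldNorm) copied from entry 2 (doc 3629).
matches = {
    "drawn": (1.0, 6.2291684, 1.0195154, 0.109375),
    "stopword": (5.0, 9.71698, 6.3614335, 0.109375),
}

raw_sum = sum(term_score(*values) for values in matches.values())
coord = len(matches) / QUERY_TERMS   # 2 of 25 query terms matched
total = raw_sum * coord
print(raw_sum, coord, total)
```

The coord factor explains why entry 2 ranks below entry 1 despite a much larger raw sum: only 2 of the 25 query terms match, so the sum is scaled by 0.08 rather than 0.32.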