Document (#32588)

Author
Lazarinis, F.
Title
Engineering and utilizing a stopword list in Greek Web retrieval
Source
Journal of the American Society for Information Science and Technology. 58(2007) no.11, S.1645-1652
Year
2007
Abstract
This article presents the construction of a stopword list for a non-Latin language and evaluates the effect of eliminating stopwords from user queries. It describes the phases of engineering a stopword list for Greek, the problems encountered, and the inferences drawn from the process. A set of 32 authentic queries proposed by users is run in Google with and without the stopwords. The importance of eliminating stopwords from user queries is then evaluated in terms of the relevance of the top-10 Google results.

Similar documents (content)

  1. Johnson, B.; Peterson, E.: Reviewing initial stopword selection (1992) 0.27
    0.27155903 = sum of:
      0.27155903 = product of:
        2.262992 = sum of:
          0.13968194 = weight(abstract_txt:list in 3629) [ClassicSimilarity], result of:
            0.13968194 = score(doc=3629,freq=2.0), product of:
              0.16769078 = queryWeight, product of:
                2.975394 = boost
                5.3851523 = idf(docFreq=550, maxDocs=44218)
                0.010465663 = queryNorm
              0.83297324 = fieldWeight in 3629, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.3851523 = idf(docFreq=550, maxDocs=44218)
                0.109375 = fieldNorm(doc=3629)
          0.37507603 = weight(abstract_txt:stopwords in 3629) [ClassicSimilarity], result of:
            0.37507603 = score(doc=3629,freq=1.0), product of:
              0.3565673 = queryWeight, product of:
                3.542542 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.010465663 = queryNorm
              1.0519081 = fieldWeight in 3629, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.109375 = fieldNorm(doc=3629)
          1.7482338 = weight(abstract_txt:stopword in 3629) [ClassicSimilarity], result of:
            1.7482338 = score(doc=3629,freq=5.0), product of:
              0.7330748 = queryWeight, product of:
                7.183455 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.010465663 = queryNorm
              2.384796 = fieldWeight in 3629, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.109375 = fieldNorm(doc=3629)
        0.12 = coord(3/25)
    
  2. Dolamic, L.; Savoy, J.: When stopword lists make the difference (2009) 0.17
    0.17245322 = sum of:
      0.17245322 = product of:
        1.0778327 = sum of:
          0.022028824 = weight(abstract_txt:language in 3319) [ClassicSimilarity], result of:
            0.022028824 = score(doc=3319,freq=1.0), product of:
              0.067423016 = queryWeight, product of:
                1.5404526 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.010465663 = queryNorm
              0.32672557 = fieldWeight in 3319, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.078125 = fieldNorm(doc=3319)
          0.017985223 = weight(abstract_txt:from in 3319) [ClassicSimilarity], result of:
            0.017985223 = score(doc=3319,freq=2.0), product of:
              0.058896728 = queryWeight, product of:
                2.0361269 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.010465663 = queryNorm
              0.30536878 = fieldWeight in 3319, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.078125 = fieldNorm(doc=3319)
          0.07055003 = weight(abstract_txt:list in 3319) [ClassicSimilarity], result of:
            0.07055003 = score(doc=3319,freq=1.0), product of:
              0.16769078 = queryWeight, product of:
                2.975394 = boost
                5.3851523 = idf(docFreq=550, maxDocs=44218)
                0.010465663 = queryNorm
              0.42071503 = fieldWeight in 3319, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3851523 = idf(docFreq=550, maxDocs=44218)
                0.078125 = fieldNorm(doc=3319)
          0.96726865 = weight(abstract_txt:stopword in 3319) [ClassicSimilarity], result of:
            0.96726865 = score(doc=3319,freq=3.0), product of:
              0.7330748 = queryWeight, product of:
                7.183455 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.010465663 = queryNorm
              1.3194679 = fieldWeight in 3319, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.078125 = fieldNorm(doc=3319)
        0.16 = coord(4/25)
    
  3. Can, F.; Kocberber, S.; Balcik, E.; Kaynak, C.; Ocalan, H.C.: Information retrieval on Turkish texts (2008) 0.14
    0.13654846 = sum of:
      0.13654846 = product of:
        0.8534279 = sum of:
          0.02643459 = weight(abstract_txt:language in 1373) [ClassicSimilarity], result of:
            0.02643459 = score(doc=1373,freq=1.0), product of:
              0.067423016 = queryWeight, product of:
                1.5404526 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.010465663 = queryNorm
              0.3920707 = fieldWeight in 1373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.09375 = fieldNorm(doc=1373)
          0.0721899 = weight(abstract_txt:queries in 1373) [ClassicSimilarity], result of:
            0.0721899 = score(doc=1373,freq=1.0), product of:
              0.15079069 = queryWeight, product of:
                2.821481 = boost
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.010465663 = queryNorm
              0.47874242 = fieldWeight in 1373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.09375 = fieldNorm(doc=1373)
          0.08466004 = weight(abstract_txt:list in 1373) [ClassicSimilarity], result of:
            0.08466004 = score(doc=1373,freq=1.0), product of:
              0.16769078 = queryWeight, product of:
                2.975394 = boost
                5.3851523 = idf(docFreq=550, maxDocs=44218)
                0.010465663 = queryNorm
              0.504858 = fieldWeight in 1373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3851523 = idf(docFreq=550, maxDocs=44218)
                0.09375 = fieldNorm(doc=1373)
          0.67014337 = weight(abstract_txt:stopword in 1373) [ClassicSimilarity], result of:
            0.67014337 = score(doc=1373,freq=1.0), product of:
              0.7330748 = queryWeight, product of:
                7.183455 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.010465663 = queryNorm
              0.9141542 = fieldWeight in 1373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.09375 = fieldNorm(doc=1373)
        0.16 = coord(4/25)
    
  4. Stamatatos, E.: Plagiarism detection using stopword n-grams (2011) 0.11
    0.10762971 = sum of:
      0.10762971 = product of:
        0.89691424 = sum of:
          0.07055003 = weight(abstract_txt:list in 4955) [ClassicSimilarity], result of:
            0.07055003 = score(doc=4955,freq=1.0), product of:
              0.16769078 = queryWeight, product of:
                2.975394 = boost
                5.3851523 = idf(docFreq=550, maxDocs=44218)
                0.010465663 = queryNorm
              0.42071503 = fieldWeight in 4955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3851523 = idf(docFreq=550, maxDocs=44218)
                0.078125 = fieldNorm(doc=4955)
          0.26791146 = weight(abstract_txt:stopwords in 4955) [ClassicSimilarity], result of:
            0.26791146 = score(doc=4955,freq=1.0), product of:
              0.3565673 = queryWeight, product of:
                3.542542 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.010465663 = queryNorm
              0.751363 = fieldWeight in 4955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.078125 = fieldNorm(doc=4955)
          0.5584528 = weight(abstract_txt:stopword in 4955) [ClassicSimilarity], result of:
            0.5584528 = score(doc=4955,freq=1.0), product of:
              0.7330748 = queryWeight, product of:
                7.183455 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.010465663 = queryNorm
              0.7617951 = fieldWeight in 4955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.078125 = fieldNorm(doc=4955)
        0.12 = coord(3/25)
    
  5. Ekmekcioglu, F.C.; Lynch, M.F.; Willett, P.: Development and evaluation of conflation techniques for the implementation of a document retrieval system for Turkish text databases (1995) 0.10
    0.09506251 = sum of:
      0.09506251 = product of:
        0.7921876 = sum of:
          0.037384156 = weight(abstract_txt:language in 5797) [ClassicSimilarity], result of:
            0.037384156 = score(doc=5797,freq=2.0), product of:
              0.067423016 = queryWeight, product of:
                1.5404526 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.010465663 = queryNorm
              0.55447173 = fieldWeight in 5797, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.09375 = fieldNorm(doc=5797)
          0.08466004 = weight(abstract_txt:list in 5797) [ClassicSimilarity], result of:
            0.08466004 = score(doc=5797,freq=1.0), product of:
              0.16769078 = queryWeight, product of:
                2.975394 = boost
                5.3851523 = idf(docFreq=550, maxDocs=44218)
                0.010465663 = queryNorm
              0.504858 = fieldWeight in 5797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3851523 = idf(docFreq=550, maxDocs=44218)
                0.09375 = fieldNorm(doc=5797)
          0.67014337 = weight(abstract_txt:stopword in 5797) [ClassicSimilarity], result of:
            0.67014337 = score(doc=5797,freq=1.0), product of:
              0.7330748 = queryWeight, product of:
                7.183455 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.010465663 = queryNorm
              0.9141542 = fieldWeight in 5797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.09375 = fieldNorm(doc=5797)
        0.12 = coord(3/25)
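
The score breakdowns above can be reproduced by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), which agree with the tf and idf values shown in the listing. The function and variable names are illustrative and not part of the record; the numeric inputs are copied from the first entry (doc 3629). Because the listed values are rounded single-precision floats, the result should match the listed 0.27155903 only approximately.

```python
import math

MAX_DOCS = 44218          # maxDocs, as shown in the listing
QUERY_NORM = 0.010465663  # queryNorm, as shown in the listing


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    tf = math.sqrt(freq)                      # tf(freq) = sqrt(freq)
    query_weight = boost * idf * QUERY_NORM   # boost * idf * queryNorm
    field_weight = tf * idf * field_norm      # tf * idf * fieldNorm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total_clauses):
    """Sum of the term scores, scaled by coord(matched/total_clauses)."""
    raw = sum(term_score(f, df, b, field_norm) for f, df, b in terms)
    return raw * (matched / total_clauses)


# (termFreq, docFreq, boost) for "list", "stopwords", "stopword" in doc 3629
terms_doc_3629 = [
    (2.0, 550, 2.975394),
    (1.0, 7, 3.542542),
    (5.0, 6, 7.183455),
]

score = document_score(terms_doc_3629, field_norm=0.109375,
                       matched=3, total_clauses=25)
print(f"{score:.4f}")
```

The same functions apply to the other entries by substituting their termFreq, docFreq, boost, fieldNorm, and coord values from the listing.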