Document (#29479)

Author
Tomov, D.T.
Title
Some critical remarks on the stop word lists of ISI publications
Source
Journal of documentation. 57(2001) no.6, p.798-808
Year
2001
Abstract
A semantic analysis was carried out of the "Weekly Subject Index Stop Word List" of Current Contents of the Institute for Scientific Information (ISI), as well as of the full-stop word and semi-stop word lists of the Permuterm Subject Index of Science Citation Index. Selected terms were screened from the first issues for 1997, 1999 and 2000 of the CCODAb/Life Sciences, from the first issues for 1997 and 2000 of CCOD Proceedings, and from the SCI CDE for 1997 and January-June of 2000. True full-stop and semi-stop words commonly occur in the dictionaries of these databases, which proves that there is an abundance of meaningless terms in titles and abstracts. On the other hand, many synonyms and antonyms are absent from these lists. Appropriately enlarging the lists could contribute to more effective preparation of both printed reference publications and large databases, thus ensuring more economical information retrieval by practical users and scientometricians. The necessity of an improved, semantically oriented policy in preparing the lists of full-stop words and semi-stop words used in modern databases worldwide is emphasised. Journal editors should encourage authors to reduce stop-word usage in article titles and keyword sets.
Footnote
See also: http://www.emeraldinsight.com/10.1108/EUM0000000007101.
Object
Current Contents
Science Citation Index

Similar documents (content)

  1. Witschel, H.F.: Global term weights in distributed environments (2008) 0.23
    0.2324515 = sum of:
      0.2324515 = product of:
        0.96854794 = sum of:
          0.024208028 = weight(abstract_txt:terms in 2096) [ClassicSimilarity], result of:
            0.024208028 = score(doc=2096,freq=2.0), product of:
              0.054182313 = queryWeight, product of:
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01339864 = queryNorm
              0.44678837 = fieldWeight in 2096, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=2096)
          0.070017815 = weight(abstract_txt:list in 2096) [ClassicSimilarity], result of:
            0.070017815 = score(doc=2096,freq=3.0), product of:
              0.09608595 = queryWeight, product of:
                1.331684 = boost
                5.3851523 = idf(docFreq=550, maxDocs=44218)
                0.01339864 = queryNorm
              0.7286998 = fieldWeight in 2096, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.3851523 = idf(docFreq=550, maxDocs=44218)
                0.078125 = fieldNorm(doc=2096)
          0.059557796 = weight(abstract_txt:words in 2096) [ClassicSimilarity], result of:
            0.059557796 = score(doc=2096,freq=1.0), product of:
              0.14241338 = queryWeight, product of:
                1.9856023 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.01339864 = queryNorm
              0.41820365 = fieldWeight in 2096, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.078125 = fieldNorm(doc=2096)
          0.087520756 = weight(abstract_txt:lists in 2096) [ClassicSimilarity], result of:
            0.087520756 = score(doc=2096,freq=1.0), product of:
              0.20260274 = queryWeight, product of:
                2.734695 = boost
                5.529371 = idf(docFreq=476, maxDocs=44218)
                0.01339864 = queryNorm
              0.4319821 = fieldWeight in 2096, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.529371 = idf(docFreq=476, maxDocs=44218)
                0.078125 = fieldNorm(doc=2096)
          0.103917204 = weight(abstract_txt:word in 2096) [ClassicSimilarity], result of:
            0.103917204 = score(doc=2096,freq=1.0), product of:
              0.24471818 = queryWeight, product of:
                3.3602715 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.01339864 = queryNorm
              0.4246403 = fieldWeight in 2096, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.078125 = fieldNorm(doc=2096)
          0.62332636 = weight(abstract_txt:stop in 2096) [ClassicSimilarity], result of:
            0.62332636 = score(doc=2096,freq=2.0), product of:
              0.7499776 = queryWeight, product of:
                7.4408984 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.01339864 = queryNorm
              0.83112663 = fieldWeight in 2096, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.078125 = fieldNorm(doc=2096)
        0.24 = coord(6/25)
    
  2. Pritchard, J.: Information retrieval : smarter indexing (1991) 0.23
    0.23124237 = sum of:
      0.23124237 = product of:
        1.1562119 = sum of:
          0.049405385 = weight(abstract_txt:full in 4890) [ClassicSimilarity], result of:
            0.049405385 = score(doc=4890,freq=1.0), product of:
              0.0802905 = queryWeight, product of:
                1.2173159 = boost
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.01339864 = queryNorm
              0.6153329 = fieldWeight in 4890, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.125 = fieldNorm(doc=4890)
          0.09529247 = weight(abstract_txt:words in 4890) [ClassicSimilarity], result of:
            0.09529247 = score(doc=4890,freq=1.0), product of:
              0.14241338 = queryWeight, product of:
                1.9856023 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.01339864 = queryNorm
              0.66912585 = fieldWeight in 4890, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.125 = fieldNorm(doc=4890)
          0.14003322 = weight(abstract_txt:lists in 4890) [ClassicSimilarity], result of:
            0.14003322 = score(doc=4890,freq=1.0), product of:
              0.20260274 = queryWeight, product of:
                2.734695 = boost
                5.529371 = idf(docFreq=476, maxDocs=44218)
                0.01339864 = queryNorm
              0.69117135 = fieldWeight in 4890, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.529371 = idf(docFreq=476, maxDocs=44218)
                0.125 = fieldNorm(doc=4890)
          0.16626751 = weight(abstract_txt:word in 4890) [ClassicSimilarity], result of:
            0.16626751 = score(doc=4890,freq=1.0), product of:
              0.24471818 = queryWeight, product of:
                3.3602715 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.01339864 = queryNorm
              0.67942446 = fieldWeight in 4890, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.125 = fieldNorm(doc=4890)
          0.7052133 = weight(abstract_txt:stop in 4890) [ClassicSimilarity], result of:
            0.7052133 = score(doc=4890,freq=1.0), product of:
              0.7499776 = queryWeight, product of:
                7.4408984 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.01339864 = queryNorm
              0.9403125 = fieldWeight in 4890, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.125 = fieldNorm(doc=4890)
        0.2 = coord(5/25)
    
  3. O'Neill, E.T.; Kammerer, K.A.; Bennett, R.: The aboutness of words (2017) 0.18
    0.17626755 = sum of:
      0.17626755 = product of:
        0.88133776 = sum of:
          0.04852247 = weight(abstract_txt:titles in 3835) [ClassicSimilarity], result of:
            0.04852247 = score(doc=3835,freq=1.0), product of:
              0.108523354 = queryWeight, product of:
                1.4152489 = boost
                5.723078 = idf(docFreq=392, maxDocs=44218)
                0.01339864 = queryNorm
              0.44711545 = fieldWeight in 3835, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.723078 = idf(docFreq=392, maxDocs=44218)
                0.078125 = fieldNorm(doc=3835)
          0.15757512 = weight(abstract_txt:words in 3835) [ClassicSimilarity], result of:
            0.15757512 = score(doc=3835,freq=7.0), product of:
              0.14241338 = queryWeight, product of:
                1.9856023 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.01339864 = queryNorm
              1.1064628 = fieldWeight in 3835, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.078125 = fieldNorm(doc=3835)
          0.087520756 = weight(abstract_txt:lists in 3835) [ClassicSimilarity], result of:
            0.087520756 = score(doc=3835,freq=1.0), product of:
              0.20260274 = queryWeight, product of:
                2.734695 = boost
                5.529371 = idf(docFreq=476, maxDocs=44218)
                0.01339864 = queryNorm
              0.4319821 = fieldWeight in 3835, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.529371 = idf(docFreq=476, maxDocs=44218)
                0.078125 = fieldNorm(doc=3835)
          0.14696111 = weight(abstract_txt:word in 3835) [ClassicSimilarity], result of:
            0.14696111 = score(doc=3835,freq=2.0), product of:
              0.24471818 = queryWeight, product of:
                3.3602715 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.01339864 = queryNorm
              0.60053205 = fieldWeight in 3835, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.078125 = fieldNorm(doc=3835)
          0.44075832 = weight(abstract_txt:stop in 3835) [ClassicSimilarity], result of:
            0.44075832 = score(doc=3835,freq=1.0), product of:
              0.7499776 = queryWeight, product of:
                7.4408984 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.01339864 = queryNorm
              0.5876953 = fieldWeight in 3835, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.078125 = fieldNorm(doc=3835)
        0.2 = coord(5/25)
    
  4. Kim, W.; Wilbur, W.J.: Corpus-based statistical screening for content-bearing terms (2001) 0.17
    0.17077503 = sum of:
      0.17077503 = product of:
        0.71156263 = sum of:
          0.020541191 = weight(abstract_txt:terms in 5188) [ClassicSimilarity], result of:
            0.020541191 = score(doc=5188,freq=4.0), product of:
              0.054182313 = queryWeight, product of:
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01339864 = queryNorm
              0.37911248 = fieldWeight in 5188, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.046875 = fieldNorm(doc=5188)
          0.015904095 = weight(abstract_txt:first in 5188) [ClassicSimilarity], result of:
            0.015904095 = score(doc=5188,freq=2.0), product of:
              0.057560302 = queryWeight, product of:
                1.0307012 = boost
                4.168018 = idf(docFreq=1860, maxDocs=44218)
                0.01339864 = queryNorm
              0.27630317 = fieldWeight in 5188, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.168018 = idf(docFreq=1860, maxDocs=44218)
                0.046875 = fieldNorm(doc=5188)
          0.06189428 = weight(abstract_txt:words in 5188) [ClassicSimilarity], result of:
            0.06189428 = score(doc=5188,freq=3.0), product of:
              0.14241338 = queryWeight, product of:
                1.9856023 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.01339864 = queryNorm
              0.43461 = fieldWeight in 5188, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.046875 = fieldNorm(doc=5188)
          0.074263826 = weight(abstract_txt:lists in 5188) [ClassicSimilarity], result of:
            0.074263826 = score(doc=5188,freq=2.0), product of:
              0.20260274 = queryWeight, product of:
                2.734695 = boost
                5.529371 = idf(docFreq=476, maxDocs=44218)
                0.01339864 = queryNorm
              0.36654896 = fieldWeight in 5188, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.529371 = idf(docFreq=476, maxDocs=44218)
                0.046875 = fieldNorm(doc=5188)
          0.16496342 = weight(abstract_txt:word in 5188) [ClassicSimilarity], result of:
            0.16496342 = score(doc=5188,freq=7.0), product of:
              0.24471818 = queryWeight, product of:
                3.3602715 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.01339864 = queryNorm
              0.6740955 = fieldWeight in 5188, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.046875 = fieldNorm(doc=5188)
          0.3739958 = weight(abstract_txt:stop in 5188) [ClassicSimilarity], result of:
            0.3739958 = score(doc=5188,freq=2.0), product of:
              0.7499776 = queryWeight, product of:
                7.4408984 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.01339864 = queryNorm
              0.498676 = fieldWeight in 5188, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.046875 = fieldNorm(doc=5188)
        0.24 = coord(6/25)
    
  5. Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.15
    0.15450877 = sum of:
      0.15450877 = product of:
        0.64378655 = sum of:
          0.019366423 = weight(abstract_txt:terms in 604) [ClassicSimilarity], result of:
            0.019366423 = score(doc=604,freq=2.0), product of:
              0.054182313 = queryWeight, product of:
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01339864 = queryNorm
              0.3574307 = fieldWeight in 604, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=604)
          0.021205457 = weight(abstract_txt:first in 604) [ClassicSimilarity], result of:
            0.021205457 = score(doc=604,freq=2.0), product of:
              0.057560302 = queryWeight, product of:
                1.0307012 = boost
                4.168018 = idf(docFreq=1860, maxDocs=44218)
                0.01339864 = queryNorm
              0.3684042 = fieldWeight in 604, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.168018 = idf(docFreq=1860, maxDocs=44218)
                0.0625 = fieldNorm(doc=604)
          0.037746694 = weight(abstract_txt:databases in 604) [ClassicSimilarity], result of:
            0.037746694 = score(doc=604,freq=2.0), product of:
              0.09677749 = queryWeight, product of:
                1.6368318 = boost
                4.4127526 = idf(docFreq=1456, maxDocs=44218)
                0.01339864 = queryNorm
              0.3900359 = fieldWeight in 604, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4127526 = idf(docFreq=1456, maxDocs=44218)
                0.0625 = fieldNorm(doc=604)
          0.09529247 = weight(abstract_txt:words in 604) [ClassicSimilarity], result of:
            0.09529247 = score(doc=604,freq=4.0), product of:
              0.14241338 = queryWeight, product of:
                1.9856023 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.01339864 = queryNorm
              0.66912585 = fieldWeight in 604, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=604)
          0.11756889 = weight(abstract_txt:word in 604) [ClassicSimilarity], result of:
            0.11756889 = score(doc=604,freq=2.0), product of:
              0.24471818 = queryWeight, product of:
                3.3602715 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.01339864 = queryNorm
              0.48042563 = fieldWeight in 604, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0625 = fieldNorm(doc=604)
          0.35260665 = weight(abstract_txt:stop in 604) [ClassicSimilarity], result of:
            0.35260665 = score(doc=604,freq=1.0), product of:
              0.7499776 = queryWeight, product of:
                7.4408984 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.01339864 = queryNorm
              0.47015625 = fieldWeight in 604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.0625 = fieldNorm(doc=604)
        0.24 = coord(6/25)
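
The score trees above can be recomputed by hand. The following is a minimal Python sketch, not the search engine's own code, that rebuilds the score of the first similar document (doc 2096) from the values printed in its tree. It assumes the classic TF-IDF form suggested by the output: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final multiplication by coord. The function and variable names are hypothetical and chosen only for illustration.

```python
import math

# Constants copied from the explain output above.
MAX_DOCS = 44218
QUERY_NORM = 0.01339864


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, assuming 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, boost=1.0):
    """Score of one matching term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


def document_score(matches, field_norm, total_clauses):
    """Sum the term scores, then scale by coord = matched / total clauses."""
    coord = len(matches) / total_clauses
    return coord * sum(
        term_score(freq, doc_freq, field_norm, boost)
        for freq, doc_freq, boost in matches
    )


# (termFreq, docFreq, boost) for each term matched in doc 2096.
# "terms" shows no boost line in the tree, so a boost of 1.0 is assumed.
matches_2096 = [
    (2.0, 2106, 1.0),        # terms
    (3.0, 550, 1.331684),    # list
    (1.0, 568, 1.9856023),   # words
    (1.0, 476, 2.734695),    # lists
    (1.0, 523, 3.3602715),   # word
    (2.0, 64, 7.4408984),    # stop
]

score_2096 = document_score(matches_2096, field_norm=0.078125, total_clauses=25)
print(round(score_2096, 4))
```

Because the tree was produced with single-precision arithmetic, the recomputed value should agree with the listed 0.2324515 only to about four decimal places. The sketch also makes visible why "stop" dominates every ranking here: it has both the highest boost and the highest idf (docFreq=64) of all query terms.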