Document (#28563)

Author
Price, L.
Thelwall, M.
Title
¬The clustering power of low frequency words in academic webs
Source
Journal of the American Society for Information Science and Technology. 56(2005) no.8, pp.883-888
Year
2005
Series
Brief communication
Abstract
The value of low frequency words for subject-based academic Web site clustering is assessed. A new technique is introduced to compare the relative clustering power of different vocabularies. The technique is designed for word frequency tests in large document clustering exercises. Results for the Australian and New Zealand academic Web spaces indicate that low frequency words are useful for clustering academic Web sites along subject lines; removing low frequency words results in sites becoming, on average, less dissimilar to sites from other subjects.

Similar documents (author)

  1. Thelwall, M.; Price, L.: Language evolution and the spread of ideas on the Web : a procedure for identifying emergent hybrid word (2006) 5.42
    5.4249363 = sum of:
      5.4249363 = sum of:
        1.9694003 = weight(author_txt:thelwall in 897) [ClassicSimilarity], result of:
          1.9694003 = score(doc=897,freq=1.0), product of:
            0.5664754 = queryWeight, product of:
              6.9531717 = idf(docFreq=108, maxDocs=41962)
              0.08147007 = queryNorm
            3.4765859 = fieldWeight in 897, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              6.9531717 = idf(docFreq=108, maxDocs=41962)
              0.5 = fieldNorm(doc=897)
        3.4555361 = weight(author_txt:price in 897) [ClassicSimilarity], result of:
          3.4555361 = score(doc=897,freq=1.0), product of:
            0.8240787 = queryWeight, product of:
              1.2061292 = boost
              8.386423 = idf(docFreq=25, maxDocs=41962)
              0.08147007 = queryNorm
            4.1932116 = fieldWeight in 897, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.386423 = idf(docFreq=25, maxDocs=41962)
              0.5 = fieldNorm(doc=897)
    
  2. Harries, G.; Wilkinson, D.; Price, L.; Fairclough, R.; Thelwall, M.: Hyperlinks as a data source for science mapping : making sense of it all (2005) 3.39
    3.3905854 = sum of:
      3.3905854 = sum of:
        1.2308751 = weight(author_txt:thelwall in 655) [ClassicSimilarity], result of:
          1.2308751 = score(doc=655,freq=1.0), product of:
            0.5664754 = queryWeight, product of:
              6.9531717 = idf(docFreq=108, maxDocs=41962)
              0.08147007 = queryNorm
            2.172866 = fieldWeight in 655, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              6.9531717 = idf(docFreq=108, maxDocs=41962)
              0.3125 = fieldNorm(doc=655)
        2.1597102 = weight(author_txt:price in 655) [ClassicSimilarity], result of:
          2.1597102 = score(doc=655,freq=1.0), product of:
            0.8240787 = queryWeight, product of:
              1.2061292 = boost
              8.386423 = idf(docFreq=25, maxDocs=41962)
              0.08147007 = queryNorm
            2.620757 = fieldWeight in 655, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.386423 = idf(docFreq=25, maxDocs=41962)
              0.3125 = fieldNorm(doc=655)
    
  3. Thelwall, M.; Binns, R.; Harries, G.; Page-Kennedy, T.; Price, L.; Wilkinson, D.: Custom interfaces for advanced queries in search engines (2001) 2.71
    2.7124681 = sum of:
      2.7124681 = sum of:
        0.98470014 = weight(author_txt:thelwall in 1823) [ClassicSimilarity], result of:
          0.98470014 = score(doc=1823,freq=1.0), product of:
            0.5664754 = queryWeight, product of:
              6.9531717 = idf(docFreq=108, maxDocs=41962)
              0.08147007 = queryNorm
            1.7382929 = fieldWeight in 1823, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              6.9531717 = idf(docFreq=108, maxDocs=41962)
              0.25 = fieldNorm(doc=1823)
        1.7277681 = weight(author_txt:price in 1823) [ClassicSimilarity], result of:
          1.7277681 = score(doc=1823,freq=1.0), product of:
            0.8240787 = queryWeight, product of:
              1.2061292 = boost
              8.386423 = idf(docFreq=25, maxDocs=41962)
              0.08147007 = queryNorm
            2.0966058 = fieldWeight in 1823, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.386423 = idf(docFreq=25, maxDocs=41962)
              0.25 = fieldNorm(doc=1823)
    
  4. Price, B.J.: ¬A talking terminal for the blind (1985) 2.16
    2.1597102 = sum of:
      2.1597102 = product of:
        4.3194203 = sum of:
          4.3194203 = weight(author_txt:price in 2152) [ClassicSimilarity], result of:
            4.3194203 = score(doc=2152,freq=1.0), product of:
              0.8240787 = queryWeight, product of:
                1.2061292 = boost
                8.386423 = idf(docFreq=25, maxDocs=41962)
                0.08147007 = queryNorm
              5.241514 = fieldWeight in 2152, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.386423 = idf(docFreq=25, maxDocs=41962)
                0.625 = fieldNorm(doc=2152)
        0.5 = coord(1/2)
    
  5. Price, M.S.: ¬The National Union Catalog programme (1987) 2.16
    2.1597102 = sum of:
      2.1597102 = product of:
        4.3194203 = sum of:
          4.3194203 = weight(author_txt:price in 2538) [ClassicSimilarity], result of:
            4.3194203 = score(doc=2538,freq=1.0), product of:
              0.8240787 = queryWeight, product of:
                1.2061292 = boost
                8.386423 = idf(docFreq=25, maxDocs=41962)
                0.08147007 = queryNorm
              5.241514 = fieldWeight in 2538, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.386423 = idf(docFreq=25, maxDocs=41962)
                0.625 = fieldNorm(doc=2538)
        0.5 = coord(1/2)
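
The author score trees above multiply a queryWeight (boost x idf x queryNorm) by a fieldWeight (tf x idf x fieldNorm) for each matching term and then sum the terms. The following is a minimal Python sketch of that arithmetic, not the search engine's own code. It assumes idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), which reproduce the figures listed for result 1 (doc=897); the function names are hypothetical.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed idf form; it reproduces the idf values printed in the trees.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm            # "queryWeight" node
    field_weight = math.sqrt(freq) * idf * field_norm  # "fieldWeight" node
    return query_weight * field_weight                 # "score" node


# Inputs copied from the tree for result 1 (doc=897).
MAX_DOCS = 41962
QUERY_NORM = 0.08147007

thelwall = term_score(1.0, 108, MAX_DOCS, QUERY_NORM, field_norm=0.5)
price = term_score(1.0, 25, MAX_DOCS, QUERY_NORM, field_norm=0.5,
                   boost=1.2061292)
total = thelwall + price

print(round(thelwall, 4), round(price, 4), round(total, 4))
```

The tree is computed in single precision, so the Python doubles agree with the listed values (1.9694003, 3.4555361, 5.4249363) only to about four or five significant digits.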
    

Similar documents (content)

  1. Thelwall, M.: Text characteristics of English language university Web sites (2005) 0.37
    0.37396714 = sum of:
      0.37396714 = product of:
        1.3355969 = sum of:
          0.10863252 = weight(abstract_txt:zealand in 4464) [ClassicSimilarity], result of:
            0.10863252 = score(doc=4464,freq=1.0), product of:
              0.17065467 = queryWeight, product of:
                1.4270658 = boost
                8.148012 = idf(docFreq=32, maxDocs=41962)
                0.0146765 = queryNorm
              0.6365634 = fieldWeight in 4464, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.148012 = idf(docFreq=32, maxDocs=41962)
                0.078125 = fieldNorm(doc=4464)
          0.10863252 = weight(abstract_txt:webs in 4464) [ClassicSimilarity], result of:
            0.10863252 = score(doc=4464,freq=1.0), product of:
              0.17065467 = queryWeight, product of:
                1.4270658 = boost
                8.148012 = idf(docFreq=32, maxDocs=41962)
                0.0146765 = queryNorm
              0.6365634 = fieldWeight in 4464, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.148012 = idf(docFreq=32, maxDocs=41962)
                0.078125 = fieldNorm(doc=4464)
          0.13491468 = weight(abstract_txt:sites in 4464) [ClassicSimilarity], result of:
            0.13491468 = score(doc=4464,freq=2.0), product of:
              0.22570862 = queryWeight, product of:
                2.8426254 = boost
                5.410109 = idf(docFreq=509, maxDocs=41962)
                0.0146765 = queryNorm
              0.59773827 = fieldWeight in 4464, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.410109 = idf(docFreq=509, maxDocs=41962)
                0.078125 = fieldNorm(doc=4464)
          0.12088596 = weight(abstract_txt:academic in 4464) [ClassicSimilarity], result of:
            0.12088596 = score(doc=4464,freq=2.0), product of:
              0.2308902 = queryWeight, product of:
                3.319844 = boost
                4.7387667 = idf(docFreq=997, maxDocs=41962)
                0.0146765 = queryNorm
              0.5235647 = fieldWeight in 4464, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7387667 = idf(docFreq=997, maxDocs=41962)
                0.078125 = fieldNorm(doc=4464)
          0.24900967 = weight(abstract_txt:words in 4464) [ClassicSimilarity], result of:
            0.24900967 = score(doc=4464,freq=4.0), product of:
              0.29668054 = queryWeight, product of:
                3.7632186 = boost
                5.3716426 = idf(docFreq=529, maxDocs=41962)
                0.0146765 = queryNorm
              0.83931917 = fieldWeight in 4464, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.3716426 = idf(docFreq=529, maxDocs=41962)
                0.078125 = fieldNorm(doc=4464)
          0.3709117 = weight(abstract_txt:frequency in 4464) [ClassicSimilarity], result of:
            0.3709117 = score(doc=4464,freq=3.0), product of:
              0.45878395 = queryWeight, product of:
                5.232076 = boost
                5.974639 = idf(docFreq=289, maxDocs=41962)
                0.0146765 = queryNorm
              0.80846703 = fieldWeight in 4464, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.974639 = idf(docFreq=289, maxDocs=41962)
                0.078125 = fieldNorm(doc=4464)
          0.24260987 = weight(abstract_txt:clustering in 4464) [ClassicSimilarity], result of:
            0.24260987 = score(doc=4464,freq=1.0), product of:
              0.4985866 = queryWeight, product of:
                5.454315 = boost
                6.2284193 = idf(docFreq=224, maxDocs=41962)
                0.0146765 = queryNorm
              0.48659527 = fieldWeight in 4464, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2284193 = idf(docFreq=224, maxDocs=41962)
                0.078125 = fieldNorm(doc=4464)
        0.28 = coord(7/25)
    
  2. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.12
    0.12057846 = sum of:
      0.12057846 = product of:
        0.7536154 = sum of:
          0.05185307 = weight(abstract_txt:relative in 207) [ClassicSimilarity], result of:
            0.05185307 = score(doc=207,freq=2.0), product of:
              0.09599706 = queryWeight, product of:
                1.0703206 = boost
                6.11113 = idf(docFreq=252, maxDocs=41962)
                0.0146765 = queryNorm
              0.54015267 = fieldWeight in 207, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.11113 = idf(docFreq=252, maxDocs=41962)
                0.0625 = fieldNorm(doc=207)
          0.019874774 = weight(abstract_txt:results in 207) [ClassicSimilarity], result of:
            0.019874774 = score(doc=207,freq=2.0), product of:
              0.06381945 = queryWeight, product of:
                1.2341743 = boost
                3.5233364 = idf(docFreq=3364, maxDocs=41962)
                0.0146765 = queryNorm
              0.31142187 = fieldWeight in 207, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5233364 = idf(docFreq=3364, maxDocs=41962)
                0.0625 = fieldNorm(doc=207)
          0.29881158 = weight(abstract_txt:words in 207) [ClassicSimilarity], result of:
            0.29881158 = score(doc=207,freq=9.0), product of:
              0.29668054 = queryWeight, product of:
                3.7632186 = boost
                5.3716426 = idf(docFreq=529, maxDocs=41962)
                0.0146765 = queryNorm
              1.007183 = fieldWeight in 207, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.3716426 = idf(docFreq=529, maxDocs=41962)
                0.0625 = fieldNorm(doc=207)
          0.38307598 = weight(abstract_txt:frequency in 207) [ClassicSimilarity], result of:
            0.38307598 = score(doc=207,freq=5.0), product of:
              0.45878395 = queryWeight, product of:
                5.232076 = boost
                5.974639 = idf(docFreq=289, maxDocs=41962)
                0.0146765 = queryNorm
              0.8349812 = fieldWeight in 207, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.974639 = idf(docFreq=289, maxDocs=41962)
                0.0625 = fieldNorm(doc=207)
        0.16 = coord(4/25)
    
  3. Thelwall, M.; Wilkinson, D.: Graph structure in three national academic Webs : power laws with anomalies (2003) 0.10
    0.1011724 = sum of:
      0.1011724 = product of:
        0.505862 = sum of:
          0.041963127 = weight(abstract_txt:average in 2682) [ClassicSimilarity], result of:
            0.041963127 = score(doc=2682,freq=1.0), product of:
              0.09051561 = queryWeight, product of:
                1.0393138 = boost
                5.9340925 = idf(docFreq=301, maxDocs=41962)
                0.0146765 = queryNorm
              0.463601 = fieldWeight in 2682, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9340925 = idf(docFreq=301, maxDocs=41962)
                0.078125 = fieldNorm(doc=2682)
          0.10863252 = weight(abstract_txt:zealand in 2682) [ClassicSimilarity], result of:
            0.10863252 = score(doc=2682,freq=1.0), product of:
              0.17065467 = queryWeight, product of:
                1.4270658 = boost
                8.148012 = idf(docFreq=32, maxDocs=41962)
                0.0146765 = queryNorm
              0.6365634 = fieldWeight in 2682, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.148012 = idf(docFreq=32, maxDocs=41962)
                0.078125 = fieldNorm(doc=2682)
          0.10863252 = weight(abstract_txt:webs in 2682) [ClassicSimilarity], result of:
            0.10863252 = score(doc=2682,freq=1.0), product of:
              0.17065467 = queryWeight, product of:
                1.4270658 = boost
                8.148012 = idf(docFreq=32, maxDocs=41962)
                0.0146765 = queryNorm
              0.6365634 = fieldWeight in 2682, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.148012 = idf(docFreq=32, maxDocs=41962)
                0.078125 = fieldNorm(doc=2682)
          0.11171916 = weight(abstract_txt:power in 2682) [ClassicSimilarity], result of:
            0.11171916 = score(doc=2682,freq=2.0), product of:
              0.17387216 = queryWeight, product of:
                2.0371122 = boost
                5.815574 = idf(docFreq=339, maxDocs=41962)
                0.0146765 = queryNorm
              0.6425362 = fieldWeight in 2682, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.815574 = idf(docFreq=339, maxDocs=41962)
                0.078125 = fieldNorm(doc=2682)
          0.13491468 = weight(abstract_txt:sites in 2682) [ClassicSimilarity], result of:
            0.13491468 = score(doc=2682,freq=2.0), product of:
              0.22570862 = queryWeight, product of:
                2.8426254 = boost
                5.410109 = idf(docFreq=509, maxDocs=41962)
                0.0146765 = queryNorm
              0.59773827 = fieldWeight in 2682, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.410109 = idf(docFreq=509, maxDocs=41962)
                0.078125 = fieldNorm(doc=2682)
        0.2 = coord(5/25)
    
  4. Park, G.; Baek, Y.; Lee, H.-K.: Re-ranking algorithm using post-retrieval clustering for content-based image retrieval (2005) 0.10
    0.09845947 = sum of:
      0.09845947 = product of:
        0.6153717 = sum of:
          0.047475856 = weight(abstract_txt:average in 3006) [ClassicSimilarity], result of:
            0.047475856 = score(doc=3006,freq=2.0), product of:
              0.09051561 = queryWeight, product of:
                1.0393138 = boost
                5.9340925 = idf(docFreq=301, maxDocs=41962)
                0.0146765 = queryNorm
              0.5245046 = fieldWeight in 3006, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9340925 = idf(docFreq=301, maxDocs=41962)
                0.0625 = fieldNorm(doc=3006)
          0.028107176 = weight(abstract_txt:results in 3006) [ClassicSimilarity], result of:
            0.028107176 = score(doc=3006,freq=4.0), product of:
              0.06381945 = queryWeight, product of:
                1.2341743 = boost
                3.5233364 = idf(docFreq=3364, maxDocs=41962)
                0.0146765 = queryNorm
              0.44041705 = fieldWeight in 3006, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5233364 = idf(docFreq=3364, maxDocs=41962)
                0.0625 = fieldNorm(doc=3006)
          0.10579491 = weight(abstract_txt:dissimilar in 3006) [ClassicSimilarity], result of:
            0.10579491 = score(doc=3006,freq=1.0), product of:
              0.19456354 = queryWeight, product of:
                1.5237567 = boost
                8.700081 = idf(docFreq=18, maxDocs=41962)
                0.0146765 = queryNorm
              0.54375505 = fieldWeight in 3006, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.700081 = idf(docFreq=18, maxDocs=41962)
                0.0625 = fieldNorm(doc=3006)
          0.43399373 = weight(abstract_txt:clustering in 3006) [ClassicSimilarity], result of:
            0.43399373 = score(doc=3006,freq=5.0), product of:
              0.4985866 = queryWeight, product of:
                5.454315 = boost
                6.2284193 = idf(docFreq=224, maxDocs=41962)
                0.0146765 = queryNorm
              0.87044805 = fieldWeight in 3006, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.2284193 = idf(docFreq=224, maxDocs=41962)
                0.0625 = fieldNorm(doc=3006)
        0.16 = coord(4/25)
    
  5. Bane, A.F.; Milheim, W.D.: Internet insights : how academics are using the Internet (1995) 0.10
    0.09624835 = sum of:
      0.09624835 = product of:
        0.6015522 = sum of:
          0.021080382 = weight(abstract_txt:results in 2276) [ClassicSimilarity], result of:
            0.021080382 = score(doc=2276,freq=1.0), product of:
              0.06381945 = queryWeight, product of:
                1.2341743 = boost
                3.5233364 = idf(docFreq=3364, maxDocs=41962)
                0.0146765 = queryNorm
              0.3303128 = fieldWeight in 2276, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5233364 = idf(docFreq=3364, maxDocs=41962)
                0.09375 = fieldNorm(doc=2276)
          0.1144789 = weight(abstract_txt:sites in 2276) [ClassicSimilarity], result of:
            0.1144789 = score(doc=2276,freq=1.0), product of:
              0.22570862 = queryWeight, product of:
                2.8426254 = boost
                5.410109 = idf(docFreq=509, maxDocs=41962)
                0.0146765 = queryNorm
              0.50719774 = fieldWeight in 2276, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.410109 = idf(docFreq=509, maxDocs=41962)
                0.09375 = fieldNorm(doc=2276)
          0.10257514 = weight(abstract_txt:academic in 2276) [ClassicSimilarity], result of:
            0.10257514 = score(doc=2276,freq=1.0), product of:
              0.2308902 = queryWeight, product of:
                3.319844 = boost
                4.7387667 = idf(docFreq=997, maxDocs=41962)
                0.0146765 = queryNorm
              0.44425938 = fieldWeight in 2276, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7387667 = idf(docFreq=997, maxDocs=41962)
                0.09375 = fieldNorm(doc=2276)
          0.36341777 = weight(abstract_txt:frequency in 2276) [ClassicSimilarity], result of:
            0.36341777 = score(doc=2276,freq=2.0), product of:
              0.45878395 = queryWeight, product of:
                5.232076 = boost
                5.974639 = idf(docFreq=289, maxDocs=41962)
                0.0146765 = queryNorm
              0.7921327 = fieldWeight in 2276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.974639 = idf(docFreq=289, maxDocs=41962)
                0.09375 = fieldNorm(doc=2276)
        0.16 = coord(4/25)
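
In the content score trees, the summed term scores are additionally scaled by coord(matched/total), the fraction of query terms found in the document. The sketch below recomputes result 5 (doc=2276) from the inputs shown in its tree. It is an illustration under the same assumptions as the listed values suggest (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq)); the helper name and the tuple layout are hypothetical.

```python
import math

MAX_DOCS = 41962
QUERY_NORM = 0.0146765
FIELD_NORM = 0.09375   # fieldNorm(doc=2276)
TOTAL_QUERY_TERMS = 25


def term_score(freq, doc_freq, boost):
    # Assumed idf and tf forms; they match the numbers printed in the tree.
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


# (term, termFreq, docFreq, boost) copied from the tree for doc=2276.
matches = [
    ("results", 1.0, 3364, 1.2341743),
    ("sites", 1.0, 509, 2.8426254),
    ("academic", 1.0, 997, 3.319844),
    ("frequency", 2.0, 289, 5.232076),
]

raw_sum = sum(term_score(f, df, b) for _, f, df, b in matches)
coord = len(matches) / TOTAL_QUERY_TERMS   # coord(4/25) = 0.16
final_score = raw_sum * coord

print(round(raw_sum, 4), coord, round(final_score, 4))
```

Because coord rewards documents that match more of the 25 query terms, result 1 (7 of 25 terms, coord 0.28) ranks well above the others even though several individual term scores are of similar size.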