Document (#28562)

Author
Price, L.
Thelwall, M.
Title
¬The clustering power of low frequency words in academic webs
Source
Journal of the American Society for Information Science and Technology. 56(2005) no.8, pp.883-888
Year
2005
Series
Brief communication
Abstract
The value of low frequency words for subject-based academic Web site clustering is assessed. A new technique is introduced to compare the relative clustering power of different vocabularies. The technique is designed for word frequency tests in large document clustering exercises. Results for the Australian and New Zealand academic Web spaces indicate that low frequency words are useful for clustering academic Web sites along subject lines; removing low frequency words results in sites becoming, on average, less dissimilar to sites from other subjects.

Similar documents (author)

  1. Thelwall, M.; Price, L.: Language evolution and the spread of ideas on the Web : a procedure for identifying emergent hybrid word (2006) 5.42
    5.4207625 = sum of:
      5.4207625 = sum of:
        1.9467672 = weight(author_txt:thelwall in 5896) [ClassicSimilarity], result of:
          1.9467672 = score(doc=5896,freq=1.0), product of:
            0.5621456 = queryWeight, product of:
              6.926203 = idf(docFreq=117, maxDocs=44218)
              0.08116216 = queryNorm
            3.4631014 = fieldWeight in 5896, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              6.926203 = idf(docFreq=117, maxDocs=44218)
              0.5 = fieldNorm(doc=5896)
        3.4739952 = weight(author_txt:price in 5896) [ClassicSimilarity], result of:
          3.4739952 = score(doc=5896,freq=1.0), product of:
            0.8270383 = queryWeight, product of:
              1.2129375 = boost
              8.401051 = idf(docFreq=26, maxDocs=44218)
              0.08116216 = queryNorm
            4.2005253 = fieldWeight in 5896, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.401051 = idf(docFreq=26, maxDocs=44218)
              0.5 = fieldNorm(doc=5896)
    
  2. Harries, G.; Wilkinson, D.; Price, L.; Fairclough, R.; Thelwall, M.: Hyperlinks as a data source for science mapping : making sense of it all (2005) 3.39
    3.3879764 = sum of:
      3.3879764 = sum of:
        1.2167294 = weight(author_txt:thelwall in 4654) [ClassicSimilarity], result of:
          1.2167294 = score(doc=4654,freq=1.0), product of:
            0.5621456 = queryWeight, product of:
              6.926203 = idf(docFreq=117, maxDocs=44218)
              0.08116216 = queryNorm
            2.1644382 = fieldWeight in 4654, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              6.926203 = idf(docFreq=117, maxDocs=44218)
              0.3125 = fieldNorm(doc=4654)
        2.171247 = weight(author_txt:price in 4654) [ClassicSimilarity], result of:
          2.171247 = score(doc=4654,freq=1.0), product of:
            0.8270383 = queryWeight, product of:
              1.2129375 = boost
              8.401051 = idf(docFreq=26, maxDocs=44218)
              0.08116216 = queryNorm
            2.6253283 = fieldWeight in 4654, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.401051 = idf(docFreq=26, maxDocs=44218)
              0.3125 = fieldNorm(doc=4654)
    
  3. Thelwall, M.; Binns, R.; Harries, G.; Page-Kennedy, T.; Price, L.; Wilkinson, D.: Custom interfaces for advanced queries in search engines (2001) 2.71
    2.7103813 = sum of:
      2.7103813 = sum of:
        0.9733836 = weight(author_txt:thelwall in 697) [ClassicSimilarity], result of:
          0.9733836 = score(doc=697,freq=1.0), product of:
            0.5621456 = queryWeight, product of:
              6.926203 = idf(docFreq=117, maxDocs=44218)
              0.08116216 = queryNorm
            1.7315507 = fieldWeight in 697, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              6.926203 = idf(docFreq=117, maxDocs=44218)
              0.25 = fieldNorm(doc=697)
        1.7369976 = weight(author_txt:price in 697) [ClassicSimilarity], result of:
          1.7369976 = score(doc=697,freq=1.0), product of:
            0.8270383 = queryWeight, product of:
              1.2129375 = boost
              8.401051 = idf(docFreq=26, maxDocs=44218)
              0.08116216 = queryNorm
            2.1002626 = fieldWeight in 697, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.401051 = idf(docFreq=26, maxDocs=44218)
              0.25 = fieldNorm(doc=697)
    
  4. Price, B.J.: ¬A talking terminal for the blind (1985) 2.17
    2.171247 = sum of:
      2.171247 = product of:
        4.342494 = sum of:
          4.342494 = weight(author_txt:price in 2152) [ClassicSimilarity], result of:
            4.342494 = score(doc=2152,freq=1.0), product of:
              0.8270383 = queryWeight, product of:
                1.2129375 = boost
                8.401051 = idf(docFreq=26, maxDocs=44218)
                0.08116216 = queryNorm
              5.2506566 = fieldWeight in 2152, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.401051 = idf(docFreq=26, maxDocs=44218)
                0.625 = fieldNorm(doc=2152)
        0.5 = coord(1/2)
    
  5. Price, M.S.: ¬The National Union Catalog programme (1987) 2.17
    2.171247 = sum of:
      2.171247 = product of:
        4.342494 = sum of:
          4.342494 = weight(author_txt:price in 2469) [ClassicSimilarity], result of:
            4.342494 = score(doc=2469,freq=1.0), product of:
              0.8270383 = queryWeight, product of:
                1.2129375 = boost
                8.401051 = idf(docFreq=26, maxDocs=44218)
                0.08116216 = queryNorm
              5.2506566 = fieldWeight in 2469, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.401051 = idf(docFreq=26, maxDocs=44218)
                0.625 = fieldNorm(doc=2469)
        0.5 = coord(1/2)
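
The score trees above have the shape of a classic TF-IDF explanation: each term weight is the product of a queryWeight (boost x idf x queryNorm) and a fieldWeight (tf x idf x fieldNorm). As a minimal sketch, the figures listed for document 5896 can be reproduced under two assumptions that are consistent with the listed values but are not stated on this page: idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq). The function names are illustrative only; queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency (assumed form: 1 + ln(maxDocs / (docFreq + 1)))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """Term frequency factor (assumed form: square root of the raw count)."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """Score of one term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


# Values copied from the explanation for doc 5896 (first author match).
MAX_DOCS = 44218
QUERY_NORM = 0.08116216

thelwall = term_score(1.0, 117, MAX_DOCS, QUERY_NORM, field_norm=0.5)
price = term_score(1.0, 26, MAX_DOCS, QUERY_NORM, field_norm=0.5, boost=1.2129375)
total = thelwall + price

print(f"thelwall={thelwall:.4f} price={price:.4f} total={total:.4f}")
```

The rarer author term ("price", docFreq=26) contributes more than the commoner one ("thelwall", docFreq=117) because idf enters both queryWeight and fieldWeight, so it is effectively squared. A smaller fieldNorm lowers the score for records with more authors, which is why entries 2 and 3 rank below entry 1 despite matching the same two names.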
    

Similar documents (content)

  1. Thelwall, M.: Text characteristics of English language university Web sites (2005) 0.37
    0.37343654 = sum of:
      0.37343654 = product of:
        1.333702 = sum of:
          0.10829281 = weight(abstract_txt:zealand in 3463) [ClassicSimilarity], result of:
            0.10829281 = score(doc=3463,freq=1.0), product of:
              0.17084742 = queryWeight, product of:
                1.4201711 = boost
                8.113368 = idf(docFreq=35, maxDocs=44218)
                0.014827453 = queryNorm
              0.6338569 = fieldWeight in 3463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.113368 = idf(docFreq=35, maxDocs=44218)
                0.078125 = fieldNorm(doc=3463)
          0.11059773 = weight(abstract_txt:webs in 3463) [ClassicSimilarity], result of:
            0.11059773 = score(doc=3463,freq=1.0), product of:
              0.17326313 = queryWeight, product of:
                1.4301763 = boost
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.014827453 = queryNorm
              0.63832235 = fieldWeight in 3463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.078125 = fieldNorm(doc=3463)
          0.13544407 = weight(abstract_txt:sites in 3463) [ClassicSimilarity], result of:
            0.13544407 = score(doc=3463,freq=2.0), product of:
              0.22702783 = queryWeight, product of:
                2.8355458 = boost
                5.399778 = idf(docFreq=542, maxDocs=44218)
                0.014827453 = queryNorm
              0.5965968 = fieldWeight in 3463, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.399778 = idf(docFreq=542, maxDocs=44218)
                0.078125 = fieldNorm(doc=3463)
          0.11759418 = weight(abstract_txt:academic in 3463) [ClassicSimilarity], result of:
            0.11759418 = score(doc=3463,freq=2.0), product of:
              0.22740982 = queryWeight, product of:
                3.2769597 = boost
                4.6802773 = idf(docFreq=1114, maxDocs=44218)
                0.014827453 = queryNorm
              0.5171025 = fieldWeight in 3463, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6802773 = idf(docFreq=1114, maxDocs=44218)
                0.078125 = fieldNorm(doc=3463)
          0.24881668 = weight(abstract_txt:words in 3463) [ClassicSimilarity], result of:
            0.24881668 = score(doc=3463,freq=4.0), product of:
              0.29748267 = queryWeight, product of:
                3.7479804 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.014827453 = queryNorm
              0.8364073 = fieldWeight in 3463, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.078125 = fieldNorm(doc=3463)
          0.36942714 = weight(abstract_txt:frequency in 3463) [ClassicSimilarity], result of:
            0.36942714 = score(doc=3463,freq=3.0), product of:
              0.45903322 = queryWeight, product of:
                5.2052736 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.014827453 = queryNorm
              0.8047939 = fieldWeight in 3463, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.078125 = fieldNorm(doc=3463)
          0.24352928 = weight(abstract_txt:clustering in 3463) [ClassicSimilarity], result of:
            0.24352928 = score(doc=3463,freq=1.0), product of:
              0.50145596 = queryWeight, product of:
                5.4404883 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.014827453 = queryNorm
              0.4856444 = fieldWeight in 3463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.078125 = fieldNorm(doc=3463)
        0.28 = coord(7/25)
    
  2. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.12
    0.120275326 = sum of:
      0.120275326 = product of:
        0.7517208 = sum of:
          0.0522216 = weight(abstract_txt:relative in 5206) [ClassicSimilarity], result of:
            0.0522216 = score(doc=5206,freq=2.0), product of:
              0.096762136 = queryWeight, product of:
                1.0687822 = boost
                6.1059003 = idf(docFreq=267, maxDocs=44218)
                0.014827453 = queryNorm
              0.53969043 = fieldWeight in 5206, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1059003 = idf(docFreq=267, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.019376498 = weight(abstract_txt:results in 5206) [ClassicSimilarity], result of:
            0.019376498 = score(doc=5206,freq=2.0), product of:
              0.062950455 = queryWeight, product of:
                1.2191325 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.014827453 = queryNorm
              0.30780554 = fieldWeight in 5206, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.29858002 = weight(abstract_txt:words in 5206) [ClassicSimilarity], result of:
            0.29858002 = score(doc=5206,freq=9.0), product of:
              0.29748267 = queryWeight, product of:
                3.7479804 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.014827453 = queryNorm
              1.0036888 = fieldWeight in 5206, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.38154268 = weight(abstract_txt:frequency in 5206) [ClassicSimilarity], result of:
            0.38154268 = score(doc=5206,freq=5.0), product of:
              0.45903322 = queryWeight, product of:
                5.2052736 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.014827453 = queryNorm
              0.83118755 = fieldWeight in 5206, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
        0.16 = coord(4/25)
    
  3. Li, X.; Zhang, A.; Li, C.; Ouyang, J.; Cai, Y.: Exploring coherent topics by topic modeling with term weighting (2018) 0.11
    0.11102188 = sum of:
      0.11102188 = product of:
        0.69388676 = sum of:
          0.013701254 = weight(abstract_txt:results in 5045) [ClassicSimilarity], result of:
            0.013701254 = score(doc=5045,freq=1.0), product of:
              0.062950455 = queryWeight, product of:
                1.2191325 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.014827453 = queryNorm
              0.21765138 = fieldWeight in 5045, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0625 = fieldNorm(doc=5045)
          0.31473097 = weight(abstract_txt:words in 5045) [ClassicSimilarity], result of:
            0.31473097 = score(doc=5045,freq=10.0), product of:
              0.29748267 = queryWeight, product of:
                3.7479804 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.014827453 = queryNorm
              1.0579809 = fieldWeight in 5045, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=5045)
          0.17063108 = weight(abstract_txt:frequency in 5045) [ClassicSimilarity], result of:
            0.17063108 = score(doc=5045,freq=1.0), product of:
              0.45903322 = queryWeight, product of:
                5.2052736 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.014827453 = queryNorm
              0.37171838 = fieldWeight in 5045, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.0625 = fieldNorm(doc=5045)
          0.19482343 = weight(abstract_txt:clustering in 5045) [ClassicSimilarity], result of:
            0.19482343 = score(doc=5045,freq=1.0), product of:
              0.50145596 = queryWeight, product of:
                5.4404883 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.014827453 = queryNorm
              0.38851553 = fieldWeight in 5045, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.0625 = fieldNorm(doc=5045)
        0.16 = coord(4/25)
    
  4. Thelwall, M.; Wilkinson, D.: Graph structure in three national academic Webs : power laws with anomalies (2003) 0.10
    0.10109057 = sum of:
      0.10109057 = product of:
        0.5054529 = sum of:
          0.041855864 = weight(abstract_txt:average in 1681) [ClassicSimilarity], result of:
            0.041855864 = score(doc=1681,freq=1.0), product of:
              0.09065245 = queryWeight, product of:
                1.0344899 = boost
                5.90999 = idf(docFreq=325, maxDocs=44218)
                0.014827453 = queryNorm
              0.46171796 = fieldWeight in 1681, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.90999 = idf(docFreq=325, maxDocs=44218)
                0.078125 = fieldNorm(doc=1681)
          0.10829281 = weight(abstract_txt:zealand in 1681) [ClassicSimilarity], result of:
            0.10829281 = score(doc=1681,freq=1.0), product of:
              0.17084742 = queryWeight, product of:
                1.4201711 = boost
                8.113368 = idf(docFreq=35, maxDocs=44218)
                0.014827453 = queryNorm
              0.6338569 = fieldWeight in 1681, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.113368 = idf(docFreq=35, maxDocs=44218)
                0.078125 = fieldNorm(doc=1681)
          0.11059773 = weight(abstract_txt:webs in 1681) [ClassicSimilarity], result of:
            0.11059773 = score(doc=1681,freq=1.0), product of:
              0.17326313 = queryWeight, product of:
                1.4301763 = boost
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.014827453 = queryNorm
              0.63832235 = fieldWeight in 1681, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.078125 = fieldNorm(doc=1681)
          0.109262355 = weight(abstract_txt:power in 1681) [ClassicSimilarity], result of:
            0.109262355 = score(doc=1681,freq=2.0), product of:
              0.17186563 = queryWeight, product of:
                2.0144014 = boost
                5.754088 = idf(docFreq=380, maxDocs=44218)
                0.014827453 = queryNorm
              0.6357429 = fieldWeight in 1681, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.754088 = idf(docFreq=380, maxDocs=44218)
                0.078125 = fieldNorm(doc=1681)
          0.13544407 = weight(abstract_txt:sites in 1681) [ClassicSimilarity], result of:
            0.13544407 = score(doc=1681,freq=2.0), product of:
              0.22702783 = queryWeight, product of:
                2.8355458 = boost
                5.399778 = idf(docFreq=542, maxDocs=44218)
                0.014827453 = queryNorm
              0.5965968 = fieldWeight in 1681, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.399778 = idf(docFreq=542, maxDocs=44218)
                0.078125 = fieldNorm(doc=1681)
        0.2 = coord(5/25)
    
  5. Park, G.; Baek, Y.; Lee, H.-K.: Re-ranking algorithm using post-retrieval clustering for content-based image retrieval (2005) 0.10
    0.09876094 = sum of:
      0.09876094 = product of:
        0.61725587 = sum of:
          0.047354504 = weight(abstract_txt:average in 1005) [ClassicSimilarity], result of:
            0.047354504 = score(doc=1005,freq=2.0), product of:
              0.09065245 = queryWeight, product of:
                1.0344899 = boost
                5.90999 = idf(docFreq=325, maxDocs=44218)
                0.014827453 = queryNorm
              0.5223742 = fieldWeight in 1005, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.90999 = idf(docFreq=325, maxDocs=44218)
                0.0625 = fieldNorm(doc=1005)
          0.027402507 = weight(abstract_txt:results in 1005) [ClassicSimilarity], result of:
            0.027402507 = score(doc=1005,freq=4.0), product of:
              0.062950455 = queryWeight, product of:
                1.2191325 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.014827453 = queryNorm
              0.43530276 = fieldWeight in 1005, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0625 = fieldNorm(doc=1005)
          0.10686039 = weight(abstract_txt:dissimilar in 1005) [ClassicSimilarity], result of:
            0.10686039 = score(doc=1005,freq=1.0), product of:
              0.19649878 = queryWeight, product of:
                1.5230579 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.014827453 = queryNorm
              0.54382217 = fieldWeight in 1005, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.0625 = fieldNorm(doc=1005)
          0.43563846 = weight(abstract_txt:clustering in 1005) [ClassicSimilarity], result of:
            0.43563846 = score(doc=1005,freq=5.0), product of:
              0.50145596 = queryWeight, product of:
                5.4404883 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.014827453 = queryNorm
              0.8687472 = fieldWeight in 1005, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.0625 = fieldNorm(doc=1005)
        0.16 = coord(4/25)
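
In the content-based list, each document's summed term weights are multiplied by a coord factor such as coord(7/25). As a hedged sketch, assuming coord is simply the number of matched query terms divided by the number of query terms (which agrees with 7/25 = 0.28 in the listing), the final score for document 3463 can be recomputed from its seven listed term weights. The helper name coord_score is hypothetical.

```python
def coord_score(term_weights, query_term_count):
    """Sum the matched term weights and scale by matched / total query terms."""
    coord = len(term_weights) / query_term_count
    return sum(term_weights) * coord


# Term weights copied from the explanation for doc 3463:
# zealand, webs, sites, academic, words, frequency, clustering.
doc_3463_weights = [
    0.10829281,
    0.11059773,
    0.13544407,
    0.11759418,
    0.24881668,
    0.36942714,
    0.24352928,
]

# The listing shows coord(n/25), i.e. 25 query terms in total.
score_3463 = coord_score(doc_3463_weights, query_term_count=25)
print(f"{score_3463:.6f}")
```

Under this reading, the coord factor rewards documents that match a larger share of the query terms: document 3463 matches 7 of 25 terms, while the others match only 4 or 5.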