Document (#20674)

Author
Wolfekuhler, M.R.
Punch, W.F.
Title
Finding salient features for personal Web pages categories
Source
Computer networks and ISDN systems. 29(1997) no.8, pp.1147-1156
Year
1997
Abstract
Examines techniques that discover features in sets of pre-categorized documents, so that similar documents can be found on the WWW. Examines techniques that classify training examples with high accuracy, then explains why this is not necessarily useful. Describes a method for extracting word clusters from the raw document features. Results show that the clustering technique succeeds in discovering word groups in personal Web pages that can be used to find similar information on the WWW.
Footnote
Contribution to a special issue of papers from the 6th International World Wide Web conference, held 7-11 Apr 1997, Santa Clara, California
Theme
Internet
Automatic indexing
Metadata

Similar documents (content)

  1. Mao, J.; Cui, H.: Identifying bacterial biotope entities using sequence labeling : performance and feature analysis (2018) 0.18
    0.17587586 = sum of:
      0.17587586 = product of:
        0.62812805 = sum of:
          0.055742312 = weight(abstract_txt:accuracy in 4462) [ClassicSimilarity], result of:
            0.055742312 = score(doc=4462,freq=1.0), product of:
              0.14939214 = queryWeight, product of:
                1.1513743 = boost
                5.9700394 = idf(docFreq=306, maxDocs=44218)
                0.021733716 = queryNorm
              0.37312746 = fieldWeight in 4462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9700394 = idf(docFreq=306, maxDocs=44218)
                0.0625 = fieldNorm(doc=4462)
          0.125481 = weight(abstract_txt:clusters in 4462) [ClassicSimilarity], result of:
            0.125481 = score(doc=4462,freq=3.0), product of:
              0.17791641 = queryWeight, product of:
                1.2564948 = boost
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.021733716 = queryNorm
              0.70528066 = fieldWeight in 4462, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.0625 = fieldNorm(doc=4462)
          0.08736528 = weight(abstract_txt:extracting in 4462) [ClassicSimilarity], result of:
            0.08736528 = score(doc=4462,freq=1.0), product of:
              0.20157206 = queryWeight, product of:
                1.3374201 = boost
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.021733716 = queryNorm
              0.4334196 = fieldWeight in 4462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.0625 = fieldNorm(doc=4462)
          0.018108882 = weight(abstract_txt:that in 4462) [ClassicSimilarity], result of:
            0.018108882 = score(doc=4462,freq=3.0), product of:
              0.07059905 = queryWeight, product of:
                1.370922 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.021733716 = queryNorm
              0.2565032 = fieldWeight in 4462, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=4462)
          0.04870064 = weight(abstract_txt:techniques in 4462) [ClassicSimilarity], result of:
            0.04870064 = score(doc=4462,freq=1.0), product of:
              0.17201681 = queryWeight, product of:
                1.7472422 = boost
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.021733716 = queryNorm
              0.2831156 = fieldWeight in 4462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.0625 = fieldNorm(doc=4462)
          0.14572614 = weight(abstract_txt:word in 4462) [ClassicSimilarity], result of:
            0.14572614 = score(doc=4462,freq=3.0), product of:
              0.2476656 = queryWeight, product of:
                2.0965273 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.021733716 = queryNorm
              0.5883988 = fieldWeight in 4462, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0625 = fieldNorm(doc=4462)
          0.14700384 = weight(abstract_txt:features in 4462) [ClassicSimilarity], result of:
            0.14700384 = score(doc=4462,freq=4.0), product of:
              0.259086 = queryWeight, product of:
                2.6262453 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.021733716 = queryNorm
              0.56739396 = fieldWeight in 4462, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0625 = fieldNorm(doc=4462)
        0.28 = coord(7/25)
    
  2. Sebastian, Y.: Literature-based discovery by learning heterogeneous bibliographic information networks (2017) 0.16
    0.1617761 = sum of:
      0.1617761 = product of:
        0.5055503 = sum of:
          0.037564524 = weight(abstract_txt:finding in 535) [ClassicSimilarity], result of:
            0.037564524 = score(doc=535,freq=1.0), product of:
              0.12552136 = queryWeight, product of:
                1.0553864 = boost
                5.4723287 = idf(docFreq=504, maxDocs=44218)
                0.021733716 = queryNorm
              0.29926798 = fieldWeight in 535, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4723287 = idf(docFreq=504, maxDocs=44218)
                0.0546875 = fieldNorm(doc=535)
          0.040037338 = weight(abstract_txt:technique in 535) [ClassicSimilarity], result of:
            0.040037338 = score(doc=535,freq=1.0), product of:
              0.13097121 = queryWeight, product of:
                1.0780542 = boost
                5.5898643 = idf(docFreq=448, maxDocs=44218)
                0.021733716 = queryNorm
              0.3056957 = fieldWeight in 535, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5898643 = idf(docFreq=448, maxDocs=44218)
                0.0546875 = fieldNorm(doc=535)
          0.04877452 = weight(abstract_txt:accuracy in 535) [ClassicSimilarity], result of:
            0.04877452 = score(doc=535,freq=1.0), product of:
              0.14939214 = queryWeight, product of:
                1.1513743 = boost
                5.9700394 = idf(docFreq=306, maxDocs=44218)
                0.021733716 = queryNorm
              0.32648653 = fieldWeight in 535, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9700394 = idf(docFreq=306, maxDocs=44218)
                0.0546875 = fieldNorm(doc=535)
          0.06339068 = weight(abstract_txt:clusters in 535) [ClassicSimilarity], result of:
            0.06339068 = score(doc=535,freq=1.0), product of:
              0.17791641 = queryWeight, product of:
                1.2564948 = boost
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.021733716 = queryNorm
              0.35629475 = fieldWeight in 535, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.0546875 = fieldNorm(doc=535)
          0.077898085 = weight(abstract_txt:necessarily in 535) [ClassicSimilarity], result of:
            0.077898085 = score(doc=535,freq=1.0), product of:
              0.20411907 = queryWeight, product of:
                1.3458432 = boost
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.021733716 = queryNorm
              0.3816306 = fieldWeight in 535, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.0546875 = fieldNorm(doc=535)
          0.020456158 = weight(abstract_txt:that in 535) [ClassicSimilarity], result of:
            0.020456158 = score(doc=535,freq=5.0), product of:
              0.07059905 = queryWeight, product of:
                1.370922 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.021733716 = queryNorm
              0.28975117 = fieldWeight in 535, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0546875 = fieldNorm(doc=535)
          0.07361816 = weight(abstract_txt:word in 535) [ClassicSimilarity], result of:
            0.07361816 = score(doc=535,freq=1.0), product of:
              0.2476656 = queryWeight, product of:
                2.0965273 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.021733716 = queryNorm
              0.2972482 = fieldWeight in 535, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0546875 = fieldNorm(doc=535)
          0.14381088 = weight(abstract_txt:features in 535) [ClassicSimilarity], result of:
            0.14381088 = score(doc=535,freq=5.0), product of:
              0.259086 = queryWeight, product of:
                2.6262453 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.021733716 = queryNorm
              0.55507004 = fieldWeight in 535, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0546875 = fieldNorm(doc=535)
        0.32 = coord(8/25)
    
  3. Lin, Y.-R.; Margolin, D.; Lazer, D.: Uncovering social semantics from textual traces : a theory-driven approach and evidence from public statements of U.S. Members of Congress (2016) 0.15
    0.15412445 = sum of:
      0.15412445 = product of:
        0.4816389 = sum of:
          0.053663604 = weight(abstract_txt:finding in 3078) [ClassicSimilarity], result of:
            0.053663604 = score(doc=3078,freq=1.0), product of:
              0.12552136 = queryWeight, product of:
                1.0553864 = boost
                5.4723287 = idf(docFreq=504, maxDocs=44218)
                0.021733716 = queryNorm
              0.42752567 = fieldWeight in 3078, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4723287 = idf(docFreq=504, maxDocs=44218)
                0.078125 = fieldNorm(doc=3078)
          0.057196196 = weight(abstract_txt:technique in 3078) [ClassicSimilarity], result of:
            0.057196196 = score(doc=3078,freq=1.0), product of:
              0.13097121 = queryWeight, product of:
                1.0780542 = boost
                5.5898643 = idf(docFreq=448, maxDocs=44218)
                0.021733716 = queryNorm
              0.43670815 = fieldWeight in 3078, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5898643 = idf(docFreq=448, maxDocs=44218)
                0.078125 = fieldNorm(doc=3078)
          0.022982124 = weight(abstract_txt:which in 3078) [ClassicSimilarity], result of:
            0.022982124 = score(doc=3078,freq=2.0), product of:
              0.0713167 = queryWeight, product of:
                1.1250279 = boost
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.021733716 = queryNorm
              0.32225448 = fieldWeight in 3078, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.078125 = fieldNorm(doc=3078)
          0.0184823 = weight(abstract_txt:that in 3078) [ClassicSimilarity], result of:
            0.0184823 = score(doc=3078,freq=2.0), product of:
              0.07059905 = queryWeight, product of:
                1.370922 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.021733716 = queryNorm
              0.26179248 = fieldWeight in 3078, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=3078)
          0.045845516 = weight(abstract_txt:documents in 3078) [ClassicSimilarity], result of:
            0.045845516 = score(doc=3078,freq=1.0), product of:
              0.14238764 = queryWeight, product of:
                1.5896585 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.021733716 = queryNorm
              0.32197678 = fieldWeight in 3078, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.078125 = fieldNorm(doc=3078)
          0.060875803 = weight(abstract_txt:techniques in 3078) [ClassicSimilarity], result of:
            0.060875803 = score(doc=3078,freq=1.0), product of:
              0.17201681 = queryWeight, product of:
                1.7472422 = boost
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.021733716 = queryNorm
              0.3538945 = fieldWeight in 3078, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.078125 = fieldNorm(doc=3078)
          0.092659116 = weight(abstract_txt:similar in 3078) [ClassicSimilarity], result of:
            0.092659116 = score(doc=3078,freq=1.0), product of:
              0.22761446 = queryWeight, product of:
                2.0098684 = boost
                5.2107263 = idf(docFreq=655, maxDocs=44218)
                0.021733716 = queryNorm
              0.40708798 = fieldWeight in 3078, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2107263 = idf(docFreq=655, maxDocs=44218)
                0.078125 = fieldNorm(doc=3078)
          0.12993427 = weight(abstract_txt:features in 3078) [ClassicSimilarity], result of:
            0.12993427 = score(doc=3078,freq=2.0), product of:
              0.259086 = queryWeight, product of:
                2.6262453 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.021733716 = queryNorm
              0.50151014 = fieldWeight in 3078, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.078125 = fieldNorm(doc=3078)
        0.32 = coord(8/25)
    
  4. Lim, C.S.; Lee, K.J.; Kim, G.C.: Multiple sets of features for automatic genre classification of web documents (2005) 0.15
    0.15316993 = sum of:
      0.15316993 = product of:
        0.54703546 = sum of:
          0.06325516 = weight(abstract_txt:sets in 1048) [ClassicSimilarity], result of:
            0.06325516 = score(doc=1048,freq=3.0), product of:
              0.11269241 = queryWeight, product of:
                5.185142 = idf(docFreq=672, maxDocs=44218)
                0.021733716 = queryNorm
              0.5613081 = fieldWeight in 1048, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.185142 = idf(docFreq=672, maxDocs=44218)
                0.0625 = fieldNorm(doc=1048)
          0.022517793 = weight(abstract_txt:which in 1048) [ClassicSimilarity], result of:
            0.022517793 = score(doc=1048,freq=3.0), product of:
              0.0713167 = queryWeight, product of:
                1.1250279 = boost
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.021733716 = queryNorm
              0.31574363 = fieldWeight in 1048, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.0625 = fieldNorm(doc=1048)
          0.010455168 = weight(abstract_txt:that in 1048) [ClassicSimilarity], result of:
            0.010455168 = score(doc=1048,freq=1.0), product of:
              0.07059905 = queryWeight, product of:
                1.370922 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.021733716 = queryNorm
              0.1480922 = fieldWeight in 1048, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=1048)
          0.11002923 = weight(abstract_txt:documents in 1048) [ClassicSimilarity], result of:
            0.11002923 = score(doc=1048,freq=9.0), product of:
              0.14238764 = queryWeight, product of:
                1.5896585 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.021733716 = queryNorm
              0.77274424 = fieldWeight in 1048, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0625 = fieldNorm(doc=1048)
          0.08413503 = weight(abstract_txt:word in 1048) [ClassicSimilarity], result of:
            0.08413503 = score(doc=1048,freq=1.0), product of:
              0.2476656 = queryWeight, product of:
                2.0965273 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.021733716 = queryNorm
              0.33971223 = fieldWeight in 1048, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0625 = fieldNorm(doc=1048)
          0.092287816 = weight(abstract_txt:pages in 1048) [ClassicSimilarity], result of:
            0.092287816 = score(doc=1048,freq=1.0), product of:
              0.26341712 = queryWeight, product of:
                2.1621692 = boost
                5.6055775 = idf(docFreq=441, maxDocs=44218)
                0.021733716 = queryNorm
              0.3503486 = fieldWeight in 1048, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6055775 = idf(docFreq=441, maxDocs=44218)
                0.0625 = fieldNorm(doc=1048)
          0.1643553 = weight(abstract_txt:features in 1048) [ClassicSimilarity], result of:
            0.1643553 = score(doc=1048,freq=5.0), product of:
              0.259086 = queryWeight, product of:
                2.6262453 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.021733716 = queryNorm
              0.63436574 = fieldWeight in 1048, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0625 = fieldNorm(doc=1048)
        0.28 = coord(7/25)
    
  5. Scholer, F.; Williams, H.E.; Turpin, A.: Query association surrogates for Web search (2004) 0.14
    0.1417947 = sum of:
      0.1417947 = product of:
        0.50640965 = sum of:
          0.07589179 = weight(abstract_txt:finding in 2236) [ClassicSimilarity], result of:
            0.07589179 = score(doc=2236,freq=2.0), product of:
              0.12552136 = queryWeight, product of:
                1.0553864 = boost
                5.4723287 = idf(docFreq=504, maxDocs=44218)
                0.021733716 = queryNorm
              0.6046126 = fieldWeight in 2236, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4723287 = idf(docFreq=504, maxDocs=44218)
                0.078125 = fieldNorm(doc=2236)
          0.057196196 = weight(abstract_txt:technique in 2236) [ClassicSimilarity], result of:
            0.057196196 = score(doc=2236,freq=1.0), product of:
              0.13097121 = queryWeight, product of:
                1.0780542 = boost
                5.5898643 = idf(docFreq=448, maxDocs=44218)
                0.021733716 = queryNorm
              0.43670815 = fieldWeight in 2236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5898643 = idf(docFreq=448, maxDocs=44218)
                0.078125 = fieldNorm(doc=2236)
          0.06967789 = weight(abstract_txt:accuracy in 2236) [ClassicSimilarity], result of:
            0.06967789 = score(doc=2236,freq=1.0), product of:
              0.14939214 = queryWeight, product of:
                1.1513743 = boost
                5.9700394 = idf(docFreq=306, maxDocs=44218)
                0.021733716 = queryNorm
              0.46640933 = fieldWeight in 2236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9700394 = idf(docFreq=306, maxDocs=44218)
                0.078125 = fieldNorm(doc=2236)
          0.026137922 = weight(abstract_txt:that in 2236) [ClassicSimilarity], result of:
            0.026137922 = score(doc=2236,freq=4.0), product of:
              0.07059905 = queryWeight, product of:
                1.370922 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.021733716 = queryNorm
              0.3702305 = fieldWeight in 2236, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=2236)
          0.07940675 = weight(abstract_txt:documents in 2236) [ClassicSimilarity], result of:
            0.07940675 = score(doc=2236,freq=3.0), product of:
              0.14238764 = queryWeight, product of:
                1.5896585 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.021733716 = queryNorm
              0.5576801 = fieldWeight in 2236, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.078125 = fieldNorm(doc=2236)
          0.10543999 = weight(abstract_txt:techniques in 2236) [ClassicSimilarity], result of:
            0.10543999 = score(doc=2236,freq=3.0), product of:
              0.17201681 = queryWeight, product of:
                1.7472422 = boost
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.021733716 = queryNorm
              0.61296326 = fieldWeight in 2236, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.078125 = fieldNorm(doc=2236)
          0.092659116 = weight(abstract_txt:similar in 2236) [ClassicSimilarity], result of:
            0.092659116 = score(doc=2236,freq=1.0), product of:
              0.22761446 = queryWeight, product of:
                2.0098684 = boost
                5.2107263 = idf(docFreq=655, maxDocs=44218)
                0.021733716 = queryNorm
              0.40708798 = fieldWeight in 2236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2107263 = idf(docFreq=655, maxDocs=44218)
                0.078125 = fieldNorm(doc=2236)
        0.28 = coord(7/25)
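
The score breakdowns above all follow the same pattern: each term score is queryWeight (boost × idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), with tf = sqrt(termFreq), and the per-term sum is scaled by coord (matched terms / query terms). The sketch below recomputes the score of hit 1 (doc 4462) from the printed inputs. The function and variable names are illustrative, not part of any library, and the idf formula 1 + ln(maxDocs / (docFreq + 1)) is an assumption chosen because it reproduces the idf values printed in the listing.

```python
import math

QUERY_NORM = 0.021733716  # queryNorm as printed in the explanations
MAX_DOCS = 44218          # maxDocs as printed in the explanations


def idf(doc_freq, max_docs=MAX_DOCS):
    """Assumed idf formula; it matches the printed idf values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def hit_score(terms, field_norm, matched, query_terms):
    """Sum of term scores, scaled by coord = matched / query_terms."""
    total = sum(term_score(b, df, f, field_norm) for _, b, df, f in terms)
    return total * (matched / query_terms)


# Inputs copied from the explanation of hit 1: (term, boost, docFreq, termFreq)
HIT1_TERMS = [
    ("accuracy", 1.1513743, 306, 1.0),
    ("clusters", 1.2564948, 177, 3.0),
    ("extracting", 1.3374201, 116, 1.0),
    ("that", 1.370922, 11241, 3.0),
    ("techniques", 1.7472422, 1295, 1.0),
    ("word", 2.0965273, 523, 3.0),
    ("features", 2.6262453, 1283, 4.0),
]

if __name__ == "__main__":
    # fieldNorm(doc=4462) = 0.0625, coord(7/25)
    print(hit_score(HIT1_TERMS, 0.0625, 7, 25))
```

The result should land very close to the 0.17587586 listed for hit 1. Small differences in the last digits are expected, since the listing appears to use single-precision arithmetic while Python uses double precision.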