Document (#30276)

Author
Sun, A.
Lim, E.-P.
Title
Web unit-based mining of homepage relationships
Source
Journal of the American Society for Information Science and Technology. 57(2006) no.3, p.394-407
Year
2006
Abstract
Homepages usually describe important semantic information about conceptual or physical entities; hence, they are the main targets for searching and browsing. To facilitate semantic-based information retrieval (IR) at a Web site, homepages can be identified and classified under some predefined concepts, and these concepts are then used in query or browsing criteria, e.g., finding professor homepages containing information retrieval. In some Web sites, relationships may also exist among homepages. These relationship instances (also known as homepage relationships) enrich our knowledge about these Web sites and allow more expressive semantic-based IR. In this article, we investigate the features to be used in mining homepage relationships. We systematically develop different classes of inter-homepage features, namely, navigation, relative-location, and common-item features. We also propose deriving for each homepage a set of support pages to obtain richer and more complete content about the entity described by the homepage. A homepage together with its support pages is known as a Web unit. Our experiments on the WebKB dataset show that extracting inter-homepage features from Web units achieves better homepage relationship mining accuracy.

Similar documents (content)

  1. Barjak, F.; Li, X.; Thelwall, M.: Which factors explain the Web impact of scientists' personal homepages? (2007) 0.37
    0.36837164 = sum of:
      0.36837164 = product of:
        1.315613 = sum of:
          0.036291886 = weight(abstract_txt:relationship in 2074) [ClassicSimilarity], result of:
            0.036291886 = score(doc=2074,freq=3.0), product of:
              0.07700156 = queryWeight, product of:
                1.3177359 = boost
                4.975782 = idf(docFreq=801, maxDocs=42740)
                0.0117438305 = queryNorm
              0.47131366 = fieldWeight in 2074, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.975782 = idf(docFreq=801, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2074)
          0.010308932 = weight(abstract_txt:also in 2074) [ClassicSimilarity], result of:
            0.010308932 = score(doc=2074,freq=1.0), product of:
              0.054933812 = queryWeight, product of:
                1.3631516 = boost
                3.4315145 = idf(docFreq=3756, maxDocs=42740)
                0.0117438305 = queryNorm
              0.18766095 = fieldWeight in 2074, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4315145 = idf(docFreq=3756, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2074)
          0.029634146 = weight(abstract_txt:pages in 2074) [ClassicSimilarity], result of:
            0.029634146 = score(doc=2074,freq=1.0), product of:
              0.09702013 = queryWeight, product of:
                1.4791409 = boost
                5.5852485 = idf(docFreq=435, maxDocs=42740)
                0.0117438305 = queryNorm
              0.3054433 = fieldWeight in 2074, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5852485 = idf(docFreq=435, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2074)
          0.015696784 = weight(abstract_txt:about in 2074) [ClassicSimilarity], result of:
            0.015696784 = score(doc=2074,freq=1.0), product of:
              0.07270614 = queryWeight, product of:
                1.5682303 = boost
                3.947767 = idf(docFreq=2241, maxDocs=42740)
                0.0117438305 = queryNorm
              0.2158935 = fieldWeight in 2074, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.947767 = idf(docFreq=2241, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2074)
          0.05394912 = weight(abstract_txt:relationships in 2074) [ClassicSimilarity], result of:
            0.05394912 = score(doc=2074,freq=2.0), product of:
              0.14465158 = queryWeight, product of:
                2.5542018 = boost
                4.822344 = idf(docFreq=934, maxDocs=42740)
                0.0117438305 = queryNorm
              0.3729591 = fieldWeight in 2074, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.822344 = idf(docFreq=934, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2074)
          0.41425627 = weight(abstract_txt:homepages in 2074) [ClassicSimilarity], result of:
            0.41425627 = score(doc=2074,freq=5.0), product of:
              0.4148263 = queryWeight, product of:
                4.325405 = boost
                8.166383 = idf(docFreq=32, maxDocs=42740)
                0.0117438305 = queryNorm
              0.9986259 = fieldWeight in 2074, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.166383 = idf(docFreq=32, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2074)
          0.7554758 = weight(abstract_txt:homepage in 2074) [ClassicSimilarity], result of:
            0.7554758 = score(doc=2074,freq=6.0), product of:
              0.7635445 = queryWeight, product of:
                8.80242 = boost
                7.3862243 = idf(docFreq=71, maxDocs=42740)
                0.0117438305 = queryNorm
              0.9894326 = fieldWeight in 2074, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.3862243 = idf(docFreq=71, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2074)
        0.28 = coord(7/25)
    
  2. Shakes, J.; Langheinrich, M.; Etzioni, O.: Dynamic Reference Sifting : a case study in the homepage domain (1997) 0.22
    0.21657509 = sum of:
      0.21657509 = product of:
        1.3535943 = sum of:
          0.09156289 = weight(abstract_txt:targets in 3678) [ClassicSimilarity], result of:
            0.09156289 = score(doc=3678,freq=2.0), product of:
              0.1022175 = queryWeight, product of:
                1.0735599 = boost
                8.107542 = idf(docFreq=34, maxDocs=42740)
                0.0117438305 = queryNorm
              0.8957653 = fieldWeight in 3678, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.107542 = idf(docFreq=34, maxDocs=42740)
                0.078125 = fieldNorm(doc=3678)
          0.012155868 = weight(abstract_txt:these in 3678) [ClassicSimilarity], result of:
            0.012155868 = score(doc=3678,freq=1.0), product of:
              0.048337776 = queryWeight, product of:
                1.2786969 = boost
                3.2189133 = idf(docFreq=4646, maxDocs=42740)
                0.0117438305 = queryNorm
              0.2514776 = fieldWeight in 3678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2189133 = idf(docFreq=4646, maxDocs=42740)
                0.078125 = fieldNorm(doc=3678)
          0.26465863 = weight(abstract_txt:homepages in 3678) [ClassicSimilarity], result of:
            0.26465863 = score(doc=3678,freq=1.0), product of:
              0.4148263 = queryWeight, product of:
                4.325405 = boost
                8.166383 = idf(docFreq=32, maxDocs=42740)
                0.0117438305 = queryNorm
              0.63799864 = fieldWeight in 3678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.166383 = idf(docFreq=32, maxDocs=42740)
                0.078125 = fieldNorm(doc=3678)
          0.985217 = weight(abstract_txt:homepage in 3678) [ClassicSimilarity], result of:
            0.985217 = score(doc=3678,freq=5.0), product of:
              0.7635445 = queryWeight, product of:
                8.80242 = boost
                7.3862243 = idf(docFreq=71, maxDocs=42740)
                0.0117438305 = queryNorm
              1.2903203 = fieldWeight in 3678, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.3862243 = idf(docFreq=71, maxDocs=42740)
                0.078125 = fieldNorm(doc=3678)
        0.16 = coord(4/25)
    
  3. Ma, Y.; Diodato, V.: Icons as visual form of knowledge representation on the World Wide Web : a semiotic analysis (1999) 0.19
    0.18993846 = sum of:
      0.18993846 = product of:
        0.94969225 = sum of:
          0.009724694 = weight(abstract_txt:these in 675) [ClassicSimilarity], result of:
            0.009724694 = score(doc=675,freq=1.0), product of:
              0.048337776 = queryWeight, product of:
                1.2786969 = boost
                3.2189133 = idf(docFreq=4646, maxDocs=42740)
                0.0117438305 = queryNorm
              0.20118208 = fieldWeight in 675, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2189133 = idf(docFreq=4646, maxDocs=42740)
                0.0625 = fieldNorm(doc=675)
          0.016661748 = weight(abstract_txt:also in 675) [ClassicSimilarity], result of:
            0.016661748 = score(doc=675,freq=2.0), product of:
              0.054933812 = queryWeight, product of:
                1.3631516 = boost
                3.4315145 = idf(docFreq=3756, maxDocs=42740)
                0.0117438305 = queryNorm
              0.3033059 = fieldWeight in 675, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4315145 = idf(docFreq=3756, maxDocs=42740)
                0.0625 = fieldNorm(doc=675)
          0.05220098 = weight(abstract_txt:features in 675) [ClassicSimilarity], result of:
            0.05220098 = score(doc=675,freq=2.0), product of:
              0.12945676 = queryWeight, product of:
                2.4163287 = boost
                4.5620384 = idf(docFreq=1212, maxDocs=42740)
                0.0117438305 = queryNorm
              0.40323102 = fieldWeight in 675, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5620384 = idf(docFreq=1212, maxDocs=42740)
                0.0625 = fieldNorm(doc=675)
          0.5186229 = weight(abstract_txt:homepages in 675) [ClassicSimilarity], result of:
            0.5186229 = score(doc=675,freq=6.0), product of:
              0.4148263 = queryWeight, product of:
                4.325405 = boost
                8.166383 = idf(docFreq=32, maxDocs=42740)
                0.0117438305 = queryNorm
              1.250217 = fieldWeight in 675, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.166383 = idf(docFreq=32, maxDocs=42740)
                0.0625 = fieldNorm(doc=675)
          0.35248193 = weight(abstract_txt:homepage in 675) [ClassicSimilarity], result of:
            0.35248193 = score(doc=675,freq=1.0), product of:
              0.7635445 = queryWeight, product of:
                8.80242 = boost
                7.3862243 = idf(docFreq=71, maxDocs=42740)
                0.0117438305 = queryNorm
              0.46163902 = fieldWeight in 675, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3862243 = idf(docFreq=71, maxDocs=42740)
                0.0625 = fieldNorm(doc=675)
        0.2 = coord(5/25)
    
  4. Fichtner, M.: Home, sweet home (1996) 0.16
    0.15797848 = sum of:
      0.15797848 = product of:
        1.974731 = sum of:
          0.74104416 = weight(abstract_txt:homepages in 3559) [ClassicSimilarity], result of:
            0.74104416 = score(doc=3559,freq=1.0), product of:
              0.4148263 = queryWeight, product of:
                4.325405 = boost
                8.166383 = idf(docFreq=32, maxDocs=42740)
                0.0117438305 = queryNorm
              1.7863963 = fieldWeight in 3559, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.166383 = idf(docFreq=32, maxDocs=42740)
                0.21875 = fieldNorm(doc=3559)
          1.2336868 = weight(abstract_txt:homepage in 3559) [ClassicSimilarity], result of:
            1.2336868 = score(doc=3559,freq=1.0), product of:
              0.7635445 = queryWeight, product of:
                8.80242 = boost
                7.3862243 = idf(docFreq=71, maxDocs=42740)
                0.0117438305 = queryNorm
              1.6157366 = fieldWeight in 3559, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3862243 = idf(docFreq=71, maxDocs=42740)
                0.21875 = fieldNorm(doc=3559)
        0.08 = coord(2/25)
    
  5. Byers, D.F.; Wilson, L.: The Web as a teaching tool (1996) 0.14
    0.1427308 = sum of:
      0.1427308 = product of:
        1.1894233 = sum of:
          0.061005633 = weight(abstract_txt:sites in 5924) [ClassicSimilarity], result of:
            0.061005633 = score(doc=5924,freq=1.0), product of:
              0.090482704 = queryWeight, product of:
                1.4284381 = boost
                5.393794 = idf(docFreq=527, maxDocs=42740)
                0.0117438305 = queryNorm
              0.67422426 = fieldWeight in 5924, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.393794 = idf(docFreq=527, maxDocs=42740)
                0.125 = fieldNorm(doc=5924)
          0.4234538 = weight(abstract_txt:homepages in 5924) [ClassicSimilarity], result of:
            0.4234538 = score(doc=5924,freq=1.0), product of:
              0.4148263 = queryWeight, product of:
                4.325405 = boost
                8.166383 = idf(docFreq=32, maxDocs=42740)
                0.0117438305 = queryNorm
              1.0207978 = fieldWeight in 5924, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.166383 = idf(docFreq=32, maxDocs=42740)
                0.125 = fieldNorm(doc=5924)
          0.70496386 = weight(abstract_txt:homepage in 5924) [ClassicSimilarity], result of:
            0.70496386 = score(doc=5924,freq=1.0), product of:
              0.7635445 = queryWeight, product of:
                8.80242 = boost
                7.3862243 = idf(docFreq=71, maxDocs=42740)
                0.0117438305 = queryNorm
              0.92327803 = fieldWeight in 5924, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3862243 = idf(docFreq=71, maxDocs=42740)
                0.125 = fieldNorm(doc=5924)
        0.12 = coord(3/25)
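
The score trees above all follow the same arithmetic, so it may help to recompute one of them. The sketch below reproduces the total for entry 4 (doc 3559), the shortest tree, using the ClassicSimilarity formulas visible in the listing: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final multiplication by coord. All numeric inputs are copied from the listing; the function and variable names are illustrative assumptions and are not part of the catalog or of Lucene's API.

```python
import math

# Values copied from the explain listing for doc 3559 (entry 4).
MAX_DOCS = 42740
QUERY_NORM = 0.0117438305
FIELD_NORM = 0.21875      # fieldNorm(doc=3559)
COORD = 2 / 25            # 2 of the 25 query terms matched


def idf(doc_freq, max_docs=MAX_DOCS):
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    """Score contribution of one matching term: queryWeight * fieldWeight."""
    i = idf(doc_freq)
    query_weight = boost * i * query_norm
    field_weight = math.sqrt(freq) * i * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


# (termFreq, docFreq, boost) for each matching term in doc 3559.
terms = [
    (1.0, 32, 4.325405),  # abstract_txt:homepages
    (1.0, 71, 8.80242),   # abstract_txt:homepage
]

total = COORD * sum(term_score(f, df, b, FIELD_NORM) for f, df, b in terms)
print(round(total, 4))
```

The result should land close to the 0.15797848 shown for entry 4; small differences in the last digits are expected because Lucene computes these values in single precision. The same function can be applied to the other entries by substituting their freq, docFreq, boost, fieldNorm, and coord values.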