Document (#29923)

Author
Calado, P.
Cristo, M.
Gonçalves, M.A.
Moura, E.S. de
Ribeiro-Neto, B.
Ziviani, N.
Title
Link-based similarity measures for the classification of Web documents
Source
Journal of the American Society for Information Science and Technology. 57(2006) no.2, pp. 208-221
Year
2006
Abstract
Traditional text-based document classifiers tend to perform poorly on the Web. Text in Web documents is usually noisy and often does not contain enough information to determine their topic. However, the Web provides a different source that can be useful to document classification: its hyperlink structure. In this work, the authors evaluate how the link structure of the Web can be used to determine a measure of similarity appropriate for document classification. They experiment with five different similarity measures and determine their adequacy for predicting the topic of a Web page. Tests performed on a Web directory show that link information alone allows classifying documents with an average precision of 86%. Further, when combined with a traditional text-based classifier, precision increases to values of up to 90%, representing gains that range from 63 to 132% over the use of text-based classification alone. Because the measures proposed in this article are straightforward to compute, they provide a practical and effective solution for Web classification and related information retrieval tasks. Further, the authors provide an important set of guidelines on how link structure can be used effectively to classify Web documents.
Theme
Automatic classification

Similar documents (author)

  1. Couto, T.; Cristo, M.; Gonçalves, M.A.; Calado, P.; Ziviani, N.; Moura, E.; Ribeiro-Neto, B.: ¬A comparative study of citations and links in document classification (2006) 5.59
    5.592396 = sum of:
      5.592396 = sum of:
        0.7510306 = weight(author_txt:gonçalves in 3711) [ClassicSimilarity], result of:
          0.7510306 = score(doc=3711,freq=1.0), product of:
            0.3524302 = queryWeight, product of:
              8.524021 = idf(docFreq=22, maxDocs=42596)
              0.041345533 = queryNorm
            2.1310053 = fieldWeight in 3711, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.524021 = idf(docFreq=22, maxDocs=42596)
              0.25 = fieldNorm(doc=3711)
        0.7885819 = weight(author_txt:moura in 3711) [ClassicSimilarity], result of:
          0.7885819 = score(doc=3711,freq=1.0), product of:
            0.364082 = queryWeight, product of:
              1.0163963 = boost
              8.663783 = idf(docFreq=19, maxDocs=42596)
              0.041345533 = queryNorm
            2.1659458 = fieldWeight in 3711, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.663783 = idf(docFreq=19, maxDocs=42596)
              0.25 = fieldNorm(doc=3711)
        0.8026712 = weight(author_txt:ribeiro in 3711) [ClassicSimilarity], result of:
          0.8026712 = score(doc=3711,freq=1.0), product of:
            0.36840582 = queryWeight, product of:
              1.0224137 = boost
              8.715076 = idf(docFreq=18, maxDocs=42596)
              0.041345533 = queryNorm
            2.178769 = fieldWeight in 3711, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.715076 = idf(docFreq=18, maxDocs=42596)
              0.25 = fieldNorm(doc=3711)
        1.027337 = weight(author_txt:neto in 3711) [ClassicSimilarity], result of:
          1.027337 = score(doc=3711,freq=1.0), product of:
            0.4342868 = queryWeight, product of:
              1.1100736 = boost
              9.462291 = idf(docFreq=8, maxDocs=42596)
              0.041345533 = queryNorm
            2.3655727 = fieldWeight in 3711, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.462291 = idf(docFreq=8, maxDocs=42596)
              0.25 = fieldNorm(doc=3711)
        1.1113875 = weight(author_txt:ziviani in 3711) [ClassicSimilarity], result of:
          1.1113875 = score(doc=3711,freq=1.0), product of:
            0.4576622 = queryWeight, product of:
              1.1395568 = boost
              9.713606 = idf(docFreq=6, maxDocs=42596)
              0.041345533 = queryNorm
            2.4284015 = fieldWeight in 3711, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.713606 = idf(docFreq=6, maxDocs=42596)
              0.25 = fieldNorm(doc=3711)
        1.1113875 = weight(author_txt:cristo in 3711) [ClassicSimilarity], result of:
          1.1113875 = score(doc=3711,freq=1.0), product of:
            0.4576622 = queryWeight, product of:
              1.1395568 = boost
              9.713606 = idf(docFreq=6, maxDocs=42596)
              0.041345533 = queryNorm
            2.4284015 = fieldWeight in 3711, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.713606 = idf(docFreq=6, maxDocs=42596)
              0.25 = fieldNorm(doc=3711)
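
The per-term figures in the explain tree above can be reproduced by hand. The Python sketch below is a minimal illustration, assuming tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); these formulas are inferred from the printed values and agree with them to the displayed precision. The function names `classic_idf` and `term_score` are hypothetical and not part of any library.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed idf formula; it matches e.g. idf(docFreq=22, maxDocs=42596) = 8.524021.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, as in the explain tree.
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm          # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Term "gonçalves" in doc 3711 (no boost line in the listing, so boost = 1.0).
print(term_score(1.0, 22, 42596, 0.041345533, 0.25))
# Term "neto" in doc 3711, with the boost printed in the listing.
print(term_score(1.0, 8, 42596, 0.041345533, 0.25, boost=1.1100736))
```

The two printed values should be close to the 0.7510306 and 1.027337 shown in the listing; small differences in the last digits are expected because the listing appears to use single-precision arithmetic.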
    
  2. Pereira, D.A.; Ribeiro-Neto, B.; Ziviani, N.; Laender, A.H.F.; Gonçalves, M.A.: ¬A generic Web-based entity resolution framework (2011) 2.46
    2.4616177 = sum of:
      2.4616177 = product of:
        3.6924264 = sum of:
          0.7510306 = weight(author_txt:gonçalves in 451) [ClassicSimilarity], result of:
            0.7510306 = score(doc=451,freq=1.0), product of:
              0.3524302 = queryWeight, product of:
                8.524021 = idf(docFreq=22, maxDocs=42596)
                0.041345533 = queryNorm
              2.1310053 = fieldWeight in 451, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.524021 = idf(docFreq=22, maxDocs=42596)
                0.25 = fieldNorm(doc=451)
          0.8026712 = weight(author_txt:ribeiro in 451) [ClassicSimilarity], result of:
            0.8026712 = score(doc=451,freq=1.0), product of:
              0.36840582 = queryWeight, product of:
                1.0224137 = boost
                8.715076 = idf(docFreq=18, maxDocs=42596)
                0.041345533 = queryNorm
              2.178769 = fieldWeight in 451, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.715076 = idf(docFreq=18, maxDocs=42596)
                0.25 = fieldNorm(doc=451)
          1.027337 = weight(author_txt:neto in 451) [ClassicSimilarity], result of:
            1.027337 = score(doc=451,freq=1.0), product of:
              0.4342868 = queryWeight, product of:
                1.1100736 = boost
                9.462291 = idf(docFreq=8, maxDocs=42596)
                0.041345533 = queryNorm
              2.3655727 = fieldWeight in 451, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.462291 = idf(docFreq=8, maxDocs=42596)
                0.25 = fieldNorm(doc=451)
          1.1113875 = weight(author_txt:ziviani in 451) [ClassicSimilarity], result of:
            1.1113875 = score(doc=451,freq=1.0), product of:
              0.4576622 = queryWeight, product of:
                1.1395568 = boost
                9.713606 = idf(docFreq=6, maxDocs=42596)
                0.041345533 = queryNorm
              2.4284015 = fieldWeight in 451, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.713606 = idf(docFreq=6, maxDocs=42596)
                0.25 = fieldNorm(doc=451)
        0.6666667 = coord(4/6)
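
The document total in this entry is the sum of the matching term scores scaled by the coordination factor, coord(4/6), which is the share of query terms that the document matches. The sketch below recomputes that total from the term scores printed above; `coordinated_score` is a hypothetical helper name used only for illustration.

```python
def coordinated_score(term_scores, total_query_terms):
    # coord = matching query terms / all query terms
    coord = len(term_scores) / total_query_terms
    return sum(term_scores) * coord


# Term scores for doc 451: gonçalves, ribeiro, neto, ziviani (4 of 6 author terms).
doc_451 = [0.7510306, 0.8026712, 1.027337, 1.1113875]
print(coordinated_score(doc_451, 6))
```

The result should be close to the 2.4616177 reported for this entry. The first entry in the list matches all six author terms, which would give a coordination factor of 1; that is consistent with its tree showing only a plain sum.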
    
  3. Moura, E.S. de; Fernandes, D.; Ribeiro-Neto, B.; Silva, A.S. da; Gonçalves, M.A.: Using structural information to improve search in Web collections (2010) 2.25
    2.246414 = sum of:
      2.246414 = product of:
        3.3696208 = sum of:
          0.7510306 = weight(author_txt:gonçalves in 120) [ClassicSimilarity], result of:
            0.7510306 = score(doc=120,freq=1.0), product of:
              0.3524302 = queryWeight, product of:
                8.524021 = idf(docFreq=22, maxDocs=42596)
                0.041345533 = queryNorm
              2.1310053 = fieldWeight in 120, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.524021 = idf(docFreq=22, maxDocs=42596)
                0.25 = fieldNorm(doc=120)
          0.7885819 = weight(author_txt:moura in 120) [ClassicSimilarity], result of:
            0.7885819 = score(doc=120,freq=1.0), product of:
              0.364082 = queryWeight, product of:
                1.0163963 = boost
                8.663783 = idf(docFreq=19, maxDocs=42596)
                0.041345533 = queryNorm
              2.1659458 = fieldWeight in 120, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.663783 = idf(docFreq=19, maxDocs=42596)
                0.25 = fieldNorm(doc=120)
          0.8026712 = weight(author_txt:ribeiro in 120) [ClassicSimilarity], result of:
            0.8026712 = score(doc=120,freq=1.0), product of:
              0.36840582 = queryWeight, product of:
                1.0224137 = boost
                8.715076 = idf(docFreq=18, maxDocs=42596)
                0.041345533 = queryNorm
              2.178769 = fieldWeight in 120, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.715076 = idf(docFreq=18, maxDocs=42596)
                0.25 = fieldNorm(doc=120)
          1.027337 = weight(author_txt:neto in 120) [ClassicSimilarity], result of:
            1.027337 = score(doc=120,freq=1.0), product of:
              0.4342868 = queryWeight, product of:
                1.1100736 = boost
                9.462291 = idf(docFreq=8, maxDocs=42596)
                0.041345533 = queryNorm
              2.3655727 = fieldWeight in 120, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.462291 = idf(docFreq=8, maxDocs=42596)
                0.25 = fieldNorm(doc=120)
        0.6666667 = coord(4/6)
    
  4. Silva, A.J.C.; Gonçalves, M.A.; Laender, A.H.F.; Modesto, M.A.B.; Cristo, M.; Ziviani, N.: Finding what is missing from a digital library : a case study in the computer science field (2009) 1.49
    1.4869028 = sum of:
      1.4869028 = product of:
        2.9738057 = sum of:
          0.7510306 = weight(author_txt:gonçalves in 220) [ClassicSimilarity], result of:
            0.7510306 = score(doc=220,freq=1.0), product of:
              0.3524302 = queryWeight, product of:
                8.524021 = idf(docFreq=22, maxDocs=42596)
                0.041345533 = queryNorm
              2.1310053 = fieldWeight in 220, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.524021 = idf(docFreq=22, maxDocs=42596)
                0.25 = fieldNorm(doc=220)
          1.1113875 = weight(author_txt:ziviani in 220) [ClassicSimilarity], result of:
            1.1113875 = score(doc=220,freq=1.0), product of:
              0.4576622 = queryWeight, product of:
                1.1395568 = boost
                9.713606 = idf(docFreq=6, maxDocs=42596)
                0.041345533 = queryNorm
              2.4284015 = fieldWeight in 220, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.713606 = idf(docFreq=6, maxDocs=42596)
                0.25 = fieldNorm(doc=220)
          1.1113875 = weight(author_txt:cristo in 220) [ClassicSimilarity], result of:
            1.1113875 = score(doc=220,freq=1.0), product of:
              0.4576622 = queryWeight, product of:
                1.1395568 = boost
                9.713606 = idf(docFreq=6, maxDocs=42596)
                0.041345533 = queryNorm
              2.4284015 = fieldWeight in 220, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.713606 = idf(docFreq=6, maxDocs=42596)
                0.25 = fieldNorm(doc=220)
        0.5 = coord(3/6)
    
  5. Silveira, M.; Ribeiro-Neto, B.: Concept-based ranking : a case study in the juridical domain (2004) 1.07
    1.0675049 = sum of:
      1.0675049 = product of:
        3.2025146 = sum of:
          1.4046746 = weight(author_txt:ribeiro in 3340) [ClassicSimilarity], result of:
            1.4046746 = score(doc=3340,freq=1.0), product of:
              0.36840582 = queryWeight, product of:
                1.0224137 = boost
                8.715076 = idf(docFreq=18, maxDocs=42596)
                0.041345533 = queryNorm
              3.812846 = fieldWeight in 3340, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.715076 = idf(docFreq=18, maxDocs=42596)
                0.4375 = fieldNorm(doc=3340)
          1.7978399 = weight(author_txt:neto in 3340) [ClassicSimilarity], result of:
            1.7978399 = score(doc=3340,freq=1.0), product of:
              0.4342868 = queryWeight, product of:
                1.1100736 = boost
                9.462291 = idf(docFreq=8, maxDocs=42596)
                0.041345533 = queryNorm
              4.1397524 = fieldWeight in 3340, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.462291 = idf(docFreq=8, maxDocs=42596)
                0.4375 = fieldNorm(doc=3340)
        0.33333334 = coord(2/6)
    

Similar documents (content)

  1. Yang, P.; Gao, W.; Tan, Q.; Wong, K.-F.: ¬A link-bridged topic model for cross-domain document classification (2013) 0.31
    0.30986282 = sum of:
      0.30986282 = product of:
        0.86073005 = sum of:
          0.090643965 = weight(abstract_txt:hyperlink in 3707) [ClassicSimilarity], result of:
            0.090643965 = score(doc=3707,freq=1.0), product of:
              0.1846849 = queryWeight, product of:
                1.037843 = boost
                7.8528533 = idf(docFreq=44, maxDocs=42596)
                0.022660643 = queryNorm
              0.49080333 = fieldWeight in 3707, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8528533 = idf(docFreq=44, maxDocs=42596)
                0.0625 = fieldNorm(doc=3707)
          0.018550485 = weight(abstract_txt:based in 3707) [ClassicSimilarity], result of:
            0.018550485 = score(doc=3707,freq=1.0), product of:
              0.09250163 = queryWeight, product of:
                1.2721881 = boost
                3.2086759 = idf(docFreq=4678, maxDocs=42596)
                0.022660643 = queryNorm
              0.20054224 = fieldWeight in 3707, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2086759 = idf(docFreq=4678, maxDocs=42596)
                0.0625 = fieldNorm(doc=3707)
          0.08572114 = weight(abstract_txt:topic in 3707) [ClassicSimilarity], result of:
            0.08572114 = score(doc=3707,freq=3.0), product of:
              0.15544151 = queryWeight, product of:
                1.3465252 = boost
                5.0942507 = idf(docFreq=709, maxDocs=42596)
                0.022660643 = queryNorm
              0.5514688 = fieldWeight in 3707, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.0942507 = idf(docFreq=709, maxDocs=42596)
                0.0625 = fieldNorm(doc=3707)
          0.037279654 = weight(abstract_txt:text in 3707) [ClassicSimilarity], result of:
            0.037279654 = score(doc=3707,freq=1.0), product of:
              0.14730828 = queryWeight, product of:
                1.6054256 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.022660643 = queryNorm
              0.25307238 = fieldWeight in 3707, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.0625 = fieldNorm(doc=3707)
          0.06220264 = weight(abstract_txt:document in 3707) [ClassicSimilarity], result of:
            0.06220264 = score(doc=3707,freq=2.0), product of:
              0.16447823 = queryWeight, product of:
                1.69641 = boost
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.022660643 = queryNorm
              0.3781816 = fieldWeight in 3707, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.0625 = fieldNorm(doc=3707)
          0.046903033 = weight(abstract_txt:structure in 3707) [ClassicSimilarity], result of:
            0.046903033 = score(doc=3707,freq=1.0), product of:
              0.17167741 = queryWeight, product of:
                1.7331382 = boost
                4.371271 = idf(docFreq=1462, maxDocs=42596)
                0.022660643 = queryNorm
              0.27320445 = fieldWeight in 3707, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.371271 = idf(docFreq=1462, maxDocs=42596)
                0.0625 = fieldNorm(doc=3707)
          0.104234576 = weight(abstract_txt:documents in 3707) [ClassicSimilarity], result of:
            0.104234576 = score(doc=3707,freq=4.0), product of:
              0.20271225 = queryWeight, product of:
                2.174633 = boost
                4.1135974 = idf(docFreq=1892, maxDocs=42596)
                0.022660643 = queryNorm
              0.5141997 = fieldWeight in 3707, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1135974 = idf(docFreq=1892, maxDocs=42596)
                0.0625 = fieldNorm(doc=3707)
          0.103698395 = weight(abstract_txt:classification in 3707) [ClassicSimilarity], result of:
            0.103698395 = score(doc=3707,freq=3.0), product of:
              0.23951705 = queryWeight, product of:
                2.64283 = boost
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.022660643 = queryNorm
              0.43294787 = fieldWeight in 3707, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.0625 = fieldNorm(doc=3707)
          0.3114961 = weight(abstract_txt:link in 3707) [ClassicSimilarity], result of:
            0.3114961 = score(doc=3707,freq=5.0), product of:
              0.39042464 = queryWeight, product of:
                3.0179677 = boost
                5.7088733 = idf(docFreq=383, maxDocs=42596)
                0.022660643 = queryNorm
              0.7978393 = fieldWeight in 3707, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.7088733 = idf(docFreq=383, maxDocs=42596)
                0.0625 = fieldNorm(doc=3707)
        0.36 = coord(9/25)
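
In the content-based comparison, terms can occur several times in an abstract, and the tf values above grow with the square root of the raw frequency rather than linearly. The sketch below recomputes two fieldWeight values from this entry under that assumption; `field_weight` is a hypothetical name, and the idf and fieldNorm inputs are copied from the listing rather than derived.

```python
import math


def field_weight(freq, idf, field_norm):
    # fieldWeight = tf * idf * fieldNorm, with tf assumed to be sqrt(freq)
    return math.sqrt(freq) * idf * field_norm


# "link" occurs 5 times in doc 3707, "documents" 4 times.
print(field_weight(5.0, 5.7088733, 0.0625))
print(field_weight(4.0, 4.1135974, 0.0625))
```

The printed values should be close to the 0.7978393 and 0.5141997 shown for these two terms. Under this dampening, five occurrences of "link" weigh only about 2.24 times as much as a single occurrence.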
    
  2. Couto, T.; Cristo, M.; Gonçalves, M.A.; Calado, P.; Ziviani, N.; Moura, E.; Ribeiro-Neto, B.: ¬A comparative study of citations and links in document classification (2006) 0.27
    0.27362067 = sum of:
      0.27362067 = product of:
        0.7600574 = sum of:
          0.132803 = weight(abstract_txt:gains in 3711) [ClassicSimilarity], result of:
            0.132803 = score(doc=3711,freq=2.0), product of:
              0.18908948 = queryWeight, product of:
                1.050146 = boost
                7.9459434 = idf(docFreq=40, maxDocs=42596)
                0.022660643 = queryNorm
              0.7023288 = fieldWeight in 3711, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.9459434 = idf(docFreq=40, maxDocs=42596)
                0.0625 = fieldNorm(doc=3711)
          0.03781863 = weight(abstract_txt:traditional in 3711) [ClassicSimilarity], result of:
            0.03781863 = score(doc=3711,freq=1.0), product of:
              0.12992299 = queryWeight, product of:
                1.2310451 = boost
                4.6573596 = idf(docFreq=1098, maxDocs=42596)
                0.022660643 = queryNorm
              0.29108498 = fieldWeight in 3711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6573596 = idf(docFreq=1098, maxDocs=42596)
                0.0625 = fieldNorm(doc=3711)
          0.039299153 = weight(abstract_txt:further in 3711) [ClassicSimilarity], result of:
            0.039299153 = score(doc=3711,freq=1.0), product of:
              0.13329205 = queryWeight, product of:
                1.2469043 = boost
                4.717359 = idf(docFreq=1034, maxDocs=42596)
                0.022660643 = queryNorm
              0.29483494 = fieldWeight in 3711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.717359 = idf(docFreq=1034, maxDocs=42596)
                0.0625 = fieldNorm(doc=3711)
          0.049079966 = weight(abstract_txt:based in 3711) [ClassicSimilarity], result of:
            0.049079966 = score(doc=3711,freq=7.0), product of:
              0.09250163 = queryWeight, product of:
                1.2721881 = boost
                3.2086759 = idf(docFreq=4678, maxDocs=42596)
                0.022660643 = queryNorm
              0.5305849 = fieldWeight in 3711, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.2086759 = idf(docFreq=4678, maxDocs=42596)
                0.0625 = fieldNorm(doc=3711)
          0.064570256 = weight(abstract_txt:text in 3711) [ClassicSimilarity], result of:
            0.064570256 = score(doc=3711,freq=3.0), product of:
              0.14730828 = queryWeight, product of:
                1.6054256 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.022660643 = queryNorm
              0.43833423 = fieldWeight in 3711, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.0625 = fieldNorm(doc=3711)
          0.12749144 = weight(abstract_txt:measures in 3711) [ClassicSimilarity], result of:
            0.12749144 = score(doc=3711,freq=2.0), product of:
              0.2653933 = queryWeight, product of:
                2.1548724 = boost
                5.434957 = idf(docFreq=504, maxDocs=42596)
                0.022660643 = queryNorm
              0.48038685 = fieldWeight in 3711, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.434957 = idf(docFreq=504, maxDocs=42596)
                0.0625 = fieldNorm(doc=3711)
          0.052117288 = weight(abstract_txt:documents in 3711) [ClassicSimilarity], result of:
            0.052117288 = score(doc=3711,freq=1.0), product of:
              0.20271225 = queryWeight, product of:
                2.174633 = boost
                4.1135974 = idf(docFreq=1892, maxDocs=42596)
                0.022660643 = queryNorm
              0.25709984 = fieldWeight in 3711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1135974 = idf(docFreq=1892, maxDocs=42596)
                0.0625 = fieldNorm(doc=3711)
          0.0598703 = weight(abstract_txt:classification in 3711) [ClassicSimilarity], result of:
            0.0598703 = score(doc=3711,freq=1.0), product of:
              0.23951705 = queryWeight, product of:
                2.64283 = boost
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.022660643 = queryNorm
              0.24996258 = fieldWeight in 3711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.0625 = fieldNorm(doc=3711)
          0.19700743 = weight(abstract_txt:link in 3711) [ClassicSimilarity], result of:
            0.19700743 = score(doc=3711,freq=2.0), product of:
              0.39042464 = queryWeight, product of:
                3.0179677 = boost
                5.7088733 = idf(docFreq=383, maxDocs=42596)
                0.022660643 = queryNorm
              0.50459784 = fieldWeight in 3711, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7088733 = idf(docFreq=383, maxDocs=42596)
                0.0625 = fieldNorm(doc=3711)
        0.36 = coord(9/25)
    
  3. Huang, L.; Milne, D.; Frank, E.; Witten, I.H.: Learning a concept-based document similarity measure (2012) 0.25
    0.24573077 = sum of:
      0.24573077 = product of:
        0.7679087 = sum of:
          0.03583512 = weight(abstract_txt:they in 1373) [ClassicSimilarity], result of:
            0.03583512 = score(doc=1373,freq=2.0), product of:
              0.08573103 = queryWeight, product of:
                3.7832568 = idf(docFreq=2633, maxDocs=42596)
                0.022660643 = queryNorm
              0.41799474 = fieldWeight in 1373, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7832568 = idf(docFreq=2633, maxDocs=42596)
                0.078125 = fieldNorm(doc=1373)
          0.023188105 = weight(abstract_txt:based in 1373) [ClassicSimilarity], result of:
            0.023188105 = score(doc=1373,freq=1.0), product of:
              0.09250163 = queryWeight, product of:
                1.2721881 = boost
                3.2086759 = idf(docFreq=4678, maxDocs=42596)
                0.022660643 = queryNorm
              0.2506778 = fieldWeight in 1373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2086759 = idf(docFreq=4678, maxDocs=42596)
                0.078125 = fieldNorm(doc=1373)
          0.04659957 = weight(abstract_txt:text in 1373) [ClassicSimilarity], result of:
            0.04659957 = score(doc=1373,freq=1.0), product of:
              0.14730828 = queryWeight, product of:
                1.6054256 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.022660643 = queryNorm
              0.31634048 = fieldWeight in 1373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.078125 = fieldNorm(doc=1373)
          0.10995977 = weight(abstract_txt:document in 1373) [ClassicSimilarity], result of:
            0.10995977 = score(doc=1373,freq=4.0), product of:
              0.16447823 = queryWeight, product of:
                1.69641 = boost
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.022660643 = queryNorm
              0.66853696 = fieldWeight in 1373, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.078125 = fieldNorm(doc=1373)
          0.1593643 = weight(abstract_txt:measures in 1373) [ClassicSimilarity], result of:
            0.1593643 = score(doc=1373,freq=2.0), product of:
              0.2653933 = queryWeight, product of:
                2.1548724 = boost
                5.434957 = idf(docFreq=504, maxDocs=42596)
                0.022660643 = queryNorm
              0.60048354 = fieldWeight in 1373, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.434957 = idf(docFreq=504, maxDocs=42596)
                0.078125 = fieldNorm(doc=1373)
          0.09213122 = weight(abstract_txt:documents in 1373) [ClassicSimilarity], result of:
            0.09213122 = score(doc=1373,freq=2.0), product of:
              0.20271225 = queryWeight, product of:
                2.174633 = boost
                4.1135974 = idf(docFreq=1892, maxDocs=42596)
                0.022660643 = queryNorm
              0.4544926 = fieldWeight in 1373, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1135974 = idf(docFreq=1892, maxDocs=42596)
                0.078125 = fieldNorm(doc=1373)
          0.19499385 = weight(abstract_txt:similarity in 1373) [ClassicSimilarity], result of:
            0.19499385 = score(doc=1373,freq=2.0), product of:
              0.30360565 = queryWeight, product of:
                2.304791 = boost
                5.813077 = idf(docFreq=345, maxDocs=42596)
                0.022660643 = queryNorm
              0.6422603 = fieldWeight in 1373, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.813077 = idf(docFreq=345, maxDocs=42596)
                0.078125 = fieldNorm(doc=1373)
          0.105836734 = weight(abstract_txt:classification in 1373) [ClassicSimilarity], result of:
            0.105836734 = score(doc=1373,freq=2.0), product of:
              0.23951705 = queryWeight, product of:
                2.64283 = boost
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.022660643 = queryNorm
              0.44187558 = fieldWeight in 1373, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9994013 = idf(docFreq=2121, maxDocs=42596)
                0.078125 = fieldNorm(doc=1373)
        0.32 = coord(8/25)
    
  4. Haveliwala, T.: Context-Sensitive Web search (2005) 0.22
    0.22141765 = sum of:
      0.22141765 = product of:
        0.615049 = sum of:
          0.037735216 = weight(abstract_txt:provide in 3568) [ClassicSimilarity], result of:
            0.037735216 = score(doc=3568,freq=3.0), product of:
              0.09832582 = queryWeight, product of:
                1.0709391 = boost
                4.0516376 = idf(docFreq=2013, maxDocs=42596)
                0.022660643 = queryNorm
              0.3837773 = fieldWeight in 3568, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0516376 = idf(docFreq=2013, maxDocs=42596)
                0.0546875 = fieldNorm(doc=3568)
          0.057315808 = weight(abstract_txt:traditional in 3568) [ClassicSimilarity], result of:
            0.057315808 = score(doc=3568,freq=3.0), product of:
              0.12992299 = queryWeight, product of:
                1.2310451 = boost
                4.6573596 = idf(docFreq=1098, maxDocs=42596)
                0.022660643 = queryNorm
              0.4411522 = fieldWeight in 3568, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.6573596 = idf(docFreq=1098, maxDocs=42596)
                0.0546875 = fieldNorm(doc=3568)
          0.03346443 = weight(abstract_txt:authors in 3568) [ClassicSimilarity], result of:
            0.03346443 = score(doc=3568,freq=1.0), product of:
              0.13089782 = queryWeight, product of:
                1.2356548 = boost
                4.6747994 = idf(docFreq=1079, maxDocs=42596)
                0.022660643 = queryNorm
              0.25565308 = fieldWeight in 3568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6747994 = idf(docFreq=1079, maxDocs=42596)
                0.0546875 = fieldNorm(doc=3568)
          0.016231675 = weight(abstract_txt:based in 3568) [ClassicSimilarity], result of:
            0.016231675 = score(doc=3568,freq=1.0), product of:
              0.09250163 = queryWeight, product of:
                1.2721881 = boost
                3.2086759 = idf(docFreq=4678, maxDocs=42596)
                0.022660643 = queryNorm
              0.17547446 = fieldWeight in 3568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2086759 = idf(docFreq=4678, maxDocs=42596)
                0.0546875 = fieldNorm(doc=3568)
          0.058039542 = weight(abstract_txt:structure in 3568) [ClassicSimilarity], result of:
            0.058039542 = score(doc=3568,freq=2.0), product of:
              0.17167741 = queryWeight, product of:
                1.7331382 = boost
                4.371271 = idf(docFreq=1462, maxDocs=42596)
                0.022660643 = queryNorm
              0.33807325 = fieldWeight in 3568, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.371271 = idf(docFreq=1462, maxDocs=42596)
                0.0546875 = fieldNorm(doc=3568)
          0.09776115 = weight(abstract_txt:alone in 3568) [ClassicSimilarity], result of:
            0.09776115 = score(doc=3568,freq=1.0), product of:
              0.26749825 = queryWeight, product of:
                1.7664098 = boost
                6.6827817 = idf(docFreq=144, maxDocs=42596)
                0.022660643 = queryNorm
              0.36546463 = fieldWeight in 3568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6827817 = idf(docFreq=144, maxDocs=42596)
                0.0546875 = fieldNorm(doc=3568)
          0.045602627 = weight(abstract_txt:documents in 3568) [ClassicSimilarity], result of:
            0.045602627 = score(doc=3568,freq=1.0), product of:
              0.20271225 = queryWeight, product of:
                2.174633 = boost
                4.1135974 = idf(docFreq=1892, maxDocs=42596)
                0.022660643 = queryNorm
              0.22496235 = fieldWeight in 3568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1135974 = idf(docFreq=1892, maxDocs=42596)
                0.0546875 = fieldNorm(doc=3568)
          0.09651704 = weight(abstract_txt:similarity in 3568) [ClassicSimilarity], result of:
            0.09651704 = score(doc=3568,freq=1.0), product of:
              0.30360565 = queryWeight, product of:
                2.304791 = boost
                5.813077 = idf(docFreq=345, maxDocs=42596)
                0.022660643 = queryNorm
              0.31790265 = fieldWeight in 3568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.813077 = idf(docFreq=345, maxDocs=42596)
                0.0546875 = fieldNorm(doc=3568)
          0.1723815 = weight(abstract_txt:link in 3568) [ClassicSimilarity], result of:
            0.1723815 = score(doc=3568,freq=2.0), product of:
              0.39042464 = queryWeight, product of:
                3.0179677 = boost
                5.7088733 = idf(docFreq=383, maxDocs=42596)
                0.022660643 = queryNorm
              0.4415231 = fieldWeight in 3568, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7088733 = idf(docFreq=383, maxDocs=42596)
                0.0546875 = fieldNorm(doc=3568)
        0.36 = coord(9/25)
    
  5. Addison, E.R.; Nelson, P.E.: Intelligent hypertext (1992) 0.22
    0.21792468 = sum of:
      0.21792468 = product of:
        0.68101466 = sum of:
          0.056727942 = weight(abstract_txt:traditional in 2095) [ClassicSimilarity], result of:
            0.056727942 = score(doc=2095,freq=1.0), product of:
              0.12992299 = queryWeight, product of:
                1.2310451 = boost
                4.6573596 = idf(docFreq=1098, maxDocs=42596)
                0.022660643 = queryNorm
              0.43662745 = fieldWeight in 2095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6573596 = idf(docFreq=1098, maxDocs=42596)
                0.09375 = fieldNorm(doc=2095)
          0.0573676 = weight(abstract_txt:authors in 2095) [ClassicSimilarity], result of:
            0.0573676 = score(doc=2095,freq=1.0), product of:
              0.13089782 = queryWeight, product of:
                1.2356548 = boost
                4.6747994 = idf(docFreq=1079, maxDocs=42596)
                0.022660643 = queryNorm
              0.43826246 = fieldWeight in 2095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6747994 = idf(docFreq=1079, maxDocs=42596)
                0.09375 = fieldNorm(doc=2095)
          0.027825728 = weight(abstract_txt:based in 2095) [ClassicSimilarity], result of:
            0.027825728 = score(doc=2095,freq=1.0), product of:
              0.09250163 = queryWeight, product of:
                1.2721881 = boost
                3.2086759 = idf(docFreq=4678, maxDocs=42596)
                0.022660643 = queryNorm
              0.30081338 = fieldWeight in 2095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2086759 = idf(docFreq=4678, maxDocs=42596)
                0.09375 = fieldNorm(doc=2095)
          0.055919483 = weight(abstract_txt:text in 2095) [ClassicSimilarity], result of:
            0.055919483 = score(doc=2095,freq=1.0), product of:
              0.14730828 = queryWeight, product of:
                1.6054256 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.022660643 = queryNorm
              0.37960857 = fieldWeight in 2095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.09375 = fieldNorm(doc=2095)
          0.09330396 = weight(abstract_txt:document in 2095) [ClassicSimilarity], result of:
            0.09330396 = score(doc=2095,freq=2.0), product of:
              0.16447823 = queryWeight, product of:
                1.69641 = boost
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.022660643 = queryNorm
              0.5672724 = fieldWeight in 2095, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2786365 = idf(docFreq=1604, maxDocs=42596)
                0.09375 = fieldNorm(doc=2095)
          0.07035455 = weight(abstract_txt:structure in 2095) [ClassicSimilarity], result of:
            0.07035455 = score(doc=2095,freq=1.0), product of:
              0.17167741 = queryWeight, product of:
                1.7331382 = boost
                4.371271 = idf(docFreq=1462, maxDocs=42596)
                0.022660643 = queryNorm
              0.40980667 = fieldWeight in 2095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.371271 = idf(docFreq=1462, maxDocs=42596)
                0.09375 = fieldNorm(doc=2095)
          0.11055747 = weight(abstract_txt:documents in 2095) [ClassicSimilarity], result of:
            0.11055747 = score(doc=2095,freq=2.0), product of:
              0.20271225 = queryWeight, product of:
                2.174633 = boost
                4.1135974 = idf(docFreq=1892, maxDocs=42596)
                0.022660643 = queryNorm
              0.54539114 = fieldWeight in 2095, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1135974 = idf(docFreq=1892, maxDocs=42596)
                0.09375 = fieldNorm(doc=2095)
          0.20895794 = weight(abstract_txt:link in 2095) [ClassicSimilarity], result of:
            0.20895794 = score(doc=2095,freq=1.0), product of:
              0.39042464 = queryWeight, product of:
                3.0179677 = boost
                5.7088733 = idf(docFreq=383, maxDocs=42596)
                0.022660643 = queryNorm
              0.53520685 = fieldWeight in 2095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7088733 = idf(docFreq=383, maxDocs=42596)
                0.09375 = fieldNorm(doc=2095)
        0.32 = coord(8/25)
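
The arithmetic in the explanation tree for entry 5 (doc=2095) can be retraced by hand. The following is a minimal sketch, not the search engine's own code: it assumes the classic TF-IDF form that the listed values appear to follow, with tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, and fieldWeight = tf * idf * fieldNorm. The names `idf`, `term_score`, and `document_score` are hypothetical helpers introduced for illustration; all numeric inputs are copied from the tree above.

```python
import math

# Constants copied from the explanation tree for doc=2095.
MAX_DOCS = 42596
QUERY_NORM = 0.022660643
FIELD_NORM = 0.09375      # fieldNorm(doc=2095)
COORD = 8 / 25            # 8 of the 25 query terms matched

# (term, boost, docFreq, termFreq) as listed for each matching term.
TERMS = [
    ("traditional", 1.2310451, 1098, 1.0),
    ("authors",     1.2356548, 1079, 1.0),
    ("based",       1.2721881, 4678, 1.0),
    ("text",        1.6054256, 2018, 1.0),
    ("document",    1.69641,   1604, 2.0),
    ("structure",   1.7331382, 1462, 1.0),
    ("documents",   2.174633,  1892, 2.0),
    ("link",        3.0179677,  383, 1.0),
]


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed idf form; it reproduces the idf values shown in the tree.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq):
    # score = queryWeight * fieldWeight for a single term.
    i = idf(doc_freq)
    query_weight = boost * i * QUERY_NORM
    field_weight = math.sqrt(freq) * i * FIELD_NORM
    return query_weight * field_weight


def document_score(terms=TERMS, coord=COORD):
    # Sum of the per-term scores, scaled by the coordination factor.
    return coord * sum(term_score(b, df, f) for _, b, df, f in terms)


if __name__ == "__main__":
    for name, boost, df, freq in TERMS:
        print(f"{name:12s} {term_score(boost, df, freq):.6f}")
    print(f"total        {document_score():.6f}")
```

Under these assumptions the total should agree with the listed 0.21792468 up to small rounding differences, since the tree reports single-precision values while the sketch uses double precision.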