Document (#32954)

Author
Shen, D.
Yang, Q.
Chen, Z.
Title
Noise reduction through summarization for Web-page classification
Source
Information processing and management. 43(2007) no.6, pp.1735-1747
Year
2007
Abstract
Due to a large variety of noisy information embedded in Web pages, Web-page classification is much more difficult than pure-text classification. In this paper, we propose to improve the Web-page classification performance by removing the noise through summarization techniques. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then put forward a new Web-page summarization algorithm based on Web-page layout and evaluate it along with several other state-of-the-art text summarization algorithms on the LookSmart Web directory. Experimental results show that the classification algorithms (NB or SVM) augmented by any summarization approach can achieve an improvement of more than 5.0% as compared to pure-text-based classification algorithms. We further introduce an ensemble method to combine the different summarization algorithms. The ensemble summarization method achieves more than 12.0% improvement over pure-text-based methods.
Theme
Automatic abstracting

Similar documents (author)

  1. Shen, D.; Chen, Z.; Yang, Q.; Zeng, H.J.; Zhang, B.; Lu, Y.; Ma, W.Y.: Web page classification through summarization (2004) 3.16
    3.1649513 = sum of:
      3.1649513 = sum of:
        0.5975134 = weight(author_txt:chen in 4132) [ClassicSimilarity], result of:
          0.5975134 = score(doc=4132,freq=1.0), product of:
            0.38851857 = queryWeight, product of:
              6.1517096 = idf(docFreq=255, maxDocs=44218)
              0.063156195 = queryNorm
            1.5379274 = fieldWeight in 4132, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              6.1517096 = idf(docFreq=255, maxDocs=44218)
              0.25 = fieldNorm(doc=4132)
        0.9568167 = weight(author_txt:yang in 4132) [ClassicSimilarity], result of:
          0.9568167 = score(doc=4132,freq=1.0), product of:
            0.53178066 = queryWeight, product of:
              1.1699313 = boost
              7.1970778 = idf(docFreq=89, maxDocs=44218)
              0.063156195 = queryNorm
            1.7992694 = fieldWeight in 4132, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.1970778 = idf(docFreq=89, maxDocs=44218)
              0.25 = fieldNorm(doc=4132)
        1.6106212 = weight(author_txt:shen in 4132) [ClassicSimilarity], result of:
          1.6106212 = score(doc=4132,freq=1.0), product of:
            0.7525043 = queryWeight, product of:
              1.3917096 = boost
              8.561393 = idf(docFreq=22, maxDocs=44218)
              0.063156195 = queryNorm
            2.1403482 = fieldWeight in 4132, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.561393 = idf(docFreq=22, maxDocs=44218)
              0.25 = fieldNorm(doc=4132)
    
  2. Chen, Y.-H.; Germain, C.A.; Yang, H.: An exploration into the practices of library Web usability in ARL academic libraries (2009) 1.55
    1.5543301 = sum of:
      1.5543301 = product of:
        2.331495 = sum of:
          0.89627 = weight(author_txt:chen in 2798) [ClassicSimilarity], result of:
            0.89627 = score(doc=2798,freq=1.0), product of:
              0.38851857 = queryWeight, product of:
                6.1517096 = idf(docFreq=255, maxDocs=44218)
                0.063156195 = queryNorm
              2.306891 = fieldWeight in 2798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1517096 = idf(docFreq=255, maxDocs=44218)
                0.375 = fieldNorm(doc=2798)
          1.435225 = weight(author_txt:yang in 2798) [ClassicSimilarity], result of:
            1.435225 = score(doc=2798,freq=1.0), product of:
              0.53178066 = queryWeight, product of:
                1.1699313 = boost
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.063156195 = queryNorm
              2.698904 = fieldWeight in 2798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.375 = fieldNorm(doc=2798)
        0.6666667 = coord(2/3)
    
  3. Liu, D.-R.; Chen, Y.-H.; Shen, M.; Lu, P.-J.: Complementary QA network analysis for QA retrieval in social question-answering websites (2015) 1.47
    1.4720898 = sum of:
      1.4720898 = product of:
        2.2081347 = sum of:
          0.5975134 = weight(author_txt:chen in 1611) [ClassicSimilarity], result of:
            0.5975134 = score(doc=1611,freq=1.0), product of:
              0.38851857 = queryWeight, product of:
                6.1517096 = idf(docFreq=255, maxDocs=44218)
                0.063156195 = queryNorm
              1.5379274 = fieldWeight in 1611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1517096 = idf(docFreq=255, maxDocs=44218)
                0.25 = fieldNorm(doc=1611)
          1.6106212 = weight(author_txt:shen in 1611) [ClassicSimilarity], result of:
            1.6106212 = score(doc=1611,freq=1.0), product of:
              0.7525043 = queryWeight, product of:
                1.3917096 = boost
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.063156195 = queryNorm
              2.1403482 = fieldWeight in 1611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.25 = fieldNorm(doc=1611)
        0.6666667 = coord(2/3)
    
  4. Shen, X.-L.; Li, Y.-J.; Sun, Y.; Chen, J.; Wang, F.: Knowledge withholding in online knowledge spaces : social deviance behavior and secondary control perspective (2019) 1.47
    1.4720898 = sum of:
      1.4720898 = product of:
        2.2081347 = sum of:
          0.5975134 = weight(author_txt:chen in 5016) [ClassicSimilarity], result of:
            0.5975134 = score(doc=5016,freq=1.0), product of:
              0.38851857 = queryWeight, product of:
                6.1517096 = idf(docFreq=255, maxDocs=44218)
                0.063156195 = queryNorm
              1.5379274 = fieldWeight in 5016, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1517096 = idf(docFreq=255, maxDocs=44218)
                0.25 = fieldNorm(doc=5016)
          1.6106212 = weight(author_txt:shen in 5016) [ClassicSimilarity], result of:
            1.6106212 = score(doc=5016,freq=1.0), product of:
              0.7525043 = queryWeight, product of:
                1.3917096 = boost
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.063156195 = queryNorm
              2.1403482 = fieldWeight in 5016, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.25 = fieldNorm(doc=5016)
        0.6666667 = coord(2/3)
    
  5. Shen, Z.: CJK: the unique need of Chinese, Japanese, and Korean language cataloging (1993) 1.34
    1.3421844 = sum of:
      1.3421844 = product of:
        4.026553 = sum of:
          4.026553 = weight(author_txt:shen in 3726) [ClassicSimilarity], result of:
            4.026553 = score(doc=3726,freq=1.0), product of:
              0.7525043 = queryWeight, product of:
                1.3917096 = boost
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.063156195 = queryNorm
              5.3508706 = fieldWeight in 3726, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.625 = fieldNorm(doc=3726)
        0.33333334 = coord(1/3)
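
The per-term figures in the author listing above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming the relationships that the explain trees themselves display (queryWeight = boost x idf x queryNorm, fieldWeight = tf x idf x fieldNorm) and an idf of 1 + ln(maxDocs / (docFreq + 1)), which is inferred here because it matches the printed values. The function names `idf` and `term_score` are illustrative and are not part of any library API.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency; formula inferred from the printed idf values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float,
               boost: float = 1.0) -> float:
    """Score of one term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                          # tf(freq=1.0) = 1.0 in the listing
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm  # e.g. 0.38851857 for "chen"
    field_weight = tf * term_idf * field_norm     # e.g. 1.5379274 in doc 4132
    return query_weight * field_weight


# Values copied from the explain tree for document 4132 (first author hit).
MAX_DOCS = 44218
QUERY_NORM = 0.063156195
FIELD_NORM = 0.25

chen = term_score(1.0, 255, MAX_DOCS, QUERY_NORM, FIELD_NORM)
yang = term_score(1.0, 89, MAX_DOCS, QUERY_NORM, FIELD_NORM, boost=1.1699313)
shen = term_score(1.0, 22, MAX_DOCS, QUERY_NORM, FIELD_NORM, boost=1.3917096)
total = chen + yang + shen  # all three author terms match, so no coord reduction appears

print(f"chen={chen:.7f} yang={yang:.7f} shen={shen:.7f} total={total:.7f}")
```

The listing prints single-precision values, so the double-precision results from this sketch may differ from it in the last displayed digits.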
    

Similar documents (content)

  1. Reeve, L.H.; Han, H.; Brooks, A.D.: The use of domain-specific concepts in biomedical text summarization (2007) 0.22
    0.2194923 = sum of:
      0.2194923 = product of:
        0.7839011 = sum of:
          0.051875666 = weight(abstract_txt:summaries in 955) [ClassicSimilarity], result of:
            0.051875666 = score(doc=955,freq=2.0), product of:
              0.08344501 = queryWeight, product of:
                1.0310419 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.011506832 = queryNorm
              0.62167484 = fieldWeight in 955, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.0625 = fieldNorm(doc=955)
          0.0558404 = weight(abstract_txt:reduction in 955) [ClassicSimilarity], result of:
            0.0558404 = score(doc=955,freq=2.0), product of:
              0.08764429 = queryWeight, product of:
                1.0566665 = boost
                7.208251 = idf(docFreq=88, maxDocs=44218)
                0.011506832 = queryNorm
              0.6371254 = fieldWeight in 955, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.208251 = idf(docFreq=88, maxDocs=44218)
                0.0625 = fieldNorm(doc=955)
          0.047093607 = weight(abstract_txt:method in 955) [ClassicSimilarity], result of:
            0.047093607 = score(doc=955,freq=6.0), product of:
              0.06834427 = queryWeight, product of:
                1.3195996 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.011506832 = queryNorm
              0.68906444 = fieldWeight in 955, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=955)
          0.020933172 = weight(abstract_txt:performance in 955) [ClassicSimilarity], result of:
            0.020933172 = score(doc=955,freq=1.0), product of:
              0.07233269 = queryWeight, product of:
                1.357558 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.011506832 = queryNorm
              0.28940126 = fieldWeight in 955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.0625 = fieldNorm(doc=955)
          0.010246853 = weight(abstract_txt:based in 955) [ClassicSimilarity], result of:
            0.010246853 = score(doc=955,freq=1.0), product of:
              0.051428284 = queryWeight, product of:
                1.4019669 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.011506832 = queryNorm
              0.19924548 = fieldWeight in 955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0625 = fieldNorm(doc=955)
          0.06235615 = weight(abstract_txt:text in 955) [ClassicSimilarity], result of:
            0.06235615 = score(doc=955,freq=5.0), product of:
              0.11033605 = queryWeight, product of:
                2.3711817 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.011506832 = queryNorm
              0.5651476 = fieldWeight in 955, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=955)
          0.53555524 = weight(abstract_txt:summarization in 955) [ClassicSimilarity], result of:
            0.53555524 = score(doc=955,freq=4.0), product of:
              0.6006896 = queryWeight, product of:
                7.318974 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.011506832 = queryNorm
              0.89156735 = fieldWeight in 955, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.0625 = fieldNorm(doc=955)
        0.28 = coord(7/25)
    
  2. Sankarasubramaniam, Y.; Ramanathan, K.; Ghosh, S.: Text summarization using Wikipedia (2014) 0.22
    0.21774246 = sum of:
      0.21774246 = product of:
        0.90726024 = sum of:
          0.020933172 = weight(abstract_txt:performance in 2693) [ClassicSimilarity], result of:
            0.020933172 = score(doc=2693,freq=1.0), product of:
              0.07233269 = queryWeight, product of:
                1.357558 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.011506832 = queryNorm
              0.28940126 = fieldWeight in 2693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.0625 = fieldNorm(doc=2693)
          0.01774807 = weight(abstract_txt:based in 2693) [ClassicSimilarity], result of:
            0.01774807 = score(doc=2693,freq=3.0), product of:
              0.051428284 = queryWeight, product of:
                1.4019669 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.011506832 = queryNorm
              0.3451033 = fieldWeight in 2693, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0625 = fieldNorm(doc=2693)
          0.025727522 = weight(abstract_txt:improve in 2693) [ClassicSimilarity], result of:
            0.025727522 = score(doc=2693,freq=1.0), product of:
              0.082993336 = queryWeight, product of:
                1.4541618 = boost
                4.9599204 = idf(docFreq=842, maxDocs=44218)
                0.011506832 = queryNorm
              0.30999503 = fieldWeight in 2693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9599204 = idf(docFreq=842, maxDocs=44218)
                0.0625 = fieldNorm(doc=2693)
          0.048300862 = weight(abstract_txt:text in 2693) [ClassicSimilarity], result of:
            0.048300862 = score(doc=2693,freq=3.0), product of:
              0.11033605 = queryWeight, product of:
                2.3711817 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.011506832 = queryNorm
              0.4377614 = fieldWeight in 2693, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=2693)
          0.13863209 = weight(abstract_txt:algorithms in 2693) [ClassicSimilarity], result of:
            0.13863209 = score(doc=2693,freq=2.0), product of:
              0.27478337 = queryWeight, product of:
                4.1836596 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.011506832 = queryNorm
              0.5045141 = fieldWeight in 2693, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.0625 = fieldNorm(doc=2693)
          0.65591854 = weight(abstract_txt:summarization in 2693) [ClassicSimilarity], result of:
            0.65591854 = score(doc=2693,freq=6.0), product of:
              0.6006896 = queryWeight, product of:
                7.318974 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.011506832 = queryNorm
              1.0919425 = fieldWeight in 2693, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.0625 = fieldNorm(doc=2693)
        0.24 = coord(6/25)
    
  3. Agarwal, B.; Ramampiaro, H.; Langseth, H.; Ruocco, M.: A deep network model for paraphrase detection in short text messages (2018) 0.21
    0.20902725 = sum of:
      0.20902725 = product of:
        0.5806312 = sum of:
          0.044074975 = weight(abstract_txt:achieves in 5043) [ClassicSimilarity], result of:
            0.044074975 = score(doc=5043,freq=1.0), product of:
              0.094311066 = queryWeight, product of:
                1.0961183 = boost
                7.4773793 = idf(docFreq=67, maxDocs=44218)
                0.011506832 = queryNorm
              0.4673362 = fieldWeight in 5043, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4773793 = idf(docFreq=67, maxDocs=44218)
                0.0625 = fieldNorm(doc=5043)
          0.05813607 = weight(abstract_txt:noisy in 5043) [ClassicSimilarity], result of:
            0.05813607 = score(doc=5043,freq=1.0), product of:
              0.113430984 = queryWeight, product of:
                1.2021037 = boost
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.011506832 = queryNorm
              0.5125237 = fieldWeight in 5043, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.0625 = fieldNorm(doc=5043)
          0.020933172 = weight(abstract_txt:performance in 5043) [ClassicSimilarity], result of:
            0.020933172 = score(doc=5043,freq=1.0), product of:
              0.07233269 = queryWeight, product of:
                1.357558 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.011506832 = queryNorm
              0.28940126 = fieldWeight in 5043, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.0625 = fieldNorm(doc=5043)
          0.014491239 = weight(abstract_txt:based in 5043) [ClassicSimilarity], result of:
            0.014491239 = score(doc=5043,freq=2.0), product of:
              0.051428284 = queryWeight, product of:
                1.4019669 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.011506832 = queryNorm
              0.28177565 = fieldWeight in 5043, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0625 = fieldNorm(doc=5043)
          0.017612323 = weight(abstract_txt:more in 5043) [ClassicSimilarity], result of:
            0.017612323 = score(doc=5043,freq=2.0), product of:
              0.058570117 = queryWeight, product of:
                1.4961488 = boost
                3.402088 = idf(docFreq=4002, maxDocs=44218)
                0.011506832 = queryNorm
              0.30070493 = fieldWeight in 5043, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.402088 = idf(docFreq=4002, maxDocs=44218)
                0.0625 = fieldNorm(doc=5043)
          0.01869031 = weight(abstract_txt:than in 5043) [ClassicSimilarity], result of:
            0.01869031 = score(doc=5043,freq=1.0), product of:
              0.07677492 = queryWeight, product of:
                1.7129569 = boost
                3.8950868 = idf(docFreq=2444, maxDocs=44218)
                0.011506832 = queryNorm
              0.24344292 = fieldWeight in 5043, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8950868 = idf(docFreq=2444, maxDocs=44218)
                0.0625 = fieldNorm(doc=5043)
          0.099477984 = weight(abstract_txt:noise in 5043) [ClassicSimilarity], result of:
            0.099477984 = score(doc=5043,freq=1.0), product of:
              0.20445414 = queryWeight, product of:
                2.2823858 = boost
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.011506832 = queryNorm
              0.48655403 = fieldWeight in 5043, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.0625 = fieldNorm(doc=5043)
          0.03943749 = weight(abstract_txt:text in 5043) [ClassicSimilarity], result of:
            0.03943749 = score(doc=5043,freq=2.0), product of:
              0.11033605 = queryWeight, product of:
                2.3711817 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.011506832 = queryNorm
              0.3574307 = fieldWeight in 5043, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=5043)
          0.26777762 = weight(abstract_txt:summarization in 5043) [ClassicSimilarity], result of:
            0.26777762 = score(doc=5043,freq=1.0), product of:
              0.6006896 = queryWeight, product of:
                7.318974 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.011506832 = queryNorm
              0.44578367 = fieldWeight in 5043, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.0625 = fieldNorm(doc=5043)
        0.36 = coord(9/25)
    
  4. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.21
    0.20680113 = sum of:
      0.20680113 = product of:
        1.0340056 = sum of:
          0.07941807 = weight(abstract_txt:summaries in 563) [ClassicSimilarity], result of:
            0.07941807 = score(doc=563,freq=3.0), product of:
              0.08344501 = queryWeight, product of:
                1.0310419 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.011506832 = queryNorm
              0.95174134 = fieldWeight in 563, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.078125 = fieldNorm(doc=563)
          0.03485815 = weight(abstract_txt:text in 563) [ClassicSimilarity], result of:
            0.03485815 = score(doc=563,freq=1.0), product of:
              0.11033605 = queryWeight, product of:
                2.3711817 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.011506832 = queryNorm
              0.3159271 = fieldWeight in 563, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=563)
          0.058687825 = weight(abstract_txt:classification in 563) [ClassicSimilarity], result of:
            0.058687825 = score(doc=563,freq=1.0), product of:
              0.18817385 = queryWeight, product of:
                4.0964227 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.011506832 = queryNorm
              0.3118809 = fieldWeight in 563, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.078125 = fieldNorm(doc=563)
          0.281286 = weight(abstract_txt:page in 563) [ClassicSimilarity], result of:
            0.281286 = score(doc=563,freq=2.0), product of:
              0.42457002 = queryWeight, product of:
                6.1531825 = boost
                5.9964437 = idf(docFreq=298, maxDocs=44218)
                0.011506832 = queryNorm
              0.6625197 = fieldWeight in 563, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9964437 = idf(docFreq=298, maxDocs=44218)
                0.078125 = fieldNorm(doc=563)
          0.57975554 = weight(abstract_txt:summarization in 563) [ClassicSimilarity], result of:
            0.57975554 = score(doc=563,freq=3.0), product of:
              0.6006896 = queryWeight, product of:
                7.318974 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.011506832 = queryNorm
              0.96514994 = fieldWeight in 563, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.078125 = fieldNorm(doc=563)
        0.2 = coord(5/25)
    
  5. Kwon, O.W.; Lee, J.H.: Text categorization based on k-nearest neighbor approach for web site classification (2003) 0.18
    0.17839117 = sum of:
      0.17839117 = product of:
        0.6371113 = sum of:
          0.03346723 = weight(abstract_txt:directory in 1070) [ClassicSimilarity], result of:
            0.03346723 = score(doc=1070,freq=1.0), product of:
              0.07849605 = queryWeight, product of:
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.011506832 = queryNorm
              0.42635563 = fieldWeight in 1070, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.0625 = fieldNorm(doc=1070)
          0.033300206 = weight(abstract_txt:method in 1070) [ClassicSimilarity], result of:
            0.033300206 = score(doc=1070,freq=3.0), product of:
              0.06834427 = queryWeight, product of:
                1.3195996 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.011506832 = queryNorm
              0.4872421 = fieldWeight in 1070, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=1070)
          0.029603973 = weight(abstract_txt:performance in 1070) [ClassicSimilarity], result of:
            0.029603973 = score(doc=1070,freq=2.0), product of:
              0.07233269 = queryWeight, product of:
                1.357558 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.011506832 = queryNorm
              0.40927517 = fieldWeight in 1070, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.0625 = fieldNorm(doc=1070)
          0.010246853 = weight(abstract_txt:based in 1070) [ClassicSimilarity], result of:
            0.010246853 = score(doc=1070,freq=1.0), product of:
              0.051428284 = queryWeight, product of:
                1.4019669 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.011506832 = queryNorm
              0.19924548 = fieldWeight in 1070, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0625 = fieldNorm(doc=1070)
          0.025727522 = weight(abstract_txt:improve in 1070) [ClassicSimilarity], result of:
            0.025727522 = score(doc=1070,freq=1.0), product of:
              0.082993336 = queryWeight, product of:
                1.4541618 = boost
                4.9599204 = idf(docFreq=842, maxDocs=44218)
                0.011506832 = queryNorm
              0.30999503 = fieldWeight in 1070, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9599204 = idf(docFreq=842, maxDocs=44218)
                0.0625 = fieldNorm(doc=1070)
          0.115004174 = weight(abstract_txt:classification in 1070) [ClassicSimilarity], result of:
            0.115004174 = score(doc=1070,freq=6.0), product of:
              0.18817385 = queryWeight, product of:
                4.0964227 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.011506832 = queryNorm
              0.6111592 = fieldWeight in 1070, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0625 = fieldNorm(doc=1070)
          0.38976133 = weight(abstract_txt:page in 1070) [ClassicSimilarity], result of:
            0.38976133 = score(doc=1070,freq=6.0), product of:
              0.42457002 = queryWeight, product of:
                6.1531825 = boost
                5.9964437 = idf(docFreq=298, maxDocs=44218)
                0.011506832 = queryNorm
              0.9180142 = fieldWeight in 1070, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.9964437 = idf(docFreq=298, maxDocs=44218)
                0.0625 = fieldNorm(doc=1070)
        0.28 = coord(7/25)
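
In the content listing above, each document's final score is the sum of its matching term scores multiplied by a coordination factor, coord(matched/total), and the term frequency component is the square root of the raw frequency. The sketch below checks both relationships against the printed figures for document 563 (hit 4). The helper names `tf`, `coord`, and `combine` are illustrative assumptions, and the term scores are copied from the listing rather than recomputed.

```python
import math


def tf(freq: float) -> float:
    """Term-frequency component: square root of the raw frequency."""
    return math.sqrt(freq)


def coord(matched_terms: int, query_terms: int) -> float:
    """Fraction of query terms that the document matched."""
    return matched_terms / query_terms


def combine(term_scores: list[float], query_terms: int) -> float:
    """Sum the per-term scores, then scale by the coordination factor."""
    return sum(term_scores) * coord(len(term_scores), query_terms)


# Per-term scores for document 563, copied from the explain tree:
# summaries, text, classification, page, summarization.
scores_563 = [0.07941807, 0.03485815, 0.058687825, 0.281286, 0.57975554]

final_563 = combine(scores_563, query_terms=25)  # 5 of 25 query terms matched

print(f"tf(3.0)={tf(3.0):.7f}")          # listing shows 1.7320508 for freq=3.0
print(f"sum={sum(scores_563):.7f}")      # listing shows 1.0340056
print(f"final={final_563:.8f}")          # listing shows 0.20680113
```

The coordination factor explains why hit 3, which matches nine query terms, can still rank below hits that match fewer terms but contain "summarization" more often.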