Search (1 results, page 1 of 1)

  • × author_ss:"Chen, Z."
  • × theme_ss:"Automatisches Abstracting"
  1. Shen, D.; Yang, Q.; Chen, Z.: Noise reduction through summarization for Web-page classification (2007) 0.03
    0.027587567 = product of:
      0.055175133 = sum of:
        0.055175133 = product of:
          0.110350266 = sum of:
            0.110350266 = weight(_text_:web in 953) [ClassicSimilarity], result of:
              0.110350266 = score(doc=953,freq=18.0), product of:
                0.17002425 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.052098576 = queryNorm
                0.64902663 = fieldWeight in 953, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=953)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Due to a large variety of noisy information embedded in Web pages, Web-page classification is much more difficult than pure-text classification. In this paper, we propose to improve the Web-page classification performance by removing the noise through summarization techniques. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then put forward a new Web-page summarization algorithm based on Web-page layout and evaluate it along with several other state-of-the-art text summarization algorithms on the LookSmart Web directory. Experimental results show that the classification algorithms (NB or SVM) augmented by any summarization approach can achieve an improvement by more than 5.0% as compared to pure-text-based classification algorithms. We further introduce an ensemble method to combine the different summarization algorithms. The ensemble summarization method achieves more than 12.0% improvement over pure-text based methods.