Search (9 results, page 1 of 1)

  • theme_ss:"Automatisches Abstracting"
  • year_i:[2000 TO 2010}
  1. Shen, D.; Yang, Q.; Chen, Z.: Noise reduction through summarization for Web-page classification (2007) 0.02
    0.015679711 = product of:
      0.094078265 = sum of:
        0.094078265 = weight(_text_:web in 953) [ClassicSimilarity], result of:
          0.094078265 = score(doc=953,freq=18.0), product of:
            0.14495286 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.044416238 = queryNorm
            0.64902663 = fieldWeight in 953, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=953)
      0.16666667 = coord(1/6)
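
     For reference, the indented tree above is Lucene's ClassicSimilarity (TF-IDF) explain output, and every entry below follows the same scheme; only freq, idf, and fieldNorm change (entries 8 and 9 additionally nest a coord(1/2) for a sub-clause). A minimal Python sketch, reproducing the arithmetic of this first entry from the constants in the tree (formula names follow Lucene's TFIDFSimilarity documentation):

       import math

       # Constants copied from the explain tree for result 1.
       doc_freq, max_docs = 4597, 44218  # docs containing "web" / docs in the index
       freq = 18.0                       # occurrences of "web" in doc 953
       query_norm = 0.044416238          # normalization shared by all query terms
       field_norm = 0.046875             # quantized length norm of the field
       coord = 1 / 6                     # 1 of 6 query clauses matched

       tf = math.sqrt(freq)                           # 4.2426405
       idf = 1 + math.log(max_docs / (doc_freq + 1))  # 3.2635105
       query_weight = idf * query_norm                # 0.14495286 = queryWeight
       field_weight = tf * idf * field_norm           # 0.64902663 = fieldWeight
       print(query_weight * field_weight * coord)     # ~0.015679711, the listed score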
    
    Abstract
     Due to the large variety of noisy information embedded in Web pages, Web-page classification is much more difficult than pure-text classification. In this paper, we propose to improve Web-page classification performance by removing the noise through summarization techniques. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then put forward a new Web-page summarization algorithm based on Web-page layout and evaluate it, along with several other state-of-the-art text summarization algorithms, on the LookSmart Web directory. Experimental results show that the classification algorithms (NB or SVM) augmented by any summarization approach can achieve an improvement of more than 5.0% as compared to pure-text-based classification algorithms. We further introduce an ensemble method to combine the different summarization algorithms. The ensemble summarization method achieves more than a 12.0% improvement over pure-text-based methods.
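
     As a rough illustration of the pipeline this abstract describes (summarize each page, then classify the summaries), here is a hedged scikit-learn sketch; the lead-sentence summarizer and the toy pages are illustrative assumptions standing in for the paper's layout-based summarizer, which is not reproduced here:

       from sklearn.feature_extraction.text import TfidfVectorizer
       from sklearn.naive_bayes import MultinomialNB
       from sklearn.pipeline import make_pipeline

       def naive_summary(page_text, n_sentences=2):
           # Illustrative stand-in: keep only the leading sentences,
           # dropping trailing boilerplate ("noise") from the page.
           return ". ".join(page_text.split(". ")[:n_sentences])

       pages = ["Buy cheap widgets here. Our store ships worldwide. Footer login share ...",
                "The match ended two to one. The striker scored twice. Footer login share ..."]
       labels = ["shopping", "sports"]

       clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
       clf.fit([naive_summary(p) for p in pages], labels)  # train on summaries, not full pages
       print(clf.predict([naive_summary("Widgets on sale now. Free shipping. ...")]))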
  2. Craven, T.C.: Presentation of repeated phrases in a computer-assisted abstracting tool kit (2001) 0.02
    0.015292614 = product of:
      0.09175568 = sum of:
        0.09175568 = weight(_text_:computer in 3667) [ClassicSimilarity], result of:
          0.09175568 = score(doc=3667,freq=2.0), product of:
            0.16231956 = queryWeight, product of:
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.044416238 = queryNorm
            0.56527805 = fieldWeight in 3667, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.109375 = fieldNorm(doc=3667)
      0.16666667 = coord(1/6)
    
  3. Saggion, H.; Lapalme, G.: Selective analysis for the automatic generation of summaries (2000) 0.01
    0.007646307 = product of:
      0.04587784 = sum of:
        0.04587784 = weight(_text_:computer in 132) [ClassicSimilarity], result of:
          0.04587784 = score(doc=132,freq=2.0), product of:
            0.16231956 = queryWeight, product of:
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.044416238 = queryNorm
            0.28263903 = fieldWeight in 132, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.0546875 = fieldNorm(doc=132)
      0.16666667 = coord(1/6)
    
    Abstract
     Selective Analysis is a new method for the text summarization of technical articles whose design is based on the study of a corpus of professional abstracts and technical documents. The method emphasizes the selection of particular types of information and their elaboration, exploring the issue of dynamic summarization. A computer prototype was developed to demonstrate the viability of the approach, and the automatic abstracts were evaluated using human informants. The results obtained so far indicate that the summaries are acceptable in content and text quality.
  4. Craven, T.C.: Abstracts produced using computer assistance (2000) 0.01
    0.0065539777 = product of:
      0.039323866 = sum of:
        0.039323866 = weight(_text_:computer in 4809) [ClassicSimilarity], result of:
          0.039323866 = score(doc=4809,freq=2.0), product of:
            0.16231956 = queryWeight, product of:
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.044416238 = queryNorm
            0.24226204 = fieldWeight in 4809, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.046875 = fieldNorm(doc=4809)
      0.16666667 = coord(1/6)
    
  5. Liang, S.-F.; Devlin, S.; Tait, J.: Investigating sentence weighting components for automatic summarisation (2007) 0.01
    0.0052265706 = product of:
      0.031359423 = sum of:
        0.031359423 = weight(_text_:web in 899) [ClassicSimilarity], result of:
          0.031359423 = score(doc=899,freq=2.0), product of:
            0.14495286 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.044416238 = queryNorm
            0.21634221 = fieldWeight in 899, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=899)
      0.16666667 = coord(1/6)
    
    Abstract
     The work described here initially formed part of a triangulation exercise to establish the effectiveness of the Query Term Order (QTO) algorithm. It subsequently proved to be a reliable indicator for summarising English web documents. We utilised the human summaries from the Document Understanding Conference data and generated queries automatically for testing the QTO algorithm. Six sentence-weighting schemes that made use of Query Term Frequency and QTO were constructed to produce system summaries, and this paper explains the process of combining and balancing the weighting components. The summaries produced were evaluated with the ROUGE-1 metric, and the results showed that using QTO in a weighting combination gave the best performance. We also found that combining more weighting components always improved performance compared with any single weighting component.
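
     The abstract does not define QTO or the six weighting schemes precisely, so the sketch below only illustrates the general idea of scoring sentences by a weighted combination of a query-term-frequency component and an order-sensitive component; both component definitions and the alpha weight are assumptions, not the paper's formulas:

       def qtf(tokens, query_terms):
           # Query-term-frequency component: share of tokens that are query terms.
           return sum(t in query_terms for t in tokens) / len(tokens) if tokens else 0.0

       def qto(tokens, query_ordered):
           # Assumed order-sensitive component: how far the query is matched
           # as an in-order subsequence of the sentence, normalized by query length.
           pos = 0
           for t in tokens:
               if pos < len(query_ordered) and t == query_ordered[pos]:
                   pos += 1
           return pos / len(query_ordered)

       def combined_score(sentence, query, alpha=0.5):
           # The balance alpha between the two components is an assumed value.
           toks, q = sentence.lower().split(), query.lower().split()
           return alpha * qtf(toks, set(q)) + (1 - alpha) * qto(toks, q)

       query = "automatic text summarisation"
       sentences = ["Automatic text summarisation selects salient sentences.",
                    "Evaluation used the ROUGE-1 metric.",
                    "Sentence weighting combines several components."]
       print(max(sentences, key=lambda s: combined_score(s, query)))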
  6. Ou, S.; Khoo, C.S.G.; Goh, D.H.: Multi-document summarization of news articles using an event-based framework (2006) 0.00
    0.004355476 = product of:
      0.026132854 = sum of:
        0.026132854 = weight(_text_:web in 657) [ClassicSimilarity], result of:
          0.026132854 = score(doc=657,freq=2.0), product of:
            0.14495286 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.044416238 = queryNorm
            0.18028519 = fieldWeight in 657, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=657)
      0.16666667 = coord(1/6)
    
    Abstract
     Purpose - The purpose of this research is to develop a method for the automatic construction of multi-document summaries of sets of news articles that might be retrieved by a web search engine in response to a user query.
     Design/methodology/approach - Based on cross-document discourse analysis, an event-based framework is proposed for integrating and organizing information extracted from different news articles. It has a hierarchical structure in which the summarized information is presented at the top level and more detailed information is given at the lower levels. A tree-view interface was implemented for displaying a multi-document summary based on the framework. A preliminary user evaluation was performed by comparing the framework-based summaries against sentence-based summaries.
     Findings - In a small evaluation, all the human subjects preferred the framework-based summaries to the sentence-based summaries. This indicates that the event-based framework is an effective way to summarize a set of news articles reporting an event or a series of relevant events.
     Research limitations/implications - The framework is limited to event-based news articles and is not applicable to news critiques and other kinds of news articles. A summarization system based on the event-based framework is being implemented.
     Practical implications - Multi-document summarization of news articles can adopt the proposed event-based framework.
     Originality/value - An event-based framework for summarizing sets of news articles was developed and evaluated, using a tree-view interface for displaying such summaries.
  7. Ou, S.; Khoo, C.S.G.; Goh, D.H.: Automatic multidocument summarization of research abstracts : design and user evaluation (2007) 0.00
    0.004355476 = product of:
      0.026132854 = sum of:
        0.026132854 = weight(_text_:web in 522) [ClassicSimilarity], result of:
          0.026132854 = score(doc=522,freq=2.0), product of:
            0.14495286 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.044416238 = queryNorm
            0.18028519 = fieldWeight in 522, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=522)
      0.16666667 = coord(1/6)
    
    Abstract
     The purpose of this study was to develop a method for the automatic construction of multidocument summaries of sets of research abstracts that may be retrieved by a digital library or search engine in response to a user query. Sociology dissertation abstracts were selected as the sample domain. A variable-based framework was proposed for integrating and organizing research concepts and relationships, as well as research methods and contextual relations, extracted from different dissertation abstracts. Based on the framework, a new summarization method was developed that parses the discourse structure of abstracts, extracts research concepts and relationships, integrates the information across different abstracts, and organizes and presents them in a Web-based interface. The focus of this article is on the user evaluation that was performed to assess the overall quality and usefulness of the summaries. Two types of variable-based summaries generated using the summarization method (with and without the use of a taxonomy) were compared against a sentence-based summary that lists only the research-objective sentences extracted from each abstract, and against another sentence-based summary generated using the MEAD system, which extracts important sentences. The evaluation results indicate that the majority of sociological researchers (70%) and general users (64%) preferred the variable-based summaries generated with the use of the taxonomy.
  8. Vanderwende, L.; Suzuki, H.; Brockett, C.; Nenkova, A.: Beyond SumBasic : task-focused summarization with sentence simplification and lexical expansion (2007) 0.00
    0.0030088935 = product of:
      0.01805336 = sum of:
        0.01805336 = product of:
          0.03610672 = sum of:
            0.03610672 = weight(_text_:22 in 948) [ClassicSimilarity], result of:
              0.03610672 = score(doc=948,freq=2.0), product of:
                0.1555381 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044416238 = queryNorm
                0.23214069 = fieldWeight in 948, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=948)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    In recent years, there has been increased interest in topic-focused multi-document summarization. In this task, automatic summaries are produced in response to a specific information request, or topic, stated by the user. The system we have designed to accomplish this task comprises four main components: a generic extractive summarization system, a topic-focusing component, sentence simplification, and lexical expansion of topic words. This paper details each of these components, together with experiments designed to quantify their individual contributions. We include an analysis of our results on two large datasets commonly used to evaluate task-focused summarization, the DUC2005 and DUC2006 datasets, using automatic metrics. Additionally, we include an analysis of our results on the DUC2006 task according to human evaluation metrics. In the human evaluation of system summaries compared to human summaries, i.e., the Pyramid method, our system ranked first out of 22 systems in terms of overall mean Pyramid score; and in the human evaluation of summary responsiveness to the topic, our system ranked third out of 35 systems.
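
     SumBasic itself, the frequency-based extractive baseline this system builds on, is compact enough to sketch. The version below follows the commonly published description (score sentences by the average probability of their words, take the best sentence, then square the probabilities of its words to penalize redundancy), with tokenization and the selection step deliberately simplified:

       from collections import Counter

       def sumbasic(sentences, n=2):
           tokenized = [s.lower().split() for s in sentences]
           words = [w for toks in tokenized for w in toks]
           prob = {w: c / len(words) for w, c in Counter(words).items()}
           summary = []
           while len(summary) < min(n, len(sentences)):
               # Score each unpicked sentence by the mean probability of its words.
               best = max((i for i in range(len(sentences)) if sentences[i] not in summary),
                          key=lambda i: sum(prob[w] for w in tokenized[i]) / len(tokenized[i]))
               summary.append(sentences[best])
               for w in tokenized[best]:   # redundancy update: words already covered
                   prob[w] = prob[w] ** 2  # become much less valuable
           return summary

       docs = ["the summit discussed climate policy",
               "climate policy dominated the summit agenda",
               "delegates also toured the city"]
       print(sumbasic(docs))  # picks sentence 1, then the non-redundant sentence 3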
  9. Wu, Y.-f.B.; Li, Q.; Bot, R.S.; Chen, X.: Finding nuggets in documents : a machine learning approach (2006) 0.00
    0.0025074114 = product of:
      0.0150444675 = sum of:
        0.0150444675 = product of:
          0.030088935 = sum of:
            0.030088935 = weight(_text_:22 in 5290) [ClassicSimilarity], result of:
              0.030088935 = score(doc=5290,freq=2.0), product of:
                0.1555381 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044416238 = queryNorm
                0.19345059 = fieldWeight in 5290, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5290)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Date
     22.7.2006 17:25:48