Search (49 results, page 2 of 3)

  • × theme_ss:"Automatisches Abstracting"
  • × year_i:[2000 TO 2010}
  1. Endres-Niggemeyer, B.: SimSum : an empirically founded simulation of summarizing (2000) 0.00
    0.0030970925 = product of:
      0.009291277 = sum of:
        0.009291277 = weight(_text_:a in 3343) [ClassicSimilarity], result of:
          0.009291277 = score(doc=3343,freq=2.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.17835285 = fieldWeight in 3343, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=3343)
      0.33333334 = coord(1/3)
    
    Type
    a
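
     The indented breakdown under each hit is Lucene's ClassicSimilarity "explain" output. As a sanity check, the following minimal Python sketch (based on the documented ClassicSimilarity formulas, not the search engine's own code) reproduces the numbers reported for this first hit:

       import math

       # ClassicSimilarity pieces, taken from the explain tree above
       freq, doc_freq, max_docs = 2.0, 37942, 44218
       query_norm, field_norm, coord = 0.045180224, 0.109375, 1.0 / 3.0

       idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 1.153047
       tf = math.sqrt(freq)                             # 1.4142135
       query_weight = idf * query_norm                  # 0.05209492  (queryWeight)
       field_weight = tf * idf * field_norm             # 0.17835285  (fieldWeight)
       print(query_weight * field_weight * coord)       # 0.0030970925

     The same arithmetic, with different freq and fieldNorm values, accounts for every score tree on this page.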
  2. Pinto, M.: Engineering the production of meta-information : the abstracting concern (2003) 0.00
    0.0030970925 = product of:
      0.009291277 = sum of:
        0.009291277 = weight(_text_:a in 4667) [ClassicSimilarity], result of:
          0.009291277 = score(doc=4667,freq=2.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.17835285 = fieldWeight in 4667, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=4667)
      0.33333334 = coord(1/3)
    
    Type
    a
  3. Saggion, H.; Lapalme, G.: Selective analysis for the automatic generation of summaries (2000) 0.00
    0.0030970925 = product of:
      0.009291277 = sum of:
        0.009291277 = weight(_text_:a in 132) [ClassicSimilarity], result of:
          0.009291277 = score(doc=132,freq=8.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.17835285 = fieldWeight in 132, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=132)
      0.33333334 = coord(1/3)
    
    Abstract
     Selective Analysis is a new method for text summarization of technical articles whose design is based on the study of a corpus of professional abstracts and technical documents. The method emphasizes the selection of particular types of information and its elaboration, exploring the issue of dynamic summarization. A computer prototype was developed to demonstrate the viability of the approach, and the automatic abstracts were evaluated using human informants. The results so far obtained indicate that the summaries are acceptable in content and text quality.
    Type
    a
  4. Soricut, R.; Marcu, D.: Abstractive headline generation using WIDL-expressions (2007) 0.00
    0.0030970925 = product of:
      0.009291277 = sum of:
        0.009291277 = weight(_text_:a in 943) [ClassicSimilarity], result of:
          0.009291277 = score(doc=943,freq=8.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.17835285 = fieldWeight in 943, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=943)
      0.33333334 = coord(1/3)
    
    Abstract
    We present a new paradigm for the automatic creation of document headlines that is based on direct transformation of relevant textual information into well-formed textual output. Starting from an input document, we automatically create compact representations of weighted finite sets of strings, called WIDL-expressions, which encode the most important topics in the document. A generic natural language generation engine performs the headline generation task, driven by both statistical knowledge encapsulated in WIDL-expressions (representing topic biases induced by the input document) and statistical knowledge encapsulated in language models (representing biases induced by the target language). Our evaluation shows similar performance in quality with a state-of-the-art, extractive approach to headline generation, and significant improvements in quality over previously proposed solutions to abstractive headline generation.
    Type
    a
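
     The WIDL formalism itself is beyond a short snippet, but the abstract's core idea, generation driven jointly by document-derived topic weights and a target-language model, can be illustrated with a toy candidate ranker. All names and numbers below are invented for illustration; this is not the authors' method:

       topic_weight = {"floods": 0.9, "hit": 0.6, "region": 0.5, "weather": 0.2}
       bigram_logp = {("floods", "hit"): -0.7, ("hit", "region"): -0.9,
                      ("weather", "report"): -0.5, ("report", "floods"): -1.2}

       def headline_score(words, lam=0.5):
           # interpolate the topic bias and the language-model bias
           topic = sum(topic_weight.get(w, 0.0) for w in words)
           lm = sum(bigram_logp.get(b, -3.0) for b in zip(words, words[1:]))
           return lam * topic + (1.0 - lam) * lm

       candidates = [["floods", "hit", "region"], ["weather", "report", "floods"]]
       print(max(candidates, key=headline_score))  # the better-scored headline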
  5. Harabagiu, S.; Hickl, A.; Lacatusu, F.: Satisfying information needs with multi-document summaries (2007) 0.00
    0.003065327 = product of:
      0.009195981 = sum of:
        0.009195981 = weight(_text_:a in 939) [ClassicSimilarity], result of:
          0.009195981 = score(doc=939,freq=6.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.17652355 = fieldWeight in 939, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=939)
      0.33333334 = coord(1/3)
    
    Abstract
    Generating summaries that meet the information needs of a user relies on (1) several forms of question decomposition; (2) different summarization approaches; and (3) textual inference for combining the summarization strategies. This novel framework for summarization has the advantage of producing highly responsive summaries, as indicated by the evaluation results.
    Type
    a
  6. Dunlavy, D.M.; O'Leary, D.P.; Conroy, J.M.; Schlesinger, J.D.: QCS: A system for querying, clustering and summarizing documents (2007) 0.00
    0.003065327 = product of:
      0.009195981 = sum of:
        0.009195981 = weight(_text_:a in 947) [ClassicSimilarity], result of:
          0.009195981 = score(doc=947,freq=24.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.17652355 = fieldWeight in 947, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=947)
      0.33333334 = coord(1/3)
    
    Abstract
    Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel integrated information retrieval system-the Query, Cluster, Summarize (QCS) system-which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of methods in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) as measured by the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence "trimming" and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.
    Type
    a
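
     The retrieval stage named in the abstract, Latent Semantic Indexing, is easy to sketch. The following is a minimal illustration using scikit-learn (my choice of library; QCS itself is assembled from other software components):

       import numpy as np
       from sklearn.feature_extraction.text import TfidfVectorizer
       from sklearn.decomposition import TruncatedSVD

       docs = ["query clustering summarization pipeline",
               "latent semantic indexing for retrieval",
               "hidden markov models for sentence extraction"]
       vec = TfidfVectorizer()
       X = vec.fit_transform(docs)                   # documents in term space
       svd = TruncatedSVD(n_components=2, random_state=0)
       D = svd.fit_transform(X)                      # documents in latent space
       q = svd.transform(vec.transform(["semantic retrieval"]))
       sims = (D @ q.T).ravel() / (np.linalg.norm(D, axis=1) * np.linalg.norm(q))
       print(docs[int(np.argmax(sims))])             # best-matching document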
  7. Chen, H.-H.; Kuo, J.-J.; Huang, S.-J.; Lin, C.-J.; Wung, H.-C.: A summarization system for Chinese news from multiple sources (2003) 0.00
    0.00296799 = product of:
      0.00890397 = sum of:
        0.00890397 = weight(_text_:a in 2115) [ClassicSimilarity], result of:
          0.00890397 = score(doc=2115,freq=10.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.1709182 = fieldWeight in 2115, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2115)
      0.33333334 = coord(1/3)
    
    Abstract
     This article proposes a summarization system for multiple documents. It employs not only named entities and other signatures to cluster news from different sources, but also punctuation marks, linking elements, and topic chains to identify meaningful units (MUs). Nouns and verbs are used to identify similar MUs, and focusing and browsing models are applied to present the summarization results. To reduce information loss during summarization, informative words in a document are introduced. For the evaluation, a question answering system (QA system) is proposed as a substitute for human assessors. In large-scale experiments involving 140 questions over 17,877 documents, the results show that the models using informative words outperform a pure heuristic strategy that relies only on voting by news reporters. This model can easily be applied further to summarize multilingual news from multiple sources.
    Type
    a
  8. Liang, S.-F.; Devlin, S.; Tait, J.: Investigating sentence weighting components for automatic summarisation (2007) 0.00
    0.00296799 = product of:
      0.00890397 = sum of:
        0.00890397 = weight(_text_:a in 899) [ClassicSimilarity], result of:
          0.00890397 = score(doc=899,freq=10.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.1709182 = fieldWeight in 899, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=899)
      0.33333334 = coord(1/3)
    
    Abstract
    The work described here initially formed part of a triangulation exercise to establish the effectiveness of the Query Term Order algorithm. It subsequently proved to be a reliable indicator for summarising English web documents. We utilised the human summaries from the Document Understanding Conference data, and generated queries automatically for testing the QTO algorithm. Six sentence weighting schemes that made use of Query Term Frequency and QTO were constructed to produce system summaries, and this paper explains the process of combining and balancing the weighting components. The summaries produced were evaluated by the ROUGE-1 metric, and the results showed that using QTO in a weighting combination resulted in the best performance. We also found that using a combination of more weighting components always produced improved performance compared to any single weighting component.
    Type
    a
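
     The abstract does not define the QTO algorithm itself, so the sketch below only illustrates the general pattern it describes, blending two per-sentence weighting components under a balance parameter; both component scores are made up:

       def combine(qtf_score, qto_score, balance=0.6):
           # blend Query Term Frequency with a term-order component
           return balance * qtf_score + (1.0 - balance) * qto_score

       sentences = {"s1": (0.8, 0.3), "s2": (0.5, 0.9)}  # made-up component scores
       ranked = sorted(sentences, key=lambda s: combine(*sentences[s]), reverse=True)
       print(ranked)  # the ordering shifts as the balance parameter is tuned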
  9. Yang, C.C.; Wang, F.L.: Hierarchical summarization of large documents (2008) 0.00
    0.0029264777 = product of:
      0.008779433 = sum of:
        0.008779433 = weight(_text_:a in 1719) [ClassicSimilarity], result of:
          0.008779433 = score(doc=1719,freq=14.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.1685276 = fieldWeight in 1719, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1719)
      0.33333334 = coord(1/3)
    
    Abstract
     Many automatic text summarization models have been developed in recent decades. Related research in information science has shown that human abstractors extract sentences for summaries based on the hierarchical structure of documents; however, the existing automatic summarization models do not take into account the human abstractor's behavior of sentence extraction and only consider the document as a sequence of sentences during the process of extraction of sentences as a summary. In general, a document exhibits a well-defined hierarchical structure that can be described as fractals - mathematical objects with a high degree of redundancy. In this article, we introduce the fractal summarization model based on the fractal theory. The important information is captured from the source document by exploring the hierarchical structure and salient features of the document. A condensed version of the document that is informatively close to the source document is produced iteratively using the contractive transformation in the fractal theory. The fractal summarization model is the first attempt to apply fractal theory to document summarization. It significantly improves the divergence of information coverage of summary and the precision of summary. User evaluations have been conducted. Results have indicated that fractal summarization is promising and outperforms current summarization techniques that do not consider the hierarchical structure of documents.
    Type
    a
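
     Yang and Wang's fractal transformation is not reproduced here, but the underlying idea, apportioning a summary's sentence quota down the document hierarchy, can be sketched as follows (the weights and structure are hypothetical):

       def allocate(node, quota):
           # leaves get their quota; internal nodes split it by child salience
           if "children" not in node:
               return {node["name"]: quota}
           total = sum(c["weight"] for c in node["children"])
           out = {}
           for c in node["children"]:
               out.update(allocate(c, max(1, round(quota * c["weight"] / total))))
           return out

       doc = {"name": "doc", "children": [{"name": "sec1", "weight": 3.0},
                                          {"name": "sec2", "weight": 1.0}]}
       print(allocate(doc, 8))  # {'sec1': 6, 'sec2': 2}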
  10. Wei, F.; Li, W.; Lu, Q.; He, Y.: Applying two-level reinforcement ranking in query-oriented multidocument summarization (2009) 0.00
    0.0027093915 = product of:
      0.008128175 = sum of:
        0.008128175 = weight(_text_:a in 3120) [ClassicSimilarity], result of:
          0.008128175 = score(doc=3120,freq=12.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.15602624 = fieldWeight in 3120, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3120)
      0.33333334 = coord(1/3)
    
    Abstract
     Sentence ranking is the issue of most concern in document summarization today. While traditional feature-based approaches evaluate sentence significance and rank the sentences relying on the features that are particularly designed to characterize the different aspects of the individual sentences, the newly emerging graph-based ranking algorithms (such as the PageRank-like algorithms) recursively compute sentence significance using the global information in a text graph that links sentences together. In general, the existing PageRank-like algorithms can model well the phenomenon that a sentence is important if it is linked by many other important sentences; in other words, they are capable of modeling the mutual reinforcement among the sentences in the text graph. However, when dealing with multidocument summarization these algorithms often assemble a set of documents into one large file, and the document dimension is totally ignored. In this article we present a framework to model the two-level mutual reinforcement among sentences as well as documents. Under this framework we design and develop a novel ranking algorithm such that the document reinforcement is taken into account in the process of sentence ranking. The convergence issue is examined. We also explore an interesting and important property of the proposed algorithm. When evaluated on the DUC 2005 and 2006 query-oriented multidocument summarization datasets, the proposed approach achieves significant results.
    Type
    a
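
     The single-level, PageRank-like baseline that this abstract builds on can be sketched in a few lines of numpy; the paper's actual contribution, two-level document/sentence reinforcement, is not reproduced here:

       import numpy as np

       S = np.array([[0.0, 0.6, 0.1],
                     [0.6, 0.0, 0.4],
                     [0.1, 0.4, 0.0]])       # toy sentence-similarity graph
       P = S / S.sum(axis=1, keepdims=True)  # row-normalize to a transition matrix
       d, r = 0.85, np.full(3, 1.0 / 3.0)
       for _ in range(50):                   # power iteration until (near) convergence
           r = (1.0 - d) / 3.0 + d * (P.T @ r)
       print(r)                              # sentence significance scores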
  11. Lee, J.-H.; Park, S.; Ahn, C.-M.; Kim, D.: Automatic generic document summarization based on non-negative matrix factorization (2009) 0.00
    0.002682161 = product of:
      0.008046483 = sum of:
        0.008046483 = weight(_text_:a in 2448) [ClassicSimilarity], result of:
          0.008046483 = score(doc=2448,freq=6.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.1544581 = fieldWeight in 2448, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2448)
      0.33333334 = coord(1/3)
    
    Abstract
    In existing unsupervised methods, Latent Semantic Analysis (LSA) is used for sentence selection. However, the obtained results are less meaningful, because singular vectors are used as the bases for sentence selection from given documents, and singular vector components can have negative values. We propose a new unsupervised method using Non-negative Matrix Factorization (NMF) to select sentences for automatic generic document summarization. The proposed method uses non-negative constraints, which are more similar to the human cognition process. As a result, the method selects more meaningful sentences for generic document summarization than those selected using LSA.
    Type
    a
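
     A compressed sketch of the NMF approach, assuming scikit-learn's NMF and one plausible way of scoring sentences from the factor matrix H (the paper's exact generic relevance measure may differ):

       import numpy as np
       from sklearn.feature_extraction.text import CountVectorizer
       from sklearn.decomposition import NMF

       sentences = ["NMF factorizes a nonnegative matrix into nonnegative factors.",
                    "Sentence selection picks sentences that cover the main topics.",
                    "Negative components make LSA-based scores hard to interpret."]
       A = CountVectorizer().fit_transform(sentences).T  # term-by-sentence matrix
       model = NMF(n_components=2, init="nndsvda", random_state=0)
       W = model.fit_transform(A)      # term weights per semantic feature
       H = model.components_           # feature weights per sentence
       scores = H.max(axis=0)          # one plausible per-sentence relevance
       print(sentences[int(np.argmax(scores))])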
  12. Haag, M.: Automatic text summarization (2002) 0.00
    0.002654651 = product of:
      0.007963953 = sum of:
        0.007963953 = weight(_text_:a in 5662) [ClassicSimilarity], result of:
          0.007963953 = score(doc=5662,freq=2.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.15287387 = fieldWeight in 5662, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=5662)
      0.33333334 = coord(1/3)
    
    Type
    a
  13. Over, P.; Dang, H.; Harman, D.: DUC in context (2007) 0.00
    0.0025028288 = product of:
      0.0075084865 = sum of:
        0.0075084865 = weight(_text_:a in 934) [ClassicSimilarity], result of:
          0.0075084865 = score(doc=934,freq=4.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.14413087 = fieldWeight in 934, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=934)
      0.33333334 = coord(1/3)
    
    Abstract
    Recent years have seen increased interest in text summarization with emphasis on evaluation of prototype systems. Many factors can affect the design of such evaluations, requiring choices among competing alternatives. This paper examines several major themes running through three evaluations: SUMMAC, NTCIR, and DUC, with a concentration on DUC. The themes are extrinsic and intrinsic evaluation, evaluation procedures and methods, generic versus focused summaries, single- and multi-document summaries, length and compression issues, extracts versus abstracts, and issues with genre.
    Type
    a
  14. Reeve, L.H.; Han, H.; Brooks, A.D.: The use of domain-specific concepts in biomedical text summarization (2007) 0.00
    0.002473325 = product of:
      0.0074199745 = sum of:
        0.0074199745 = weight(_text_:a in 955) [ClassicSimilarity], result of:
          0.0074199745 = score(doc=955,freq=10.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.14243183 = fieldWeight in 955, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=955)
      0.33333334 = coord(1/3)
    
    Abstract
     Text summarization is a method for data reduction. The use of text summarization enables users to reduce the amount of text that must be read while still assimilating the core information. The data reduction offered by text summarization is particularly useful in the biomedical domain, where physicians must continuously find clinical trial study information to incorporate into their patient treatment efforts. Such efforts are often hampered by the high volume of publications. This paper presents two independent methods (BioChain and FreqDist) for identifying salient sentences in biomedical texts using concepts derived from domain-specific resources. Our semantic-based method (BioChain) is effective at identifying thematic sentences, while our frequency-distribution method (FreqDist) removes information redundancy. The two methods are then combined to form a hybrid method (ChainFreq). An evaluation of each method is performed using the ROUGE system to compare system-generated summaries against a set of manually generated summaries. The BioChain and FreqDist methods outperform some common summarization systems, while the ChainFreq method improves upon the base approaches. Our work shows that the best performance is achieved when the two methods are combined. The paper also presents a brief physician's evaluation of three randomly selected papers from an evaluation corpus to show that the author's abstract does not always reflect the entire contents of the full text.
    Type
    a
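
     The abstract does not spell out FreqDist's internals, so the following is only a generic frequency-distribution sketch: rank sentences by the frequency of the domain concepts they contain and skip sentences that add nothing new, which is one simple way to remove redundancy:

       from collections import Counter

       sents = [["aspirin", "trial", "dose"], ["aspirin", "dose"], ["placebo", "arm"]]
       freq = Counter(c for s in sents for c in s)       # concept frequencies
       covered, summary = set(), []
       for s in sorted(sents, key=lambda s: -sum(freq[c] for c in s)):
           if set(s) - covered:                          # contributes a new concept
               summary.append(s)
               covered |= set(s)
       print(summary)  # the second sentence is skipped as redundant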
  15. Craven, T.C.: Abstracts produced using computer assistance (2000) 0.00
    0.0022989952 = product of:
      0.006896985 = sum of:
        0.006896985 = weight(_text_:a in 4809) [ClassicSimilarity], result of:
          0.006896985 = score(doc=4809,freq=6.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.13239266 = fieldWeight in 4809, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=4809)
      0.33333334 = coord(1/3)
    
    Abstract
     Experimental subjects wrote abstracts using a simplified version of the TEXNET abstracting assistance software. In addition to the full text, subjects were presented with either keywords or phrases extracted automatically. The resulting abstracts, and the times taken, were recorded automatically; some additional information was gathered by oral questionnaire. Selected abstracts produced were evaluated on various criteria by independent raters. Results showed considerable variation among subjects, but 37% found the keywords or phrases 'quite' or 'very' useful in writing their abstracts. Statistical analysis failed to support several hypothesized relations: phrases were not viewed as significantly more helpful than keywords, and abstracting experience did not correlate with originality of wording, approximation of the author abstract, or greater conciseness. Requiring further study are some unanticipated strong correlations, including the following: Windows experience and writing an abstract like the author's; experience reading abstracts and thinking one had written a good abstract; gender and abstract length; gender and use of words and phrases from the original text. Results have also suggested possible modifications to the TEXNET software.
    Type
    a
  16. Lam, W.; Chan, K.; Radev, D.; Saggion, H.; Teufel, S.: Context-based generic cross-lingual retrieval of documents and automated summaries (2005) 0.00
    0.0022989952 = product of:
      0.006896985 = sum of:
        0.006896985 = weight(_text_:a in 1965) [ClassicSimilarity], result of:
          0.006896985 = score(doc=1965,freq=6.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.13239266 = fieldWeight in 1965, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1965)
      0.33333334 = coord(1/3)
    
    Abstract
     We develop a context-based generic cross-lingual retrieval model that can deal with different language pairs. Our model considers contexts in the query translation process. Contexts in the query as well as in the documents, based on co-occurrence statistics from different granularities of passages, are exploited. We also investigate cross-lingual retrieval of automatic generic summaries. We have implemented our model for two different cross-lingual settings, namely, retrieving Chinese documents from English queries as well as retrieving English documents from Chinese queries. Extensive experiments have been conducted on a large-scale parallel corpus, enabling studies of retrieval performance for two different cross-lingual settings of full-length documents as well as automated summaries.
    Type
    a
  17. Hirao, T.; Okumura, M.; Yasuda, N.; Isozaki, H.: Supervised automatic evaluation for summarization with voted regression model (2007) 0.00
    0.0022989952 = product of:
      0.006896985 = sum of:
        0.006896985 = weight(_text_:a in 942) [ClassicSimilarity], result of:
          0.006896985 = score(doc=942,freq=6.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.13239266 = fieldWeight in 942, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=942)
      0.33333334 = coord(1/3)
    
    Abstract
    The high quality evaluation of generated summaries is needed if we are to improve automatic summarization systems. Although human evaluation provides better results than automatic evaluation methods, its cost is huge and it is difficult to reproduce the results. Therefore, we need an automatic method that simulates human evaluation if we are to improve our summarization system efficiently. Although automatic evaluation methods have been proposed, they are unreliable when used for individual summaries. To solve this problem, we propose a supervised automatic evaluation method based on a new regression model called the voted regression model (VRM). VRM has two characteristics: (1) model selection based on 'corrected AIC' to avoid multicollinearity, (2) voting by the selected models to alleviate the problem of overfitting. Evaluation results obtained for TSC3 and DUC2004 show that our method achieved error reductions of about 17-51% compared with conventional automatic evaluation methods. Moreover, our method obtained the highest correlation coefficients in several different experiments.
    Type
    a
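
     VRM's model selection via corrected AIC is more involved than a snippet allows, but the voting step, averaging the predictions of several selected regression models, can be sketched with invented data:

       import numpy as np
       from sklearn.linear_model import LinearRegression

       rng = np.random.default_rng(0)
       X = rng.random((30, 3))                          # e.g. automatic-metric features
       y = X @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 0.05, 30)  # "human" scores

       subsets = ([0], [0, 1], [0, 1, 2])               # stand-ins for selected models
       models = [LinearRegression().fit(X[:, cols], y) for cols in subsets]
       x_new = np.array([[0.6, 0.4, 0.7]])
       votes = [m.predict(x_new[:, cols])[0] for m, cols in zip(models, subsets)]
       print(sum(votes) / len(votes))                   # voted evaluation score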
  18. Shen, D.; Yang, Q.; Chen, Z.: Noise reduction through summarization for Web-page classification (2007) 0.00
    0.0022989952 = product of:
      0.006896985 = sum of:
        0.006896985 = weight(_text_:a in 953) [ClassicSimilarity], result of:
          0.006896985 = score(doc=953,freq=6.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.13239266 = fieldWeight in 953, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=953)
      0.33333334 = coord(1/3)
    
    Abstract
     Due to the large variety of noisy information embedded in Web pages, Web-page classification is much more difficult than pure-text classification. In this paper, we propose to improve Web-page classification performance by removing the noise through summarization techniques. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then put forward a new Web-page summarization algorithm based on Web-page layout and evaluate it, along with several other state-of-the-art text summarization algorithms, on the LookSmart Web directory. Experimental results show that classification algorithms (NB or SVM) augmented by any summarization approach can achieve an improvement of more than 5.0% compared to pure-text-based classification algorithms. We further introduce an ensemble method to combine the different summarization algorithms. The ensemble summarization method achieves more than 12.0% improvement over pure-text-based methods.
    Type
    a
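
     A toy version of the pipeline the abstract evaluates, classifying pages from summaries rather than from the full noisy text; the "summarizer" here is a crude stand-in (first sentence only), not the paper's layout-based algorithm:

       from sklearn.feature_extraction.text import TfidfVectorizer
       from sklearn.naive_bayes import MultinomialNB
       from sklearn.pipeline import make_pipeline

       pages = ["Quarterly earnings rose. Click here! Ads. Login. Subscribe.",
                "New planet discovered. Click here! Ads. Login. Subscribe."]
       labels = ["business", "science"]

       summarize = lambda page: page.split(".")[0]      # crude noise removal
       clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
       clf.fit([summarize(p) for p in pages], labels)
       print(clf.predict([summarize("Quarterly earnings fell. Click here! Ads.")]))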
  19. Hahn, U.: Die Verdichtung textuellen Wissens zu Information : vom Wandel methodischer Paradigmen beim automatischen Abstracting (2004) 0.00
    0.002212209 = product of:
      0.0066366266 = sum of:
        0.0066366266 = weight(_text_:a in 4667) [ClassicSimilarity], result of:
          0.0066366266 = score(doc=4667,freq=2.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.12739488 = fieldWeight in 4667, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=4667)
      0.33333334 = coord(1/3)
    
    Type
    a
  20. Haag, M.: Automatic text summarization : Evaluation des Copernic Summarizer und mögliche Einsatzfelder in der Fachinformation der DaimlerChrysler AG (2002) 0.00
    0.0018771215 = product of:
      0.0056313644 = sum of:
        0.0056313644 = weight(_text_:a in 649) [ClassicSimilarity], result of:
          0.0056313644 = score(doc=649,freq=4.0), product of:
            0.05209492 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.045180224 = queryNorm
            0.10809815 = fieldWeight in 649, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=649)
      0.33333334 = coord(1/3)
    
    Abstract
     An evaluation of the Copernic Summarizer, a software package for automatically summarizing text in various data formats, is presented. It is assessed whether and how the Copernic Summarizer can reasonably be used in the DaimlerChrysler Information Division in order to enhance the quality of its information services. First, an introduction to Automatic Text Summarization is given and the Copernic Summarizer is presented. Various methods for evaluating Automatic Text Summarization systems and software ergonomics are presented. Two evaluation forms are developed with which the employees of the Information Division shall evaluate the quality and relevance of the extracted keywords and summaries, as well as the software's usability. The quality and relevance assessment is done by comparing the original text to the summaries. Finally, a recommendation is given concerning the use of the Copernic Summarizer.

Languages

  • e 40
  • d 9

Types

  • a 47
  • m 1
  • x 1