Search (51 results, page 1 of 3)

  • theme_ss:"Automatisches Abstracting"
  1. Robin, J.; McKeown, K.: Empirically designing and evaluating a new revision-based model for summary generation (1996) 0.02
    0.021556493 = product of:
      0.07544772 = sum of:
        0.054281104 = weight(_text_:based in 6751) [ClassicSimilarity], result of:
          0.054281104 = score(doc=6751,freq=6.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.4612686 = fieldWeight in 6751, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0625 = fieldNorm(doc=6751)
        0.021166623 = product of:
          0.042333245 = sum of:
            0.042333245 = weight(_text_:22 in 6751) [ClassicSimilarity], result of:
              0.042333245 = score(doc=6751,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.30952093 = fieldWeight in 6751, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6751)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Presents a system for summarizing quantitative data in natural language, focusing on the use of a corpus of basketball game summaries, drawn from online news services, to empirically shape the system design and to evaluate the approach. Initial corpus analysis revealed characteristics of textual summaries that challenge the capabilities of current language generation systems. A revision-based corpus analysis was used to identify and encode the revision rules of the system. Presents a quantitative evaluation, using several test corpora, to measure the robustness of the new revision-based model.
    Date
    6. 3.1997 16:22:15
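The explain tree for this first hit can be checked by hand. The following is a minimal Python sketch, assuming the standard Lucene ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names are illustrative and not part of any library. The queryNorm, fieldNorm and coord values are copied from the listing rather than derived.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency (assumed formula).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the explain output for doc 6751.
MAX_DOCS = 44218
QUERY_NORM = 0.03905679
FIELD_NORM = 0.0625

based = term_score(6.0, 5906, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# The "22" clause sits in a nested query with coord(1/2).
term_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.5
# The outer query matched 2 of 7 clauses: coord(2/7).
total = (based + term_22) * (2.0 / 7.0)

print(f"based={based:.6f} term_22={term_22:.6f} total={total:.6f}")
```

Up to floating-point rounding (Lucene works in 32-bit floats), the three values should agree with the 0.054281104, 0.021166623 and 0.021556493 shown above. The same arithmetic applies to the other hits with their own freq, fieldNorm and coord values.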
  2. Kim, H.H.; Kim, Y.H.: Generic speech summarization of transcribed lecture videos : using tags and their semantic relations (2016) 0.01
    0.01497233 = product of:
      0.052403152 = sum of:
        0.039174013 = weight(_text_:based in 2640) [ClassicSimilarity], result of:
          0.039174013 = score(doc=2640,freq=8.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.33289194 = fieldWeight in 2640, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2640)
        0.013229139 = product of:
          0.026458278 = sum of:
            0.026458278 = weight(_text_:22 in 2640) [ClassicSimilarity], result of:
              0.026458278 = score(doc=2640,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.19345059 = fieldWeight in 2640, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2640)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    We propose a tag-based framework that simulates human abstractors' ability to select significant sentences based on key concepts in a sentence as well as the semantic relations between key concepts to create generic summaries of transcribed lecture videos. The proposed extractive summarization method uses tags (viewer- and author-assigned terms) as key concepts. Our method employs Flickr tag clusters and WordNet synonyms to expand tags and detect the semantic relations between tags. This method helps select sentences that have a greater number of semantically related key concepts. To investigate the effectiveness and uniqueness of the proposed method, we compare it with an existing technique, latent semantic analysis (LSA), using intrinsic and extrinsic evaluations. The results of intrinsic evaluation show that the tag-based method is as effective as, or more effective than, the LSA method. We also observe that in the extrinsic evaluation, the grand mean accuracy score of the tag-based method is higher than that of the LSA method, with a statistically significant difference. Elaborating on our results, we discuss the theoretical and practical implications of our findings for speech video summarization and retrieval.
    Date
    22. 1.2016 12:29:41
  3. Ou, S.; Khoo, C.S.G.; Goh, D.H.: Multi-document summarization of news articles using an event-based framework (2006) 0.01
    0.010469696 = product of:
      0.07328787 = sum of:
        0.07328787 = weight(_text_:based in 657) [ClassicSimilarity], result of:
          0.07328787 = score(doc=657,freq=28.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.6227838 = fieldWeight in 657, product of:
              5.2915025 = tf(freq=28.0), with freq of:
                28.0 = termFreq=28.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=657)
      0.14285715 = coord(1/7)
    
    Abstract
    Purpose - The purpose of this research is to develop a method for automatic construction of multi-document summaries of sets of news articles that might be retrieved by a web search engine in response to a user query. Design/methodology/approach - Based on the cross-document discourse analysis, an event-based framework is proposed for integrating and organizing information extracted from different news articles. It has a hierarchical structure in which the summarized information is presented at the top level and more detailed information given at the lower levels. A tree-view interface was implemented for displaying a multi-document summary based on the framework. A preliminary user evaluation was performed by comparing the framework-based summaries against the sentence-based summaries. Findings - In a small evaluation, all the human subjects preferred the framework-based summaries to the sentence-based summaries. It indicates that the event-based framework is an effective way to summarize a set of news articles reporting an event or a series of relevant events. Research limitations/implications - Limited to event-based news articles only, not applicable to news critiques and other kinds of news articles. A summarization system based on the event-based framework is being implemented. Practical implications - Multi-document summarization of news articles can adopt the proposed event-based framework. Originality/value - An event-based framework for summarizing sets of news articles was developed and evaluated using a tree-view interface for displaying such summaries.
  4. Jiang, Y.; Meng, R.; Huang, Y.; Lu, W.; Liu, J.: Generating keyphrases for readers : a controllable keyphrase generation framework (2023) 0.01
    0.009376042 = product of:
      0.032816146 = sum of:
        0.019587006 = weight(_text_:based in 1012) [ClassicSimilarity], result of:
          0.019587006 = score(doc=1012,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.16644597 = fieldWeight in 1012, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1012)
        0.013229139 = product of:
          0.026458278 = sum of:
            0.026458278 = weight(_text_:22 in 1012) [ClassicSimilarity], result of:
              0.026458278 = score(doc=1012,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.19345059 = fieldWeight in 1012, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1012)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    With the wide application of keyphrases in many Information Retrieval (IR) and Natural Language Processing (NLP) tasks, automatic keyphrase prediction has been emerging. However, these statistically important phrases are contributing increasingly less to the related tasks because the end-to-end learning mechanism enables models to learn the important semantic information of the text directly. Similarly, keyphrases are of little help for readers to quickly grasp the paper's main idea because the relationship between the keyphrase and the paper is not explicit to readers. Therefore, we propose to generate keyphrases with specific functions for readers to bridge the semantic gap between them and the information producers, and verify the effectiveness of the keyphrase function for assisting users' comprehension with a user experiment. A controllable keyphrase generation framework (the CKPG) that uses the keyphrase function as a control code to generate categorized keyphrases is proposed and implemented based on Transformer, BART, and T5, respectively. For the Computer Science domain, the Macro-avgs of , , and on the Paper with Code dataset are up to 0.680, 0.535, and 0.558, respectively. Our experimental results indicate the effectiveness of the CKPG models.
    Date
    22. 6.2023 14:55:20
  5. Ou, S.; Khoo, S.G.; Goh, D.H.: Automatic multidocument summarization of research abstracts : design and user evaluation (2007) 0.01
    0.007403193 = product of:
      0.05182235 = sum of:
        0.05182235 = weight(_text_:based in 522) [ClassicSimilarity], result of:
          0.05182235 = score(doc=522,freq=14.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.44037464 = fieldWeight in 522, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=522)
      0.14285715 = coord(1/7)
    
    Abstract
    The purpose of this study was to develop a method for automatic construction of multidocument summaries of sets of research abstracts that may be retrieved by a digital library or search engine in response to a user query. Sociology dissertation abstracts were selected as the sample domain in this study. A variable-based framework was proposed for integrating and organizing research concepts and relationships as well as research methods and contextual relations extracted from different dissertation abstracts. Based on the framework, a new summarization method was developed, which parses the discourse structure of abstracts, extracts research concepts and relationships, integrates the information across different abstracts, and organizes and presents them in a Web-based interface. The focus of this article is on the user evaluation that was performed to assess the overall quality and usefulness of the summaries. Two types of variable-based summaries generated using the summarization method (with or without the use of a taxonomy) were compared against a sentence-based summary that lists only the research-objective sentences extracted from each abstract and another sentence-based summary generated using the MEAD system that extracts important sentences. The evaluation results indicate that the majority of sociological researchers (70%) and general users (64%) preferred the variable-based summaries generated with the use of the taxonomy.
  6. Xiong, S.; Ji, D.: Query-focused multi-document summarization using hypergraph-based ranking (2016) 0.01
    0.006785138 = product of:
      0.047495965 = sum of:
        0.047495965 = weight(_text_:based in 2972) [ClassicSimilarity], result of:
          0.047495965 = score(doc=2972,freq=6.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.40361002 = fieldWeight in 2972, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2972)
      0.14285715 = coord(1/7)
    
    Abstract
    General graph random walk has been successfully applied in multi-document summarization, but processing documents in this way has some limitations. In this paper, we propose a novel hypergraph-based vertex-reinforced random walk framework for multi-document summarization. The framework first exploits the Hierarchical Dirichlet Process (HDP) topic model to learn a word-topic probability distribution in sentences. Then the hypergraph is used to capture both cluster relationships based on the word-topic probability distribution and pairwise similarity among sentences. Finally, a time-variant random walk algorithm for hypergraphs is developed to rank sentences, which ensures sentence diversity in summaries by vertex reinforcement. Experimental results on the publicly available dataset demonstrate the effectiveness of our framework.
  7. Hobson, S.P.; Dorr, B.J.; Monz, C.; Schwartz, R.: Task-based evaluation of text summarization using Relevance Prediction (2007) 0.01
    0.0067155454 = product of:
      0.047008816 = sum of:
        0.047008816 = weight(_text_:based in 938) [ClassicSimilarity], result of:
          0.047008816 = score(doc=938,freq=8.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.39947033 = fieldWeight in 938, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=938)
      0.14285715 = coord(1/7)
    
    Abstract
    This article introduces a new task-based evaluation measure called Relevance Prediction that is a more intuitive measure of an individual's performance on a real-world task than interannotator agreement. Relevance Prediction parallels what a user does in the real world task of browsing a set of documents using standard search tools, i.e., the user judges relevance based on a short summary and then that same user - not an independent user - decides whether to open (and judge) the corresponding document. This measure is shown to be a more reliable measure of task performance than LDC Agreement, a current gold-standard based measure used in the summarization evaluation community. Our goal is to provide a stable framework within which developers of new automatic measures may make stronger statistical statements about the effectiveness of their measures in predicting summary usefulness. We demonstrate - as a proof-of-concept methodology for automatic metric developers - that a current automatic evaluation measure has a better correlation with Relevance Prediction than with LDC Agreement and that the significance level for detected differences is higher for the former than for the latter.
  8. Lam, W.; Chan, K.; Radev, D.; Saggion, H.; Teufel, S.: Context-based generic cross-lingual retrieval of documents and automated summaries (2005) 0.01
    0.005815833 = product of:
      0.04071083 = sum of:
        0.04071083 = weight(_text_:based in 1965) [ClassicSimilarity], result of:
          0.04071083 = score(doc=1965,freq=6.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.34595144 = fieldWeight in 1965, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=1965)
      0.14285715 = coord(1/7)
    
    Abstract
    We develop a context-based generic cross-lingual retrieval model that can deal with different language pairs. Our model considers contexts in the query translation process. Contexts in the query as well as in the documents based on co-occurrence statistics from different granularity of passages are exploited. We also investigate cross-lingual retrieval of automatic generic summaries. We have implemented our model for two different cross-lingual settings, namely, retrieving Chinese documents from English queries as well as retrieving English documents from Chinese queries. Extensive experiments have been conducted on a large-scale parallel corpus enabling studies on retrieval performance for two different cross-lingual settings of full-length documents as well as automated summaries.
  9. Ye, S.; Chua, T.-S.; Kan, M.-Y.; Qiu, L.: Document concept lattice for text understanding and summarization (2007) 0.01
    0.005815833 = product of:
      0.04071083 = sum of:
        0.04071083 = weight(_text_:based in 941) [ClassicSimilarity], result of:
          0.04071083 = score(doc=941,freq=6.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.34595144 = fieldWeight in 941, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=941)
      0.14285715 = coord(1/7)
    
    Abstract
    We argue that the quality of a summary can be evaluated based on how many concepts in the original document(s) are preserved after summarization. Here, a concept refers to an abstract or concrete entity or its action often expressed by diverse terms in text. Summary generation can thus be considered as an optimization problem of selecting a set of sentences with minimal answer loss. In this paper, we propose a document concept lattice that indexes the hierarchy of local topics tied to a set of frequent concepts and the corresponding sentences containing these topics. The local topics will specify the promising sub-spaces related to the selected concepts and sentences. Based on this lattice, the summary is an optimized selection of a set of distinct and salient local topics that lead to maximal coverage of concepts with the given number of sentences. Our summarizer based on the concept lattice has demonstrated competitive performance in Document Understanding Conference 2005 and 2006 evaluations as well as follow-on tests.
  10. Shen, D.; Yang, Q.; Chen, Z.: Noise reduction through summarization for Web-page classification (2007) 0.01
    0.005815833 = product of:
      0.04071083 = sum of:
        0.04071083 = weight(_text_:based in 953) [ClassicSimilarity], result of:
          0.04071083 = score(doc=953,freq=6.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.34595144 = fieldWeight in 953, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=953)
      0.14285715 = coord(1/7)
    
    Abstract
    Due to a large variety of noisy information embedded in Web pages, Web-page classification is much more difficult than pure-text classification. In this paper, we propose to improve the Web-page classification performance by removing the noise through summarization techniques. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then put forward a new Web-page summarization algorithm based on Web-page layout and evaluate it along with several other state-of-the-art text summarization algorithms on the LookSmart Web directory. Experimental results show that the classification algorithms (NB or SVM) augmented by any summarization approach can achieve an improvement by more than 5.0% as compared to pure-text-based classification algorithms. We further introduce an ensemble method to combine the different summarization algorithms. The ensemble summarization method achieves more than 12.0% improvement over pure-text based methods.
  11. Kim, H.H.; Kim, Y.H.: ERP/MMR algorithm for classifying topic-relevant and topic-irrelevant visual shots of documentary videos (2019) 0.01
    0.005596288 = product of:
      0.039174013 = sum of:
        0.039174013 = weight(_text_:based in 5358) [ClassicSimilarity], result of:
          0.039174013 = score(doc=5358,freq=8.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.33289194 = fieldWeight in 5358, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5358)
      0.14285715 = coord(1/7)
    
    Abstract
    We propose and evaluate a video summarization method based on a topic relevance model, a maximal marginal relevance (MMR), and discriminant analysis to generate a semantically meaningful video skim. The topic relevance model uses event-related potential (ERP) components to describe the process of topic relevance judgment. More specifically, the topic relevance model indicates that N400 and P600, which have been successfully applied to the mismatch process of a stimulus and the discourse-internal reorganization and integration process of a stimulus, respectively, are used for the topic mismatch process of a topic-irrelevant video shot and the topic formation process of a topic-relevant video shot. To evaluate our proposed ERP/MMR-based method, we compared the video skims generated by the ERP/MMR-based, ERP-based, and shot boundary detection (SBD) methods with ground truth skims. The results showed that at a significance level of 0.05, the ROUGE-1 scores of the ERP/MMR method are statistically higher than those of the SBD method, and the diversity scores of the ERP/MMR method are statistically higher than those of the ERP method. This study suggested that the proposed method may be applied to the construction of a video skim without operational intervention, such as the insertion of a black screen between video shots.
  12. Ouyang, Y.; Li, W.; Li, S.; Lu, Q.: Intertopic information mining for query-based summarization (2010) 0.00
    0.0048465277 = product of:
      0.033925693 = sum of:
        0.033925693 = weight(_text_:based in 3459) [ClassicSimilarity], result of:
          0.033925693 = score(doc=3459,freq=6.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.28829288 = fieldWeight in 3459, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3459)
      0.14285715 = coord(1/7)
    
    Abstract
    In this article, the authors address the problem of sentence ranking in summarization. Although most existing summarization approaches are concerned with the information embodied in a particular topic (including a set of documents and an associated query) for sentence ranking, they propose a novel ranking approach that incorporates intertopic information mining. Intertopic information, in contrast to intratopic information, is able to reveal pairwise topic relationships and thus can be considered as the bridge across different topics. In this article, the intertopic information is used for transferring word importance learned from known topics to unknown topics under a learning-based summarization framework. To mine this information, the authors model the topic relationship by clustering all the words in both known and unknown topics according to various kinds of word conceptual labels, which indicate the roles of the words in the topic. Based on the mined relationships, the authors develop a probabilistic model using manually generated summaries provided for known topics to predict ranking scores for sentences in unknown topics. A series of experiments have been conducted on the Document Understanding Conference (DUC) 2006 data set. The evaluation results show that intertopic information is indeed effective for sentence ranking and the resultant summarization system performs comparably well to the best-performing DUC participating systems on the same data set.
  13. Cai, X.; Li, W.: Enhancing sentence-level clustering with integrated and interactive frameworks for theme-based summarization (2011) 0.00
    0.0048465277 = product of:
      0.033925693 = sum of:
        0.033925693 = weight(_text_:based in 4770) [ClassicSimilarity], result of:
          0.033925693 = score(doc=4770,freq=6.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.28829288 = fieldWeight in 4770, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4770)
      0.14285715 = coord(1/7)
    
    Abstract
    Sentence clustering plays a pivotal role in theme-based summarization, which discovers topic themes defined as the clusters of highly related sentences to avoid redundancy and cover more diverse information. Because sentences are short and their content is limited, the bag-of-words cosine similarity traditionally used for document clustering is no longer suitable. Special treatment for measuring sentence similarity is necessary. In this article, we study the sentence-level clustering problem. After exploiting concept- and context-enriched sentence vector representations, we develop two co-clustering frameworks to enhance sentence-level clustering for theme-based summarization (integrated clustering and interactive clustering), both allowing word and document to play an explicit role in sentence clustering as independent text objects rather than using word or concept as features of a sentence in a document set. In each framework, we experiment with two-level co-clustering (i.e., sentence-word co-clustering or sentence-document co-clustering) and three-level co-clustering (i.e., document-sentence-word co-clustering). Compared against concept- and context-oriented sentence-representation reformation, co-clustering shows a clear advantage in both intrinsic clustering quality evaluation and extrinsic summarization evaluation conducted on the Document Understanding Conferences (DUC) datasets.
  14. Galgani, F.; Compton, P.; Hoffmann, A.: Summarization based on bi-directional citation analysis (2015) 0.00
    0.0048465277 = product of:
      0.033925693 = sum of:
        0.033925693 = weight(_text_:based in 2685) [ClassicSimilarity], result of:
          0.033925693 = score(doc=2685,freq=6.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.28829288 = fieldWeight in 2685, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2685)
      0.14285715 = coord(1/7)
    
    Abstract
    Automatic document summarization using citations is based on summarizing what others explicitly say about the document, by extracting a summary from text around the citations (citances). While this technique works quite well for summarizing the impact of scientific articles, other genres of documents as well as other types of summaries require different approaches. In this paper, we introduce a new family of methods that we developed for legal documents summarization to generate catchphrases for legal cases (where catchphrases are a form of legal summary). Our methods use both incoming and outgoing citations, and we show how citances can be combined with other elements of cited and citing documents, including the full text of the target document, and catchphrases of cited and citing cases. On a legal summarization corpus, our methods outperform competitive baselines. The combination of full text sentences and catchphrases from cited and citing cases is particularly successful. We also apply and evaluate the methods on scientific paper summarization, where they perform at the level of state-of-the-art techniques. Our family of citation-based summarization methods is powerful and flexible enough to target successfully a range of different domains and summarization tasks.
  15. Sankarasubramaniam, Y.; Ramanathan, K.; Ghosh, S.: Text summarization using Wikipedia (2014) 0.00
    0.0048465277 = product of:
      0.033925693 = sum of:
        0.033925693 = weight(_text_:based in 2693) [ClassicSimilarity], result of:
          0.033925693 = score(doc=2693,freq=6.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.28829288 = fieldWeight in 2693, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2693)
      0.14285715 = coord(1/7)
    
    Abstract
    Automatic text summarization has been an active field of research for many years. Several approaches have been proposed, ranging from simple position and word-frequency methods, to learning and graph-based algorithms. The advent of human-generated knowledge bases like Wikipedia offers a further possibility in text summarization - they can be used to understand the input text in terms of salient concepts from the knowledge base. In this paper, we study a novel approach that leverages Wikipedia in conjunction with graph-based ranking. Our approach is to first construct a bipartite sentence-concept graph, and then rank the input sentences using iterative updates on this graph. We consider several models for the bipartite graph, and derive convergence properties under each model. Then, we take up personalized and query-focused summarization, where the sentence ranks additionally depend on user interests and queries, respectively. Finally, we present a Wikipedia-based multi-document summarization algorithm. An important feature of the proposed algorithms is that they enable real-time incremental summarization - users can first view an initial summary, and then request additional content if interested. We evaluate the performance of our proposed summarizer using the ROUGE metric, and the results show that leveraging Wikipedia can significantly improve summary quality. We also present results from a user study, which suggests that using incremental summarization can help in better understanding news articles.
  16. Hirao, T.; Okumura, M.; Yasuda, N.; Isozaki, H.: Supervised automatic evaluation for summarization with voted regression model (2007) 0.00
    0.0047486075 = product of:
      0.03324025 = sum of:
        0.03324025 = weight(_text_:based in 942) [ClassicSimilarity], result of:
          0.03324025 = score(doc=942,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.28246817 = fieldWeight in 942, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=942)
      0.14285715 = coord(1/7)
    
    Abstract
    The high quality evaluation of generated summaries is needed if we are to improve automatic summarization systems. Although human evaluation provides better results than automatic evaluation methods, its cost is huge and it is difficult to reproduce the results. Therefore, we need an automatic method that simulates human evaluation if we are to improve our summarization system efficiently. Although automatic evaluation methods have been proposed, they are unreliable when used for individual summaries. To solve this problem, we propose a supervised automatic evaluation method based on a new regression model called the voted regression model (VRM). VRM has two characteristics: (1) model selection based on 'corrected AIC' to avoid multicollinearity, (2) voting by the selected models to alleviate the problem of overfitting. Evaluation results obtained for TSC3 and DUC2004 show that our method achieved error reductions of about 17-51% compared with conventional automatic evaluation methods. Moreover, our method obtained the highest correlation coefficients in several different experiments.
  17. Martinez-Romo, J.; Araujo, L.; Fernandez, A.D.: SemGraph : extracting keyphrases following a novel semantic graph-based approach (2016) 0.00
    0.0047486075 = product of:
      0.03324025 = sum of:
        0.03324025 = weight(_text_:based in 2832) [ClassicSimilarity], result of:
          0.03324025 = score(doc=2832,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.28246817 = fieldWeight in 2832, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=2832)
      0.14285715 = coord(1/7)
    
    Abstract
    Keyphrases represent the main topics a text is about. In this article, we introduce SemGraph, an unsupervised algorithm for extracting keyphrases from a collection of texts based on a semantic relationship graph. The main novelty of this algorithm is its ability to identify semantic relationships between words whose presence is statistically significant. Our method constructs a co-occurrence graph in which words appearing in the same document are linked, provided their presence in the collection is statistically significant with respect to a null model. Furthermore, the graph obtained is enriched with information from WordNet. We have used the most recent and standardized benchmark to evaluate the system's ability to detect the keyphrases that are part of the text. The result is a method that achieves an improvement of 5.3% and 7.28% in F-measure over the two labeled sets of keyphrases used in the evaluation of SemEval-2010.
  18. Brandow, R.; Mitze, K.; Rau, L.F.: Automatic condensation of electronic publications by sentence selection (1995) 0.00
    0.00447703 = product of:
      0.03133921 = sum of:
        0.03133921 = weight(_text_:based in 2929) [ClassicSimilarity], result of:
          0.03133921 = score(doc=2929,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.26631355 = fieldWeight in 2929, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0625 = fieldNorm(doc=2929)
      0.14285715 = coord(1/7)
    
    Abstract
    Description of a system that performs domain-independent automatic condensation of news from a large commercial news service encompassing 41 different publications. This system was evaluated against a system that condensed the same articles using only the first portions of the texts (the lead), up to the target length of the summaries. 3 lengths of articles were evaluated for 250 documents by both systems, totalling 1,500 suitability judgements in all. The lead-based summaries outperformed the 'intelligent' summaries significantly, achieving acceptability ratings of over 90%, compared to 74.7%
  19. Ahmad, K.: Text summarisation : the role of lexical cohesion analysis (1995) 0.00
    0.00447703 = product of:
      0.03133921 = sum of:
        0.03133921 = weight(_text_:based in 5795) [ClassicSimilarity], result of:
          0.03133921 = score(doc=5795,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.26631355 = fieldWeight in 5795, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0625 = fieldNorm(doc=5795)
      0.14285715 = coord(1/7)
    
    Abstract
    Work in automatic text summarisation focuses mainly on computational models of texts. The artificial intelligence related work in text summarisation deals mainly with narrative texts such as newspaper reports and stories. Presents a study on the summarisation of non-narrative texts such as those in scientific and technical communication. Discusses syntactic cohesion; lexical cohesion; complex lexical repetition; simple and complex paraphrase; bonds and links; and Tele-pattan, an architecture for a cohesion-based text analysis and summarisation system working on SGML
  20. Xianghao, G.; Yixin, Z.; Li, Y.: A new method of news test understanding and abstracting based on speech acts theory (1998) 0.00
    0.00447703 = product of:
      0.03133921 = sum of:
        0.03133921 = weight(_text_:based in 3532) [ClassicSimilarity], result of:
          0.03133921 = score(doc=3532,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.26631355 = fieldWeight in 3532, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0625 = fieldNorm(doc=3532)
      0.14285715 = coord(1/7)
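
    Note on the score breakdowns
    The explain trees above all follow the same arithmetic, which can be reproduced by hand. The sketch below is a minimal illustration, assuming the classic Lucene TF-IDF definitions that the listed numbers are consistent with: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names classic_term_score and document_score are hypothetical and not part of Lucene; queryNorm and fieldNorm are simply copied from the listing, since they depend on the whole query and on index-time field length.

```python
import math


def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Score of one matching term, mirroring the lines of an explain tree.

    Hypothetical helper; the formulas are the assumed classic TF-IDF ones.
    """
    tf = math.sqrt(freq)                             # tf(freq=...)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # idf(docFreq=..., maxDocs=...)
    query_weight = idf * query_norm                  # queryWeight
    field_weight = tf * idf * field_norm             # fieldWeight
    return query_weight * field_weight               # score(doc=..., freq=...)


def document_score(term_scores, matched_clauses, total_clauses):
    """Sum the per-term scores and apply the coord(matched/total) factor."""
    coord = matched_clauses / total_clauses
    return sum(term_scores) * coord


# Values taken from result 20 (doc=3532): term "based", freq=2.0, coord(1/7).
term = classic_term_score(
    freq=2.0,
    doc_freq=5906,
    max_docs=44218,
    query_norm=0.03905679,
    field_norm=0.0625,
)
total = document_score([term], matched_clauses=1, total_clauses=7)
print(f"term score  = {term:.8f}")
print(f"final score = {total:.8f}")
```

    With the inputs for result 20, the computed values should agree with the listed 0.03133921 and 0.00447703 up to single-precision rounding. The coord factor explains why results that match only one of the seven query clauses end up with displayed scores of 0.00.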
    

Years

Languages

  • e 50
  • chi 1

Types

  • a 51
  • el 1