Search (26 results, page 1 of 2)

  • theme_ss:"Automatisches Abstracting"
  1. Jones, P.A.; Bradbeer, P.V.G.: Discovery of optimal weights in a concept selection system (1996) 0.03
    0.028386448 = product of:
      0.11354579 = sum of:
        0.11354579 = sum of:
          0.064563714 = weight(_text_:methods in 6974) [ClassicSimilarity], result of:
            0.064563714 = score(doc=6974,freq=2.0), product of:
              0.18168657 = queryWeight, product of:
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.045191016 = queryNorm
              0.35535768 = fieldWeight in 6974, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.0625 = fieldNorm(doc=6974)
          0.048982073 = weight(_text_:22 in 6974) [ClassicSimilarity], result of:
            0.048982073 = score(doc=6974,freq=2.0), product of:
              0.15825124 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045191016 = queryNorm
              0.30952093 = fieldWeight in 6974, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=6974)
      0.25 = coord(1/4)
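
    The score breakdown above (repeated in the same form for every result on this page) is Lucene's ClassicSimilarity explain output: for each matching query term, the contribution is queryWeight x fieldWeight, with tf = sqrt(freq), idf = ln(maxDocs / (docFreq + 1)) + 1, queryWeight = idf x queryNorm and fieldWeight = tf x idf x fieldNorm; the per-term contributions are summed and scaled by the coordination factor coord. The short Python sketch below reproduces the figures of this first result from the values shown in the tree; the function names are illustrative only and are not part of the search system.

      import math

      def classic_idf(doc_freq, max_docs):
          # ClassicSimilarity: idf(t) = ln(maxDocs / (docFreq + 1)) + 1
          return math.log(max_docs / (doc_freq + 1)) + 1

      def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
          idf = classic_idf(doc_freq, max_docs)
          tf = math.sqrt(freq)                    # tf(freq) = sqrt(freq)
          query_weight = idf * query_norm         # idf * queryNorm
          field_weight = tf * idf * field_norm    # tf * idf * fieldNorm
          return query_weight * field_weight

      # Values copied from the explain tree above (doc 6974):
      w_methods = term_score(2.0, 2156, 44218, 0.045191016, 0.0625)  # ~0.064563714
      w_22      = term_score(2.0, 3622, 44218, 0.045191016, 0.0625)  # ~0.048982073
      total     = (w_methods + w_22) * 0.25                          # coord(1/4) -> ~0.028386448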
    
    Abstract
    Describes the application of weighting strategies to model uncertainties and probabilities in automatic abstracting systems, particularly in the concept selection phase. The weights were originally assigned in an ad hoc manner and were then refined by manual analysis of the results. The new method attempts to derive the weights more systematically, using a genetic algorithm.
    Source
    Information retrieval: new systems and current research. Proceedings of the 16th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Drymen, Scotland, 22-23 Mar 94. Ed.: R. Leon
  2. Oh, H.; Nam, S.; Zhu, Y.: Structured abstract summarization of scientific articles : summarization using full-text section information (2023) 0.02
    0.01774153 = product of:
      0.07096612 = sum of:
        0.07096612 = sum of:
          0.040352322 = weight(_text_:methods in 889) [ClassicSimilarity], result of:
            0.040352322 = score(doc=889,freq=2.0), product of:
              0.18168657 = queryWeight, product of:
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.045191016 = queryNorm
              0.22209854 = fieldWeight in 889, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.0390625 = fieldNorm(doc=889)
          0.030613795 = weight(_text_:22 in 889) [ClassicSimilarity], result of:
            0.030613795 = score(doc=889,freq=2.0), product of:
              0.15825124 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045191016 = queryNorm
              0.19345059 = fieldWeight in 889, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=889)
      0.25 = coord(1/4)
    
    Abstract
    The automatic summarization of scientific articles differs from other text genres because of the structured format and longer text length. Previous approaches have focused on tackling the lengthy nature of scientific articles, aiming to improve the computational efficiency of summarizing long text using a flat, unstructured abstract. However, the structured format of scientific articles and characteristics of each section have not been fully explored, despite their importance. The lack of a sufficient investigation and discussion of various characteristics for each section and their influence on summarization results has hindered the practical use of automatic summarization for scientific articles. To provide a balanced abstract proportionally emphasizing each section of a scientific article, the community introduced the structured abstract, an abstract with distinct, labeled sections. Using this information, in this study, we aim to understand tasks ranging from data preparation to model evaluation from diverse viewpoints. Specifically, we provide a preprocessed large-scale dataset and propose a summarization method applying the introduction, methods, results, and discussion (IMRaD) format reflecting the characteristics of each section. We also discuss the objective benchmarks and perspectives of state-of-the-art algorithms and present the challenges and research directions in this area.
    Date
    22. 1.2023 18:57:12
  3. Ling, X.; Jiang, J.; He, X.; Mei, Q.; Zhai, C.; Schatz, B.: Generating gene summaries from biomedical literature : a study of semi-structured summarization (2007) 0.01
    0.013345277 = product of:
      0.053381108 = sum of:
        0.053381108 = product of:
          0.106762215 = sum of:
            0.106762215 = weight(_text_:methods in 946) [ClassicSimilarity], result of:
              0.106762215 = score(doc=946,freq=14.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.5876176 = fieldWeight in 946, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=946)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Most knowledge accumulated through scientific discoveries in genomics and related biomedical disciplines is buried in the vast amount of biomedical literature. Since understanding gene regulations is fundamental to biomedical research, summarizing all the existing knowledge about a gene based on literature is highly desirable to help biologists digest the literature. In this paper, we present a study of methods for automatically generating gene summaries from biomedical literature. Unlike most existing work on automatic text summarization, in which the generated summary is often a list of extracted sentences, we propose to generate a semi-structured summary which consists of sentences covering specific semantic aspects of a gene. Such a semi-structured summary is more appropriate for describing genes and poses special challenges for automatic text summarization. We propose a two-stage approach to generate such a summary for a given gene - first retrieving articles about a gene and then extracting sentences for each specified semantic aspect. We address the issue of gene name variation in the first stage and propose several different methods for sentence extraction in the second stage. We evaluate the proposed methods using a test set with 20 genes. Experiment results show that the proposed methods can generate useful semi-structured gene summaries automatically from biomedical literature, and our proposed methods outperform general purpose summarization methods. Among all the proposed methods for sentence extraction, a probabilistic language modeling approach that models gene context performs the best.
  4. Galgani, F.; Compton, P.; Hoffmann, A.: Summarization based on bi-directional citation analysis (2015) 0.01
    0.011278818 = product of:
      0.045115273 = sum of:
        0.045115273 = product of:
          0.09023055 = sum of:
            0.09023055 = weight(_text_:methods in 2685) [ClassicSimilarity], result of:
              0.09023055 = score(doc=2685,freq=10.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.4966275 = fieldWeight in 2685, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2685)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Automatic document summarization using citations is based on summarizing what others explicitly say about the document, by extracting a summary from text around the citations (citances). While this technique works quite well for summarizing the impact of scientific articles, other genres of documents as well as other types of summaries require different approaches. In this paper, we introduce a new family of methods that we developed for legal documents summarization to generate catchphrases for legal cases (where catchphrases are a form of legal summary). Our methods use both incoming and outgoing citations, and we show how citances can be combined with other elements of cited and citing documents, including the full text of the target document, and catchphrases of cited and citing cases. On a legal summarization corpus, our methods outperform competitive baselines. The combination of full text sentences and catchphrases from cited and citing cases is particularly successful. We also apply and evaluate the methods on scientific paper summarization, where they perform at the level of state-of-the-art techniques. Our family of citation-based summarization methods is powerful and flexible enough to target successfully a range of different domains and summarization tasks.
  5. Hirao, T.; Okumura, M.; Yasuda, N.; Isozaki, H.: Supervised automatic evaluation for summarization with voted regression model (2007) 0.01
    0.010483841 = product of:
      0.041935366 = sum of:
        0.041935366 = product of:
          0.08387073 = sum of:
            0.08387073 = weight(_text_:methods in 942) [ClassicSimilarity], result of:
              0.08387073 = score(doc=942,freq=6.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.4616232 = fieldWeight in 942, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.046875 = fieldNorm(doc=942)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The high quality evaluation of generated summaries is needed if we are to improve automatic summarization systems. Although human evaluation provides better results than automatic evaluation methods, its cost is huge and it is difficult to reproduce the results. Therefore, we need an automatic method that simulates human evaluation if we are to improve our summarization system efficiently. Although automatic evaluation methods have been proposed, they are unreliable when used for individual summaries. To solve this problem, we propose a supervised automatic evaluation method based on a new regression model called the voted regression model (VRM). VRM has two characteristics: (1) model selection based on 'corrected AIC' to avoid multicollinearity, (2) voting by the selected models to alleviate the problem of overfitting. Evaluation results obtained for TSC3 and DUC2004 show that our method achieved error reductions of about 17-51% compared with conventional automatic evaluation methods. Moreover, our method obtained the highest correlation coefficients in several different experiments.
  6. Su, H.: Automatic abstracting (1996) 0.01
    0.010088081 = product of:
      0.040352322 = sum of:
        0.040352322 = product of:
          0.080704644 = sum of:
            0.080704644 = weight(_text_:methods in 150) [ClassicSimilarity], result of:
              0.080704644 = score(doc=150,freq=2.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.4441971 = fieldWeight in 150, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.078125 = fieldNorm(doc=150)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Presents an introductory overview of research into the automatic construction of abstracts from the texts of documents. Discusses the origin and definition of automatic abstracting; reasons for using automatic abstracting; methods of automatic abstracting; and evaluation problems
  7. Reeve, L.H.; Han, H.; Brooks, A.D.: The use of domain-specific concepts in biomedical text summarization (2007) 0.01
    0.010088081 = product of:
      0.040352322 = sum of:
        0.040352322 = product of:
          0.080704644 = sum of:
            0.080704644 = weight(_text_:methods in 955) [ClassicSimilarity], result of:
              0.080704644 = score(doc=955,freq=8.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.4441971 = fieldWeight in 955, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=955)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Text summarization is a method for data reduction. The use of text summarization enables users to reduce the amount of text that must be read while still assimilating the core information. The data reduction offered by text summarization is particularly useful in the biomedical domain, where physicians must continuously find clinical trial study information to incorporate into their patient treatment efforts. Such efforts are often hampered by the high-volume of publications. This paper presents two independent methods (BioChain and FreqDist) for identifying salient sentences in biomedical texts using concepts derived from domain-specific resources. Our semantic-based method (BioChain) is effective at identifying thematic sentences, while our frequency-distribution method (FreqDist) removes information redundancy. The two methods are then combined to form a hybrid method (ChainFreq). An evaluation of each method is performed using the ROUGE system to compare system-generated summaries against a set of manually-generated summaries. The BioChain and FreqDist methods outperform some common summarization systems, while the ChainFreq method improves upon the base approaches. Our work shows that the best performance is achieved when the two methods are combined. The paper also presents a brief physician's evaluation of three randomly-selected papers from an evaluation corpus to show that the author's abstract does not always reflect the entire contents of the full-text.
  8. Salton, G.: Automatic text structuring and summarization (1997) 0.01
    0.009986691 = product of:
      0.039946765 = sum of:
        0.039946765 = product of:
          0.07989353 = sum of:
            0.07989353 = weight(_text_:methods in 145) [ClassicSimilarity], result of:
              0.07989353 = score(doc=145,freq=4.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.43973273 = fieldWeight in 145, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=145)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Applies the ideas from the automatic link generation research to automatic text summarisation. Using techniques for inter-document link generation, generates intra-document links between passages of a document. Based on the intra-document linkage pattern of a text, characterises the structure of the text. Applies the knowledge of text structure to do automatic text summarisation by passage extraction. Evaluates a set of 50 summaries generated using these techniques by comparing them to paragraph extracts constructed by humans. The automatic summarisation methods perform well, especially in view of the fact that the summaries generated by 2 humans for the same article are surprisingly dissimilar.
    Footnote
    Contribution to a special issue on methods and tools for the automatic construction of hypertext
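
    A minimal sketch of the general idea described in the abstract above, linking passages of one document by their pairwise similarity and extracting the most heavily linked ones, is given below. It is only an illustration under assumed choices (tf-idf vectors, a fixed similarity threshold, invented function names), not Salton's actual procedure.

      import numpy as np
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      def link_based_extract(paragraphs, threshold=0.2, n_extract=3):
          # Intra-document links: connect two passages whose tf-idf cosine
          # similarity exceeds a threshold, then extract the most-linked passages.
          X = TfidfVectorizer(stop_words="english").fit_transform(paragraphs)
          sim = cosine_similarity(X)
          np.fill_diagonal(sim, 0.0)
          degree = (sim >= threshold).sum(axis=1)          # number of links per passage
          top = sorted(np.argsort(degree)[::-1][:n_extract])
          return [paragraphs[i] for i in top]
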
  9. Craven, T.C.: A computer-aided abstracting tool kit (1993) 0.01
    0.008070464 = product of:
      0.032281857 = sum of:
        0.032281857 = product of:
          0.064563714 = sum of:
            0.064563714 = weight(_text_:methods in 6506) [ClassicSimilarity], result of:
              0.064563714 = score(doc=6506,freq=2.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.35535768 = fieldWeight in 6506, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6506)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Describes the abstracting assistance features being prototyped in the TEXNET text network management system. Sentence weighting methods include: weighting negatively or positively on the stems in a selected passage; weighting on general lists of cue words; adjusting weights of selected segments; and weighting of occurrences of frequent stems. The user may adjust a number of parameters: the minimum strength of extracts; the threshold for frequent words/stems; and the amount by which sentence weight is adjusted for each weighting type.
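
    The weighting types listed above can be combined into a very simple extract scorer. The sketch below only illustrates that combination; the cue-word lists, parameter names and values are invented for the example and are not those of TEXNET.

      import re
      from collections import Counter

      BONUS_CUES   = {"conclude", "results", "significant", "propose"}   # illustrative only
      PENALTY_CUES = {"perhaps", "possibly", "unclear"}                   # illustrative only

      def score_sentences(text, top_stem_share=0.05, min_strength=1.0, cue_weight=2.0):
          sentences = re.split(r"(?<=[.!?])\s+", text.strip())
          stems = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
          counts = Counter(stems)
          n_frequent = max(1, int(len(counts) * top_stem_share))
          frequent = {w for w, _ in counts.most_common(n_frequent)}       # frequent stems
          extract = []
          for s in sentences:
              words = {w.lower() for w in re.findall(r"[A-Za-z]+", s)}
              weight = (len(words & frequent)                             # frequent-stem weighting
                        + cue_weight * len(words & BONUS_CUES)            # positive cue words
                        - cue_weight * len(words & PENALTY_CUES))         # negative cue words
              if weight >= min_strength:                                  # minimum strength of extracts
                  extract.append((weight, s))
          return sorted(extract, key=lambda t: t[0], reverse=True)
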
  10. Over, P.; Dang, H.; Harman, D.: DUC in context (2007) 0.01
    0.008070464 = product of:
      0.032281857 = sum of:
        0.032281857 = product of:
          0.064563714 = sum of:
            0.064563714 = weight(_text_:methods in 934) [ClassicSimilarity], result of:
              0.064563714 = score(doc=934,freq=2.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.35535768 = fieldWeight in 934, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0625 = fieldNorm(doc=934)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Recent years have seen increased interest in text summarization with emphasis on evaluation of prototype systems. Many factors can affect the design of such evaluations, requiring choices among competing alternatives. This paper examines several major themes running through three evaluations: SUMMAC, NTCIR, and DUC, with a concentration on DUC. The themes are extrinsic and intrinsic evaluation, evaluation procedures and methods, generic versus focused summaries, single- and multi-document summaries, length and compression issues, extracts versus abstracts, and issues with genre.
  11. Lee, J.-H.; Park, S.; Ahn, C.-M.; Kim, D.: Automatic generic document summarization based on non-negative matrix factorization (2009) 0.01
    0.0070616566 = product of:
      0.028246626 = sum of:
        0.028246626 = product of:
          0.056493253 = sum of:
            0.056493253 = weight(_text_:methods in 2448) [ClassicSimilarity], result of:
              0.056493253 = score(doc=2448,freq=2.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.31093797 = fieldWeight in 2448, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2448)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    In existing unsupervised methods, Latent Semantic Analysis (LSA) is used for sentence selection. However, the obtained results are less meaningful, because singular vectors are used as the bases for sentence selection from given documents, and singular vector components can have negative values. We propose a new unsupervised method using Non-negative Matrix Factorization (NMF) to select sentences for automatic generic document summarization. The proposed method uses non-negative constraints, which are more similar to the human cognition process. As a result, the method selects more meaningful sentences for generic document summarization than those selected using LSA.
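
    As a rough illustration of the idea in the abstract above (not the authors' exact algorithm), sentences can be scored from the non-negative factor matrix of a sentence-term matrix; the function name and parameter values below are assumptions made for the example.

      import numpy as np
      from sklearn.decomposition import NMF
      from sklearn.feature_extraction.text import TfidfVectorizer

      def nmf_extract(sentences, n_topics=3, n_select=2):
          # Sentence-by-term matrix (non-negative); X ~ W @ H with W >= 0, H >= 0.
          # Assumes len(sentences) > n_topics.
          X = TfidfVectorizer(stop_words="english").fit_transform(sentences)
          W = NMF(n_components=n_topics, init="nndsvd", random_state=0).fit_transform(X)
          # Relevance of a sentence: its feature weights, each weighted by the
          # overall strength of that semantic feature across all sentences.
          relevance = W @ W.sum(axis=0)
          top = sorted(np.argsort(relevance)[::-1][:n_select])
          return [sentences[i] for i in top]
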
  12. Goh, A.; Hui, S.C.: TES: a text extraction system (1996) 0.01
    0.006122759 = product of:
      0.024491036 = sum of:
        0.024491036 = product of:
          0.048982073 = sum of:
            0.048982073 = weight(_text_:22 in 6599) [ClassicSimilarity], result of:
              0.048982073 = score(doc=6599,freq=2.0), product of:
                0.15825124 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045191016 = queryNorm
                0.30952093 = fieldWeight in 6599, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6599)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    26. 2.1997 10:22:43
  13. Robin, J.; McKeown, K.: Empirically designing and evaluating a new revision-based model for summary generation (1996) 0.01
    0.006122759 = product of:
      0.024491036 = sum of:
        0.024491036 = product of:
          0.048982073 = sum of:
            0.048982073 = weight(_text_:22 in 6751) [ClassicSimilarity], result of:
              0.048982073 = score(doc=6751,freq=2.0), product of:
                0.15825124 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045191016 = queryNorm
                0.30952093 = fieldWeight in 6751, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6751)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    6. 3.1997 16:22:15
  14. Haag, M.: Automatic text summarization : Evaluation des Copernic Summarizer und mögliche Einsatzfelder in der Fachinformation der DaimlerChrysler AG (2002) 0.01
    0.0060528484 = product of:
      0.024211394 = sum of:
        0.024211394 = product of:
          0.048422787 = sum of:
            0.048422787 = weight(_text_:methods in 649) [ClassicSimilarity], result of:
              0.048422787 = score(doc=649,freq=2.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.26651827 = fieldWeight in 649, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.046875 = fieldNorm(doc=649)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Presents an evaluation of the Copernic Summarizer, a software product for automatically summarizing text in various data formats. The aim is to assess whether and how the Copernic Summarizer can reasonably be used in the DaimlerChrysler Information Division to enhance the quality of its information services. First, an introduction to Automatic Text Summarization is given and the Copernic Summarizer is presented. Various methods for evaluating Automatic Text Summarization systems and software ergonomics are then described. Two evaluation forms are developed with which the employees of the Information Division evaluate the quality and relevance of the extracted keywords and summaries as well as the software's usability. The quality and relevance assessment is done by comparing the original text to the summaries. Finally, a recommendation is given concerning the use of the Copernic Summarizer.
  15. Shen, D.; Yang, Q.; Chen, Z.: Noise reduction through summarization for Web-page classification (2007) 0.01
    0.0060528484 = product of:
      0.024211394 = sum of:
        0.024211394 = product of:
          0.048422787 = sum of:
            0.048422787 = weight(_text_:methods in 953) [ClassicSimilarity], result of:
              0.048422787 = score(doc=953,freq=2.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.26651827 = fieldWeight in 953, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.046875 = fieldNorm(doc=953)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Due to a large variety of noisy information embedded in Web pages, Web-page classification is much more difficult than pure-text classification. In this paper, we propose to improve the Web-page classification performance by removing the noise through summarization techniques. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then put forward a new Web-page summarization algorithm based on Web-page layout and evaluate it along with several other state-of-the-art text summarization algorithms on the LookSmart Web directory. Experimental results show that the classification algorithms (NB or SVM) augmented by any summarization approach can achieve an improvement by more than 5.0% as compared to pure-text-based classification algorithms. We further introduce an ensemble method to combine the different summarization algorithms. The ensemble summarization method achieves more than 12.0% improvement over pure-text based methods.
  16. Abdi, A.; Shamsuddin, S.M.; Aliguliyev, R.M.: QMOS: Query-based multi-documents opinion-oriented summarization (2018) 0.01
    0.0057066805 = product of:
      0.022826722 = sum of:
        0.022826722 = product of:
          0.045653444 = sum of:
            0.045653444 = weight(_text_:methods in 5089) [ClassicSimilarity], result of:
              0.045653444 = score(doc=5089,freq=4.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.25127584 = fieldWeight in 5089, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5089)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Sentiment analysis concerns the study of opinions expressed in a text. This paper presents the QMOS method, which employs a combination of sentiment analysis and summarization approaches. It is a lexicon-based method for query-based multi-document summarization of opinions expressed in reviews. QMOS combines multiple sentiment dictionaries to overcome the word-coverage limit of any individual lexicon. A major problem for a dictionary-based approach is the semantic gap between the prior polarity of a word given by a lexicon and the word's polarity in a specific context, because the polarity of a word depends on the context in which it is used. Furthermore, the type of a sentence can also affect the performance of a sentiment analysis approach. Therefore, to tackle these challenges, QMOS integrates multiple strategies to adjust a word's prior sentiment orientation while also considering the type of sentence. QMOS also employs the Semantic Sentiment Approach to determine the sentiment score of a word that is not included in a sentiment lexicon. On the other hand, most existing methods fail to distinguish the meaning of a review sentence from that of a user's query when the two share a similar bag of words; hence there is often a conflict between the extracted opinionated sentences and users' needs. However, the summarization phase of QMOS is able to avoid extracting a review sentence whose similarity to the user's query is high but whose meaning is different. The method also employs a greedy algorithm and query expansion to reduce redundancy and to bridge lexical gaps between similar contexts expressed in different wording. Our experiments show that QMOS significantly improves performance and is comparable to other existing methods.
  17. Ou, S.; Khoo, S.G.; Goh, D.H.: Automatic multidocument summarization of research abstracts : design and user evaluation (2007) 0.01
    0.0050440403 = product of:
      0.020176161 = sum of:
        0.020176161 = product of:
          0.040352322 = sum of:
            0.040352322 = weight(_text_:methods in 522) [ClassicSimilarity], result of:
              0.040352322 = score(doc=522,freq=2.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.22209854 = fieldWeight in 522, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=522)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The purpose of this study was to develop a method for automatic construction of multidocument summaries of sets of research abstracts that may be retrieved by a digital library or search engine in response to a user query. Sociology dissertation abstracts were selected as the sample domain in this study. A variable-based framework was proposed for integrating and organizing research concepts and relationships as well as research methods and contextual relations extracted from different dissertation abstracts. Based on the framework, a new summarization method was developed, which parses the discourse structure of abstracts, extracts research concepts and relationships, integrates the information across different abstracts, and organizes and presents them in a Web-based interface. The focus of this article is on the user evaluation that was performed to assess the overall quality and usefulness of the summaries. Two types of variable-based summaries generated using the summarization method (with or without the use of a taxonomy) were compared against a sentence-based summary that lists only the research-objective sentences extracted from each abstract and another sentence-based summary generated using the MEAD system that extracts important sentences. The evaluation results indicate that the majority of sociological researchers (70%) and general users (64%) preferred the variable-based summaries generated with the use of the taxonomy.
  18. Sankarasubramaniam, Y.; Ramanathan, K.; Ghosh, S.: Text summarization using Wikipedia (2014) 0.01
    0.0050440403 = product of:
      0.020176161 = sum of:
        0.020176161 = product of:
          0.040352322 = sum of:
            0.040352322 = weight(_text_:methods in 2693) [ClassicSimilarity], result of:
              0.040352322 = score(doc=2693,freq=2.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.22209854 = fieldWeight in 2693, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2693)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Automatic text summarization has been an active field of research for many years. Several approaches have been proposed, ranging from simple position and word-frequency methods to learning and graph-based algorithms. The advent of human-generated knowledge bases like Wikipedia offers a further possibility in text summarization - they can be used to understand the input text in terms of salient concepts from the knowledge base. In this paper, we study a novel approach that leverages Wikipedia in conjunction with graph-based ranking. Our approach is to first construct a bipartite sentence-concept graph, and then rank the input sentences using iterative updates on this graph. We consider several models for the bipartite graph, and derive convergence properties under each model. Then, we take up personalized and query-focused summarization, where the sentence ranks additionally depend on user interests and queries, respectively. Finally, we present a Wikipedia-based multi-document summarization algorithm. An important feature of the proposed algorithms is that they enable real-time incremental summarization - users can first view an initial summary, and then request additional content if interested. We evaluate the performance of our proposed summarizer using the ROUGE metric, and the results show that leveraging Wikipedia can significantly improve summary quality. We also present results from a user study, which suggests that using incremental summarization can help in better understanding news articles.
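
    A minimal sketch of bipartite sentence-concept ranking in the spirit of the abstract above is shown below; the propagation rule, damping value and function name are assumptions made for the example, not the paper's exact update.

      import numpy as np

      def bipartite_rank(B, iters=50, damping=0.85):
          # B[i, j] > 0 when sentence i mentions concept j (e.g. a Wikipedia title).
          # Scores are propagated sentence -> concept -> sentence until they stabilise.
          n_sent, _ = B.shape
          to_concepts  = B / np.maximum(B.sum(axis=0, keepdims=True), 1e-12)  # column-normalised
          to_sentences = B / np.maximum(B.sum(axis=1, keepdims=True), 1e-12)  # row-normalised
          s = np.full(n_sent, 1.0 / n_sent)
          for _ in range(iters):
              c = to_concepts.T @ s                                 # concepts inherit sentence scores
              s = damping * (to_sentences @ c) + (1.0 - damping) / n_sent
              s = s / s.sum()
          return s                                                  # higher score = better summary sentence
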
  19. Atanassova, I.; Bertin, M.; Larivière, V.: On the composition of scientific abstracts (2016) 0.01
    0.0050440403 = product of:
      0.020176161 = sum of:
        0.020176161 = product of:
          0.040352322 = sum of:
            0.040352322 = weight(_text_:methods in 3028) [ClassicSimilarity], result of:
              0.040352322 = score(doc=3028,freq=2.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.22209854 = fieldWeight in 3028, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3028)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - Scientific abstracts reproduce only part of the information and the complexity of argumentation in a scientific article. The purpose of this paper is to provide a first analysis of the similarity between the text of scientific abstracts and the body of articles, using sentences as the basic textual unit. It contributes to the understanding of the structure of abstracts. Design/methodology/approach - Using sentence-based similarity metrics, the authors quantify the phenomenon of text re-use in abstracts and examine the positions of the sentences that are similar to sentences in abstracts in the introduction, methods, results and discussion structure, using a corpus of over 85,000 research articles published in the seven Public Library of Science journals. Findings - The authors provide evidence that 84 percent of abstracts have at least one sentence in common with the body of the paper. Studying the distributions of sentences in the body of the articles that are re-used in abstracts, the authors show that there exists a strong relation between the rhetorical structure of articles and the zones that authors re-use when writing abstracts, with sentences mainly coming from the beginning of the introduction and the end of the conclusion. Originality/value - Scientific abstracts contain what is considered by the author(s) as the information that best describes a document's content. This is a first study that examines the relation between the contents of abstracts and the rhetorical structure of scientific articles. The work might provide new insight for improving automatic abstracting tools as well as information retrieval approaches, in which text organization and structure are important features.
  20. Finegan-Dollak, C.; Radev, D.R.: Sentence simplification, compression, and disaggregation for summarization of sophisticated documents (2016) 0.01
    0.0050440403 = product of:
      0.020176161 = sum of:
        0.020176161 = product of:
          0.040352322 = sum of:
            0.040352322 = weight(_text_:methods in 3122) [ClassicSimilarity], result of:
              0.040352322 = score(doc=3122,freq=2.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.22209854 = fieldWeight in 3122, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3122)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Sophisticated documents like legal cases and biomedical articles can contain unusually long sentences. Extractive summarizers can select such sentences (potentially adding hundreds of unnecessary words to the summary) or exclude them and lose important content. Sentence simplification or compression seems on the surface to be a promising solution. However, compression removes words before the selection algorithm can use them, and simplification generates sentences that may be ambiguous in an extractive summary. We therefore compare the performance of an extractive summarizer selecting from the sentences of the original document with that of the summarizer selecting from sentences shortened in three ways: simplification, compression, and disaggregation, which splits one sentence into several according to rules designed to keep all meaning. We find that on legal cases and biomedical articles, these shortening methods generate ungrammatical output. Human evaluators performed an extrinsic evaluation consisting of comprehension questions about the summaries. Evaluators given compressed, simplified, or disaggregated versions of the summaries answered fewer questions correctly than did those given summaries with unaltered sentences. Error analysis suggests 2 causes: Altered sentences sometimes interact with the sentence selection algorithm, and alterations to sentences sometimes obscure information in the summary. We discuss future work to alleviate these problems.