Search (96 results, page 1 of 5)

Jiang, Y.; Meng, R.; Huang, Y.; Lu, W.; Liu, J.: Generating keyphrases for readers : a controllable keyphrase generation framework (2023) 0.04

0.035879306 = product of:
  0.07175861 = sum of:
    0.016202414 = weight(_text_:information in 1012) [ClassicSimilarity], result of:
      0.016202414 = score(doc=1012,freq=8.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.19395474 = fieldWeight in 1012, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1012)
    0.055556197 = sum of:
      0.02331961 = weight(_text_:technology in 1012) [ClassicSimilarity], result of:
        0.02331961 = score(doc=1012,freq=2.0), product of:
          0.1417311 = queryWeight, product of:
            2.978387 = idf(docFreq=6114, maxDocs=44218)
            0.047586527 = queryNorm
          0.16453418 = fieldWeight in 1012, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            2.978387 = idf(docFreq=6114, maxDocs=44218)
            0.0390625 = fieldNorm(doc=1012)
      0.032236587 = weight(_text_:22 in 1012) [ClassicSimilarity], result of:
        0.032236587 = score(doc=1012,freq=2.0), product of:
          0.16663991 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.047586527 = queryNorm
          0.19345059 = fieldWeight in 1012, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=1012)
  0.5 = coord(2/4)
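
The arithmetic in the explain tree above can be re-derived by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function and variable names are illustrative and are not part of the search system. The values of queryNorm and fieldNorm are copied from the tree rather than computed.

```python
import math

# Values copied from the explain tree for doc 1012.
MAX_DOCS = 44218
QUERY_NORM = 0.047586527
FIELD_NORM = 0.0390625


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, classic TF-IDF form (assumed)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int) -> float:
    """score = queryWeight * fieldWeight for one matching term."""
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM
    return query_weight * field_weight


information = term_score(freq=8.0, doc_freq=20772)
technology = term_score(freq=2.0, doc_freq=6114)
twenty_two = term_score(freq=2.0, doc_freq=3622)

# Two of the four top-level query clauses matched, hence coord(2/4).
coord = 2 / 4
total = (information + (technology + twenty_two)) * coord
print(f"{total:.9f}")
```

The result should agree with the listed score of 0.035879306 up to single-precision rounding, since the index computes in 32-bit floats while this sketch uses doubles.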

Abstract: With the wide application of keyphrases in many Information Retrieval (IR) and Natural Language Processing (NLP) tasks, automatic keyphrase prediction has been emerging. However, these statistically important phrases are contributing increasingly less to the related tasks because the end-to-end learning mechanism enables models to learn the important semantic information of the text directly. Similarly, keyphrases are of little help for readers to quickly grasp the paper's main idea because the relationship between the keyphrase and the paper is not explicit to readers. Therefore, we propose to generate keyphrases with specific functions for readers to bridge the semantic gap between them and the information producers, and verify the effectiveness of the keyphrase function for assisting users' comprehension with a user experiment. A controllable keyphrase generation framework (the CKPG) that uses the keyphrase function as a control code to generate categorized keyphrases is proposed and implemented based on Transformer, BART, and T5, respectively. For the Computer Science domain, the Macro-avgs of , , and on the Paper with Code dataset are up to 0.680, 0.535, and 0.558, respectively. Our experimental results indicate the effectiveness of the CKPG models.
Date: 22. 6.2023 14:55:20
Source: Journal of the Association for Information Science and Technology. 74(2023) no.7, S.759-774

Oh, H.; Nam, S.; Zhu, Y.: Structured abstract summarization of scientific articles : summarization using full-text section information (2023) 0.03
```
0.03479395 = product of:
  0.0695879 = sum of:
    0.0140317045 = weight(_text_:information in 889) [ClassicSimilarity], result of:
      0.0140317045 = score(doc=889,freq=6.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.16796975 = fieldWeight in 889, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=889)
    0.055556197 = sum of:
      0.02331961 = weight(_text_:technology in 889) [ClassicSimilarity], result of:
        0.02331961 = score(doc=889,freq=2.0), product of:
          0.1417311 = queryWeight, product of:
            2.978387 = idf(docFreq=6114, maxDocs=44218)
            0.047586527 = queryNorm
          0.16453418 = fieldWeight in 889, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            2.978387 = idf(docFreq=6114, maxDocs=44218)
            0.0390625 = fieldNorm(doc=889)
      0.032236587 = weight(_text_:22 in 889) [ClassicSimilarity], result of:
        0.032236587 = score(doc=889,freq=2.0), product of:
          0.16663991 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.047586527 = queryNorm
          0.19345059 = fieldWeight in 889, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=889)
  0.5 = coord(2/4)
```
Abstract

The automatic summarization of scientific articles differs from other text genres because of the structured format and longer text length. Previous approaches have focused on tackling the lengthy nature of scientific articles, aiming to improve the computational efficiency of summarizing long text using a flat, unstructured abstract. However, the structured format of scientific articles and characteristics of each section have not been fully explored, despite their importance. The lack of a sufficient investigation and discussion of various characteristics for each section and their influence on summarization results has hindered the practical use of automatic summarization for scientific articles. To provide a balanced abstract proportionally emphasizing each section of a scientific article, the community introduced the structured abstract, an abstract with distinct, labeled sections. Using this information, in this study, we aim to understand tasks ranging from data preparation to model evaluation from diverse viewpoints. Specifically, we provide a preprocessed large-scale dataset and propose a summarization method applying the introduction, methods, results, and discussion (IMRaD) format reflecting the characteristics of each section. We also discuss the objective benchmarks and perspectives of state-of-the-art algorithms and present the challenges and research directions in this area.

Date

22. 1.2023 18:57:12

Source

Journal of the Association for Information Science and Technology. 74(2023) no.2, S.234-248

Wu, Y.-f.B.; Li, Q.; Bot, R.S.; Chen, X.: Finding nuggets in documents : a machine learning approach (2006) 0.03

0.0318287 = product of:
  0.0636574 = sum of:
    0.008101207 = weight(_text_:information in 5290) [ClassicSimilarity], result of:
      0.008101207 = score(doc=5290,freq=2.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.09697737 = fieldWeight in 5290, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5290)
    0.055556197 = sum of:
      0.02331961 = weight(_text_:technology in 5290) [ClassicSimilarity], result of:
        0.02331961 = score(doc=5290,freq=2.0), product of:
          0.1417311 = queryWeight, product of:
            2.978387 = idf(docFreq=6114, maxDocs=44218)
            0.047586527 = queryNorm
          0.16453418 = fieldWeight in 5290, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            2.978387 = idf(docFreq=6114, maxDocs=44218)
            0.0390625 = fieldNorm(doc=5290)
      0.032236587 = weight(_text_:22 in 5290) [ClassicSimilarity], result of:
        0.032236587 = score(doc=5290,freq=2.0), product of:
          0.16663991 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.047586527 = queryNorm
          0.19345059 = fieldWeight in 5290, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=5290)
  0.5 = coord(2/4)

Date: 22. 7.2006 17:25:48
Source: Journal of the American Society for Information Science and Technology. 57(2006) no.6, S.740-752

Kim, H.H.; Kim, Y.H.: Generic speech summarization of transcribed lecture videos : using tags and their semantic relations (2016) 0.03

0.0318287 = product of:
  0.0636574 = sum of:
    0.008101207 = weight(_text_:information in 2640) [ClassicSimilarity], result of:
      0.008101207 = score(doc=2640,freq=2.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.09697737 = fieldWeight in 2640, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2640)
    0.055556197 = sum of:
      0.02331961 = weight(_text_:technology in 2640) [ClassicSimilarity], result of:
        0.02331961 = score(doc=2640,freq=2.0), product of:
          0.1417311 = queryWeight, product of:
            2.978387 = idf(docFreq=6114, maxDocs=44218)
            0.047586527 = queryNorm
          0.16453418 = fieldWeight in 2640, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            2.978387 = idf(docFreq=6114, maxDocs=44218)
            0.0390625 = fieldNorm(doc=2640)
      0.032236587 = weight(_text_:22 in 2640) [ClassicSimilarity], result of:
        0.032236587 = score(doc=2640,freq=2.0), product of:
          0.16663991 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.047586527 = queryNorm
          0.19345059 = fieldWeight in 2640, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=2640)
  0.5 = coord(2/4)

Date: 22. 1.2016 12:29:41
Source: Journal of the Association for Information Science and Technology. 67(2016) no.2, S.366-379

Goh, A.; Hui, S.C.: TES: a text extraction system (1996) 0.03

0.025856568 = product of:
  0.051713135 = sum of:
    0.025923865 = weight(_text_:information in 6599) [ClassicSimilarity], result of:
      0.025923865 = score(doc=6599,freq=8.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.3103276 = fieldWeight in 6599, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0625 = fieldNorm(doc=6599)
    0.02578927 = product of:
      0.05157854 = sum of:
        0.05157854 = weight(_text_:22 in 6599) [ClassicSimilarity], result of:
          0.05157854 = score(doc=6599,freq=2.0), product of:
            0.16663991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.047586527 = queryNorm
            0.30952093 = fieldWeight in 6599, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=6599)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
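
The tree above contains a nested coord(1/2) in addition to the outer coord(2/4). A minimal sketch of how the two factors combine, assuming the nested clause holds two optional terms of which only "22" matched, and assuming the same classic TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); `term_score` and `clause` are hypothetical helper names:

```python
import math

# Values copied from the explain tree for doc 6599.
MAX_DOCS = 44218
QUERY_NORM = 0.047586527
FIELD_NORM = 0.0625


def term_score(freq: float, doc_freq: int) -> float:
    """queryWeight * fieldWeight for one matching term."""
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    return (idf * QUERY_NORM) * (math.sqrt(freq) * idf * FIELD_NORM)


def clause(matched_scores: list[float], total_terms: int) -> float:
    """Sum the matched sub-scores, then scale by coord = matched / total."""
    return sum(matched_scores) * len(matched_scores) / total_terms


# Inner clause: only "22" matched out of two terms, so coord(1/2).
inner = clause([term_score(2.0, 3622)], total_terms=2)

# Outer clause: "information" and the inner clause matched, so coord(2/4).
outer = clause([term_score(8.0, 20772), inner], total_terms=4)
print(f"{outer:.9f}")
```

When both terms of the nested clause match, as in the first three hits, the inner factor is 2/2 = 1, which would explain why no inner coord line is printed for them.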

Abstract: With the onset of the information explosion arising from digital libraries and access to a wealth of information through the Internet, the need to efficiently determine the relevance of a document becomes even more urgent. Describes a text extraction system (TES), which retrieves a set of sentences from a document to form an indicative abstract. Such an automated process enables information to be filtered more quickly. Discusses the combination of various text extraction techniques. Compares results with manually produced abstracts.
Date: 26. 2.1997 10:22:43
Source: Microcomputers for information management. 13(1996) no.1, S.41-55

Jones, P.A.; Bradbeer, P.V.G.: Discovery of optimal weights in a concept selection system (1996) 0.02

0.022060106 = product of:
  0.04412021 = sum of:
    0.018330941 = weight(_text_:information in 6974) [ClassicSimilarity], result of:
      0.018330941 = score(doc=6974,freq=4.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.21943474 = fieldWeight in 6974, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0625 = fieldNorm(doc=6974)
    0.02578927 = product of:
      0.05157854 = sum of:
        0.05157854 = weight(_text_:22 in 6974) [ClassicSimilarity], result of:
          0.05157854 = score(doc=6974,freq=2.0), product of:
            0.16663991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.047586527 = queryNorm
            0.30952093 = fieldWeight in 6974, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=6974)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Source: Information retrieval: new systems and current research. Proceedings of the 16th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Drymen, Scotland, 22-23 Mar 94. Ed.: R. Leon

Marcu, D.: Automatic abstracting and summarization (2009) 0.02

0.017984057 = product of:
  0.035968114 = sum of:
    0.019644385 = weight(_text_:information in 3748) [ClassicSimilarity], result of:
      0.019644385 = score(doc=3748,freq=6.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.23515764 = fieldWeight in 3748, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3748)
    0.016323728 = product of:
      0.032647457 = sum of:
        0.032647457 = weight(_text_:technology in 3748) [ClassicSimilarity], result of:
          0.032647457 = score(doc=3748,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.23034787 = fieldWeight in 3748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3748)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: After lying dormant for a few decades, the field of automated text summarization has experienced a tremendous resurgence of interest. Recently, many new algorithms and techniques have been proposed for identifying important information in single documents and document collections, and for mapping this information into grammatical, cohesive, and coherent abstracts. Since 1997, annual workshops, conferences, and large-scale comparative evaluations have provided a rich environment for exchanging ideas between researchers in Asia, Europe, and North America. This entry reviews the main developments in the field and provides a guiding map to those interested in understanding the strengths and weaknesses of an increasingly ubiquitous technology.
Source: Encyclopedia of library and information sciences. 3rd ed. Ed.: M.J. Bates

Ouyang, Y.; Li, W.; Li, S.; Lu, Q.: Intertopic information mining for query-based summarization (2010) 0.02
```
0.017981712 = product of:
  0.035963424 = sum of:
    0.02430362 = weight(_text_:information in 3459) [ClassicSimilarity], result of:
      0.02430362 = score(doc=3459,freq=18.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.2909321 = fieldWeight in 3459, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3459)
    0.011659805 = product of:
      0.02331961 = sum of:
        0.02331961 = weight(_text_:technology in 3459) [ClassicSimilarity], result of:
          0.02331961 = score(doc=3459,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.16453418 = fieldWeight in 3459, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3459)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

In this article, the authors address the problem of sentence ranking in summarization. Although most existing summarization approaches are concerned with the information embodied in a particular topic (including a set of documents and an associated query) for sentence ranking, they propose a novel ranking approach that incorporates intertopic information mining. Intertopic information, in contrast to intratopic information, is able to reveal pairwise topic relationships and thus can be considered as the bridge across different topics. In this article, the intertopic information is used for transferring word importance learned from known topics to unknown topics under a learning-based summarization framework. To mine this information, the authors model the topic relationship by clustering all the words in both known and unknown topics according to various kinds of word conceptual labels, which indicate the roles of the words in the topic. Based on the mined relationships, we develop a probabilistic model using manually generated summaries provided for known topics to predict ranking scores for sentences in unknown topics. A series of experiments have been conducted on the Document Understanding Conference (DUC) 2006 data set. The evaluation results show that intertopic information is indeed effective for sentence ranking and the resultant summarization system performs comparably well to the best-performing DUC participating systems on the same data set.

Source

Journal of the American Society for Information Science and Technology. 61(2010) no.5, S.1062-1072

Vanderwende, L.; Suzuki, H.; Brockett, J.M.; Nenkova, A.: Beyond SumBasic : task-focused summarization with sentence simplification and lexical expansion (2007) 0.02
```
0.016545078 = product of:
  0.033090156 = sum of:
    0.013748205 = weight(_text_:information in 948) [ClassicSimilarity], result of:
      0.013748205 = score(doc=948,freq=4.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.16457605 = fieldWeight in 948, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=948)
    0.019341951 = product of:
      0.038683902 = sum of:
        0.038683902 = weight(_text_:22 in 948) [ClassicSimilarity], result of:
          0.038683902 = score(doc=948,freq=2.0), product of:
            0.16663991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.047586527 = queryNorm
            0.23214069 = fieldWeight in 948, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=948)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

In recent years, there has been increased interest in topic-focused multi-document summarization. In this task, automatic summaries are produced in response to a specific information request, or topic, stated by the user. The system we have designed to accomplish this task comprises four main components: a generic extractive summarization system, a topic-focusing component, sentence simplification, and lexical expansion of topic words. This paper details each of these components, together with experiments designed to quantify their individual contributions. We include an analysis of our results on two large datasets commonly used to evaluate task-focused summarization, the DUC2005 and DUC2006 datasets, using automatic metrics. Additionally, we include an analysis of our results on the DUC2006 task according to human evaluation metrics. In the human evaluation of system summaries compared to human summaries, i.e., the Pyramid method, our system ranked first out of 22 systems in terms of overall mean Pyramid score; and in the human evaluation of summary responsiveness to the topic, our system ranked third out of 35 systems.

Source

Information processing and management. 43(2007) no.6, S.1606-1618

Rodríguez-Vidal, J.; Carrillo-de-Albornoz, J.; Gonzalo, J.; Plaza, L.: Authority and priority signals in automatic summary generation for online reputation management (2021) 0.01
```
0.014887327 = product of:
  0.029774655 = sum of:
    0.01811485 = weight(_text_:information in 213) [ClassicSimilarity], result of:
      0.01811485 = score(doc=213,freq=10.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.21684799 = fieldWeight in 213, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=213)
    0.011659805 = product of:
      0.02331961 = sum of:
        0.02331961 = weight(_text_:technology in 213) [ClassicSimilarity], result of:
          0.02331961 = score(doc=213,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.16453418 = fieldWeight in 213, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=213)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

Online reputation management (ORM) comprises the collection of techniques that help monitor and improve the public image of an entity (companies, products, institutions) on the Internet. ORM experts try to minimize the negative impact of the information about an entity while maximizing the positive material, so that the entity appears more trustworthy to customers. Due to the huge amount of information that is published on the Internet every day, there is a need to summarize the entire flow of information to obtain only those data that are relevant to the entities. Traditionally, the automatic summarization task in the ORM scenario takes some in-domain signals into account, such as popularity, polarity for reputation, and novelty, but there is another feature to consider: the authority of the people. This authority depends on the ability to convince others and therefore to influence opinions. In this work, we propose the use of authority signals that measure the influence of a user jointly with (a) priority signals related to the ORM domain and (b) information regarding the different topics that influential people are talking about. Our results indicate that the use of authority signals may significantly improve the quality of the summaries that are automatically generated.

Source

Journal of the Association for Information Science and Technology. 72(2021) no.5, S.583-594

Yang, C.C.; Wang, F.L.: Hierarchical summarization of large documents (2008) 0.01
```
0.01393111 = product of:
  0.02786222 = sum of:
    0.016202414 = weight(_text_:information in 1719) [ClassicSimilarity], result of:
      0.016202414 = score(doc=1719,freq=8.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.19395474 = fieldWeight in 1719, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1719)
    0.011659805 = product of:
      0.02331961 = sum of:
        0.02331961 = weight(_text_:technology in 1719) [ClassicSimilarity], result of:
          0.02331961 = score(doc=1719,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.16453418 = fieldWeight in 1719, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1719)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

Many automatic text summarization models have been developed in the last decades. Related research in information science has shown that human abstractors extract sentences for summaries based on the hierarchical structure of documents; however, the existing automatic summarization models do not take into account the human abstractor's behavior of sentence extraction and only consider the document as a sequence of sentences during the process of extraction of sentences as a summary. In general, a document exhibits a well-defined hierarchical structure that can be described as fractals - mathematical objects with a high degree of redundancy. In this article, we introduce the fractal summarization model based on the fractal theory. The important information is captured from the source document by exploring the hierarchical structure and salient features of the document. A condensed version of the document that is informatively close to the source document is produced iteratively using the contractive transformation in the fractal theory. The fractal summarization model is the first attempt to apply fractal theory to document summarization. It significantly improves the divergence of information coverage of summary and the precision of summary. User evaluations have been conducted. Results have indicated that fractal summarization is promising and outperforms current summarization techniques that do not consider the hierarchical structure of documents.

Source

Journal of the American Society for Information Science and Technology. 59(2008) no.6, S.887-902

Chen, H.-H.; Kuo, J.-J.; Huang, S.-J.; Lin, C.-J.; Wung, H.-C.: A summarization system for Chinese news from multiple sources (2003) 0.01

0.013869986 = product of:
  0.027739972 = sum of:
    0.013748205 = weight(_text_:information in 2115) [ClassicSimilarity], result of:
      0.013748205 = score(doc=2115,freq=4.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.16457605 = fieldWeight in 2115, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2115)
    0.013991767 = product of:
      0.027983533 = sum of:
        0.027983533 = weight(_text_:technology in 2115) [ClassicSimilarity], result of:
          0.027983533 = score(doc=2115,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.19744103 = fieldWeight in 2115, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=2115)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: This article proposes a summarization system for multiple documents. It employs not only named entities and other signatures to cluster news from different sources, but also employs punctuation marks, linking elements, and topic chains to identify the meaningful units (MUs). Using nouns and verbs to identify the similar MUs, focusing and browsing models are applied to represent the summarization results. To reduce information loss during summarization, informative words in a document are introduced. For the evaluation, a question answering system (QA system) is proposed to substitute the human assessors. In large-scale experiments containing 140 questions to 17,877 documents, the results show that those models using informative words outperform pure heuristic voting-only strategy by news reporters. This model can be easily further applied to summarize multilingual news from multiple sources.
Source: Journal of the American Society for Information Science and Technology. 54(2003) no.13, S.1224-1236

Martinez-Romo, J.; Araujo, L.; Fernandez, A.D.: SemGraph : extracting keyphrases following a novel semantic graph-based approach (2016) 0.01

0.013869986 = product of:
  0.027739972 = sum of:
    0.013748205 = weight(_text_:information in 2832) [ClassicSimilarity], result of:
      0.013748205 = score(doc=2832,freq=4.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.16457605 = fieldWeight in 2832, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2832)
    0.013991767 = product of:
      0.027983533 = sum of:
        0.027983533 = weight(_text_:technology in 2832) [ClassicSimilarity], result of:
          0.027983533 = score(doc=2832,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.19744103 = fieldWeight in 2832, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=2832)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Keyphrases represent the main topics a text is about. In this article, we introduce SemGraph, an unsupervised algorithm for extracting keyphrases from a collection of texts based on a semantic relationship graph. The main novelty of this algorithm is its ability to identify semantic relationships between words whose presence is statistically significant. Our method constructs a co-occurrence graph in which words appearing in the same document are linked, provided their presence in the collection is statistically significant with respect to a null model. Furthermore, the graph obtained is enriched with information from WordNet. We have used the most recent and standardized benchmark to evaluate the system ability to detect the keyphrases that are part of the text. The result is a method that achieves an improvement of 5.3% and 7.28% in F measure over the two labeled sets of keyphrases used in the evaluation of SemEval-2010.
Source: Journal of the Association for Information Science and Technology. 67(2016) no.1, S.71-82

Yulianti, E.; Huspi, S.; Sanderson, M.: Tweet-biased summarization (2016) 0.01
```
0.012845755 = product of:
  0.02569151 = sum of:
    0.0140317045 = weight(_text_:information in 2926) [ClassicSimilarity], result of:
      0.0140317045 = score(doc=2926,freq=6.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.16796975 = fieldWeight in 2926, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2926)
    0.011659805 = product of:
      0.02331961 = sum of:
        0.02331961 = weight(_text_:technology in 2926) [ClassicSimilarity], result of:
          0.02331961 = score(doc=2926,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.16453418 = fieldWeight in 2926, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2926)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: We examined whether the microblog comments given by people after reading a web document could be exploited to improve the accuracy of a web document summarization system. We examined the effect of social information (i.e., tweets) on the accuracy of the generated summaries by comparing the user preference for TBS (tweet-biased summary) with GS (generic summary). The result of crowdsourcing-based evaluation shows that the user preference for TBS was significantly higher than for GS. We also took random samples of the documents to see the performance of summaries in a traditional evaluation using ROUGE, in which TBS was, in general, also shown to be better than GS. We further analyzed the influence of the number of tweets pointing to a web document on summarization accuracy, finding a positive moderate correlation between the number of tweets pointing to a web document and the performance of the generated TBS as measured by user preference. The results show that incorporating social information into the summary generation process can improve the accuracy of the summary. The reason people chose one summary over another in a crowdsourcing-based evaluation is also presented in this article.
Source: Journal of the Association for Information Science and Technology. 67(2016) no.6, S.1289-1300
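
The score breakdown for doc 2926 above can be recomputed from the numbers it lists. The following is a minimal sketch, assuming the formulas implied by the listing itself: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), queryWeight is idf times queryNorm, and fieldWeight is tf times idf times fieldNorm. The function names `idf` and `term_score` are illustrative and not part of any library; queryNorm and fieldNorm are taken from the listing as given rather than derived.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, in the form that matches the listed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """Single-term score: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Values copied from the explain output for doc 2926.
MAX_DOCS = 44218
QUERY_NORM = 0.047586527
FIELD_NORM = 0.0390625

information = term_score(6.0, 20772, MAX_DOCS, QUERY_NORM, FIELD_NORM)
technology = term_score(2.0, 6114, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The nested clause matched 1 of its 2 terms: coord(1/2).
nested = technology * (1 / 2)
# The outer query matched 2 of its 4 clauses: coord(2/4).
total = (information + nested) * (2 / 4)

print(f"information={information:.7f} nested={nested:.7f} total={total:.7f}")
```

Because the listing shows single-precision values and this sketch uses double precision, the results should agree with the listed 0.012845755 only to about six or seven decimal places.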

Lam, W.; Chan, K.; Radev, D.; Saggion, H.; Teufel, S.: Context-based generic cross-lingual retrieval of documents and automated summaries (2005) 0.01

0.011856608 = product of:
  0.023713216 = sum of:
    0.00972145 = weight(_text_:information in 1965) [ClassicSimilarity], result of:
      0.00972145 = score(doc=1965,freq=2.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.116372846 = fieldWeight in 1965, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=1965)
    0.013991767 = product of:
      0.027983533 = sum of:
        0.027983533 = weight(_text_:technology in 1965) [ClassicSimilarity], result of:
          0.027983533 = score(doc=1965,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.19744103 = fieldWeight in 1965, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=1965)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Source: Journal of the American Society for Information Science and Technology. 56(2005) no.2, S.129-139

Sparck Jones, K.: Automatic summarising : the state of the art (2007) 0.01

0.011856608 = product of:
  0.023713216 = sum of:
    0.00972145 = weight(_text_:information in 932) [ClassicSimilarity], result of:
      0.00972145 = score(doc=932,freq=2.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.116372846 = fieldWeight in 932, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=932)
    0.013991767 = product of:
      0.027983533 = sum of:
        0.027983533 = weight(_text_:technology in 932) [ClassicSimilarity], result of:
          0.027983533 = score(doc=932,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.19744103 = fieldWeight in 932, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=932)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: This paper reviews research on automatic summarising in the last decade. This work has grown, stimulated by technology and by evaluation programmes. The paper uses several frameworks to organise the review, for summarising itself, for the factors affecting summarising, for systems, and for evaluation. The review examines the evaluation strategies applied to summarising, the issues they raise, and the major programmes. It considers the input, purpose and output factors investigated in recent summarising research, and discusses the classes of strategy, extractive and non-extractive, that have been explored, illustrating the range of systems built. The conclusions drawn are that automatic summarisation has made valuable progress, with useful applications, better evaluation, and more task understanding. But summarising systems are still poorly motivated in relation to the factors affecting them, and evaluation needs taking much further to engage with the purposes summaries are intended to serve and the contexts in which they are used.
Source: Information processing and management. 43(2007) no.6, S.1449-1481

Wang, W.; Hwang, D.: Abstraction Assistant : an automatic text abstraction system (2010) 0.01

0.011856608 = product of:
  0.023713216 = sum of:
    0.00972145 = weight(_text_:information in 3981) [ClassicSimilarity], result of:
      0.00972145 = score(doc=3981,freq=2.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.116372846 = fieldWeight in 3981, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=3981)
    0.013991767 = product of:
      0.027983533 = sum of:
        0.027983533 = weight(_text_:technology in 3981) [ClassicSimilarity], result of:
          0.027983533 = score(doc=3981,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.19744103 = fieldWeight in 3981, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=3981)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Source: Journal of the American Society for Information Science and Technology. 61(2010) no.9, S.1790-1799

Ou, S.; Khoo, S.G.; Goh, D.H.: Automatic multidocument summarization of research abstracts : design and user evaluation (2007) 0.01
```
0.011558321 = product of:
  0.023116643 = sum of:
    0.011456838 = weight(_text_:information in 522) [ClassicSimilarity], result of:
      0.011456838 = score(doc=522,freq=4.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.13714671 = fieldWeight in 522, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=522)
    0.011659805 = product of:
      0.02331961 = sum of:
        0.02331961 = weight(_text_:technology in 522) [ClassicSimilarity], result of:
          0.02331961 = score(doc=522,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.16453418 = fieldWeight in 522, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=522)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: The purpose of this study was to develop a method for automatic construction of multidocument summaries of sets of research abstracts that may be retrieved by a digital library or search engine in response to a user query. Sociology dissertation abstracts were selected as the sample domain in this study. A variable-based framework was proposed for integrating and organizing research concepts and relationships as well as research methods and contextual relations extracted from different dissertation abstracts. Based on the framework, a new summarization method was developed, which parses the discourse structure of abstracts, extracts research concepts and relationships, integrates the information across different abstracts, and organizes and presents them in a Web-based interface. The focus of this article is on the user evaluation that was performed to assess the overall quality and usefulness of the summaries. Two types of variable-based summaries generated using the summarization method (with or without the use of a taxonomy) were compared against a sentence-based summary that lists only the research-objective sentences extracted from each abstract and another sentence-based summary generated using the MEAD system that extracts important sentences. The evaluation results indicate that the majority of sociological researchers (70%) and general users (64%) preferred the variable-based summaries generated with the use of the taxonomy.
Source: Journal of the American Society for Information Science and Technology. 58(2007) no.10, S.1419-1435

Wei, F.; Li, W.; Lu, Q.; He, Y.: Applying two-level reinforcement ranking in query-oriented multidocument summarization (2009) 0.01
```
0.011558321 = product of:
  0.023116643 = sum of:
    0.011456838 = weight(_text_:information in 3120) [ClassicSimilarity], result of:
      0.011456838 = score(doc=3120,freq=4.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.13714671 = fieldWeight in 3120, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3120)
    0.011659805 = product of:
      0.02331961 = sum of:
        0.02331961 = weight(_text_:technology in 3120) [ClassicSimilarity], result of:
          0.02331961 = score(doc=3120,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.16453418 = fieldWeight in 3120, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3120)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: Sentence ranking is the issue of most concern in document summarization today. While traditional feature-based approaches evaluate sentence significance and rank the sentences relying on the features that are particularly designed to characterize the different aspects of the individual sentences, the newly emerging graph-based ranking algorithms (such as the PageRank-like algorithms) recursively compute sentence significance using the global information in a text graph that links sentences together. In general, the existing PageRank-like algorithms can model well the phenomenon that a sentence is important if it is linked to by many other important sentences, or they are capable of modeling the mutual reinforcement among the sentences in the text graph. However, when dealing with multidocument summarization these algorithms often assemble a set of documents into one large file. The document dimension is totally ignored. In this article we present a framework to model the two-level mutual reinforcement among sentences as well as documents. Under this framework we design and develop a novel ranking algorithm such that the document reinforcement is taken into account in the process of sentence ranking. The convergence issue is examined. We also explore an interesting and important property of the proposed algorithm. When evaluated on the DUC 2005 and 2006 query-oriented multidocument summarization datasets, significant results are achieved.
Source: Journal of the American Society for Information Science and Technology. 60(2009) no.10, S.2119-2131
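
The idea that a sentence is important if it is linked to by other important sentences can be illustrated with a generic PageRank-style power iteration. This is a minimal sketch of the single-level baseline the abstract describes, not the authors' two-level algorithm; the function name `rank_sentences`, the damping value, and the similarity weights are all assumptions made for illustration.

```python
def rank_sentences(similarity, damping=0.85, iterations=50):
    """PageRank-style power iteration over a weighted sentence graph."""
    n = len(similarity)
    # Turn each row into outgoing transition probabilities.
    # A sentence with no links spreads its score evenly over all sentences.
    transition = []
    for row in similarity:
        row_total = sum(row)
        if row_total > 0:
            transition.append([weight / row_total for weight in row])
        else:
            transition.append([1.0 / n] * n)

    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives score from the sentences that link to it.
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * transition[j][i] for j in range(n))
            for i in range(n)
        ]
    return scores


# Hypothetical symmetric similarity weights between four sentences.
similarity = [
    [0.0, 0.6, 0.5, 0.1],
    [0.6, 0.0, 0.4, 0.0],
    [0.5, 0.4, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
]

scores = rank_sentences(similarity)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, [round(score, 3) for score in scores])
```

In this toy graph the first sentence is the most strongly connected and ends up with the highest score, while the weakly linked fourth sentence ranks last.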

Cai, X.; Li, W.: Enhancing sentence-level clustering with integrated and interactive frameworks for theme-based summarization (2011) 0.01
```
0.011558321 = product of:
  0.023116643 = sum of:
    0.011456838 = weight(_text_:information in 4770) [ClassicSimilarity], result of:
      0.011456838 = score(doc=4770,freq=4.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.13714671 = fieldWeight in 4770, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4770)
    0.011659805 = product of:
      0.02331961 = sum of:
        0.02331961 = weight(_text_:technology in 4770) [ClassicSimilarity], result of:
          0.02331961 = score(doc=4770,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.16453418 = fieldWeight in 4770, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4770)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: Sentence clustering plays a pivotal role in theme-based summarization, which discovers topic themes defined as the clusters of highly related sentences to avoid redundancy and cover more diverse information. As sentences are short and their content is limited, the bag-of-words cosine similarity traditionally used for document clustering is no longer suitable. Special treatment for measuring sentence similarity is necessary. In this article, we study the sentence-level clustering problem. After exploiting concept- and context-enriched sentence vector representations, we develop two co-clustering frameworks to enhance sentence-level clustering for theme-based summarization (integrated clustering and interactive clustering), both allowing word and document to play an explicit role in sentence clustering as independent text objects rather than using word or concept as features of a sentence in a document set. In each framework, we experiment with two-level co-clustering (i.e., sentence-word co-clustering or sentence-document co-clustering) and three-level co-clustering (i.e., document-sentence-word co-clustering). Compared against concept- and context-oriented sentence-representation reformation, co-clustering shows a clear advantage in both intrinsic clustering quality evaluation and extrinsic summarization evaluation conducted on the Document Understanding Conferences (DUC) datasets.
Source: Journal of the American Society for Information Science and Technology. 62(2011) no.10, S.2067-2082
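
The abstract's remark that bag-of-words cosine similarity suits short sentences poorly can be seen in a small example. The sketch below assumes plain whitespace tokenization and raw term counts; the `cosine` helper and the three sentences are invented for illustration and do not come from the paper.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Cosine similarity between two sentences using raw term counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[term] * vb[term] for term in va.keys() & vb.keys())
    norm = (math.sqrt(sum(count * count for count in va.values()))
            * math.sqrt(sum(count * count for count in vb.values())))
    return dot / norm if norm else 0.0


# Hypothetical sentences: s1 and s2 are paraphrases, s1 and s3 differ in meaning.
s1 = "the car was repaired quickly"
s2 = "mechanics fixed an automobile fast"
s3 = "the car was sold quickly"

print(cosine(s1, s2))  # paraphrases that share no words
print(cosine(s1, s3))  # different meaning, four of five words shared
```

With so few words per sentence, the paraphrase pair shares no terms and scores zero, while the pair that differs in meaning scores high. That mismatch is the kind of problem that motivates enriched sentence representations.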
