Search (99 results, page 1 of 5)

  • × theme_ss:"Automatisches Abstracting"
  1. Goh, A.; Hui, S.C.: TES: a text extraction system (1996) 0.04
    0.03618269 = product of:
      0.10854807 = sum of:
        0.021104332 = weight(_text_:information in 6599) [ClassicSimilarity], result of:
          0.021104332 = score(doc=6599,freq=8.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.3103276 = fieldWeight in 6599, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=6599)
        0.06644897 = weight(_text_:techniques in 6599) [ClassicSimilarity], result of:
          0.06644897 = score(doc=6599,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.3893711 = fieldWeight in 6599, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0625 = fieldNorm(doc=6599)
        0.02099476 = product of:
          0.04198952 = sum of:
            0.04198952 = weight(_text_:22 in 6599) [ClassicSimilarity], result of:
              0.04198952 = score(doc=6599,freq=2.0), product of:
                0.13565971 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038739666 = queryNorm
                0.30952093 = fieldWeight in 6599, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6599)
          0.5 = coord(1/2)
      0.33333334 = coord(3/9)
    
    Abstract
    With the onset of the information explosion arising from digital libraries and access to a wealth of information through the Internet, the need to efficiently determine the relevance of a document becomes even more urgent. Describes a text extraction system (TES), which retrieves a set of sentences from a document to form an indicative abstract. Such an automated process enables information to be filtered more quickly. Discusses the combination of various text extraction techniques. Compares results with manually produced abstracts
    Date
    26. 2.1997 10:22:43
    Source
    Microcomputers for information management. 13(1996) no.1, S.41-55
  2. Moens, M.-F.: Summarizing court decisions (2007) 0.03
    0.031596936 = product of:
      0.09479081 = sum of:
        0.009233146 = weight(_text_:information in 954) [ClassicSimilarity], result of:
          0.009233146 = score(doc=954,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.13576832 = fieldWeight in 954, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=954)
        0.027414814 = weight(_text_:retrieval in 954) [ClassicSimilarity], result of:
          0.027414814 = score(doc=954,freq=2.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.23394634 = fieldWeight in 954, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=954)
        0.05814285 = weight(_text_:techniques in 954) [ClassicSimilarity], result of:
          0.05814285 = score(doc=954,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.3406997 = fieldWeight in 954, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0546875 = fieldNorm(doc=954)
      0.33333334 = coord(3/9)
    
    Abstract
    In the field of law there is an absolute need for summarizing the texts of court decisions in order to make the content of the cases easily accessible for legal professionals. During the SALOMON and MOSAIC projects we investigated the summarization and retrieval of legal cases. This article presents some of the main findings while integrating the research results of experiments on legal document summarization by other research groups. In addition, we propose novel avenues of research for automatic text summarization, which we currently exploit when summarizing court decisions in the ACILA project. Techniques for automated concept learning and argument recognition are here the most challenging.
    Source
    Information processing and management. 43(2007) no.6, S.1748-1764
  3. Nomoto, T.: Discriminative sentence compression with conditional random fields (2007) 0.03
    0.02901427 = product of:
      0.08704281 = sum of:
        0.013707667 = weight(_text_:information in 945) [ClassicSimilarity], result of:
          0.013707667 = score(doc=945,freq=6.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.20156369 = fieldWeight in 945, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=945)
        0.023498412 = weight(_text_:retrieval in 945) [ClassicSimilarity], result of:
          0.023498412 = score(doc=945,freq=2.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.20052543 = fieldWeight in 945, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=945)
        0.049836725 = weight(_text_:techniques in 945) [ClassicSimilarity], result of:
          0.049836725 = score(doc=945,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.2920283 = fieldWeight in 945, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=945)
      0.33333334 = coord(3/9)
    
    Abstract
    The paper focuses on a particular approach to automatic sentence compression which makes use of a discriminative sequence classifier known as Conditional Random Fields (CRF). We devise several features for CRF that allow it to incorporate information on nonlinear relations among words. Along with that, we address the issue of data paucity by collecting data from RSS feeds available on the Internet, and turning them into training data for use with CRF, drawing on techniques from biology and information retrieval. We also discuss a recursive application of CRF on the syntactic structure of a sentence as a way of improving the readability of the compression it generates. Experiments found that our approach works reasonably well compared to the state-of-the-art system [Knight, K., & Marcu, D. (2002). Summarization beyond sentence extraction: A probabilistic approach to sentence compression. Artificial Intelligence 139, 91-107.].
    Source
    Information processing and management. 43(2007) no.6, S.1571-1587
  4. Jones, P.A.; Bradbeer, P.V.G.: Discovery of optimal weights in a concept selection system (1996) 0.03
    0.02674227 = product of:
      0.08022681 = sum of:
        0.014923017 = weight(_text_:information in 6974) [ClassicSimilarity], result of:
          0.014923017 = score(doc=6974,freq=4.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.21943474 = fieldWeight in 6974, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=6974)
        0.04430903 = weight(_text_:retrieval in 6974) [ClassicSimilarity], result of:
          0.04430903 = score(doc=6974,freq=4.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.37811437 = fieldWeight in 6974, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0625 = fieldNorm(doc=6974)
        0.02099476 = product of:
          0.04198952 = sum of:
            0.04198952 = weight(_text_:22 in 6974) [ClassicSimilarity], result of:
              0.04198952 = score(doc=6974,freq=2.0), product of:
                0.13565971 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038739666 = queryNorm
                0.30952093 = fieldWeight in 6974, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6974)
          0.5 = coord(1/2)
      0.33333334 = coord(3/9)
    
    Source
    Information retrieval: new systems and current research. Proceedings of the 16th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Drymen, Scotland, 22-23 Mar 94. Ed.: R. Leon
  5. Abdi, A.; Idris, N.; Alguliev, R.M.; Aliguliyev, R.M.: Automatic summarization assessment through a combination of semantic and syntactic information for intelligent educational systems (2015) 0.02
    0.024636826 = product of:
      0.11086571 = sum of:
        0.011192262 = weight(_text_:information in 2681) [ClassicSimilarity], result of:
          0.011192262 = score(doc=2681,freq=4.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.16457605 = fieldWeight in 2681, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2681)
        0.09967345 = weight(_text_:techniques in 2681) [ClassicSimilarity], result of:
          0.09967345 = score(doc=2681,freq=8.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.5840566 = fieldWeight in 2681, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=2681)
      0.22222222 = coord(2/9)
    
    Abstract
    Summary writing is a process for creating a short version of a source text. It can be used as a measure of understanding. As grading students' summaries is a very time-consuming task, computer-assisted assessment can help teachers perform the grading more effectively. Several techniques, such as BLEU, ROUGE, N-gram co-occurrence, Latent Semantic Analysis (LSA), LSA_Ngram and LSA_ERB, have been proposed to support the automatic assessment of students' summaries. Since these techniques are more suitable for long texts, their performance is not satisfactory for the evaluation of short summaries. This paper proposes a specialized method that works well in assessing short summaries. Our proposed method integrates the semantic relations between words, and their syntactic composition. As a result, the proposed method is able to obtain high accuracy and improve the performance compared with the current techniques. Experiments have displayed that it is to be preferred over the existing techniques. A summary evaluation system based on the proposed method has also been developed.
    Source
    Information processing and management. 51(2015) no.4, S.340-358
  6. Salton, G.: Automatic text structuring and summarization (1997) 0.02
    0.020324344 = product of:
      0.09145955 = sum of:
        0.009233146 = weight(_text_:information in 145) [ClassicSimilarity], result of:
          0.009233146 = score(doc=145,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.13576832 = fieldWeight in 145, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=145)
        0.0822264 = weight(_text_:techniques in 145) [ClassicSimilarity], result of:
          0.0822264 = score(doc=145,freq=4.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.48182213 = fieldWeight in 145, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0546875 = fieldNorm(doc=145)
      0.22222222 = coord(2/9)
    
    Abstract
    Applies the ideas from the automatic link generation research to automatic text summarisation. Using techniques for inter-document link generation, generates intra-document links between passages of a document. Based on the intra-document linkage pattern of a text, characterises the structure of the text. Applies the knowledge of text structure to do automatic text summarisation by passage extraction. Evaluates a set of 50 summaries generated using these techniques by comparing the to paragraph extracts constructed by humans. The automatic summarisation methods perform well, especially in view of the fact that the summaries generates by 2 humans for the same article are surprisingly dissimilar
    Source
    Information processing and management. 33(1997) no.2, S.193-207
  7. Moens, M.-F.; Uyttendaele, C.; Dumotier, J.: Abstracting of legal cases : the potential of clustering based on the selection of representative objects (1999) 0.02
    0.017420867 = product of:
      0.0783939 = sum of:
        0.007914125 = weight(_text_:information in 2944) [ClassicSimilarity], result of:
          0.007914125 = score(doc=2944,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.116372846 = fieldWeight in 2944, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2944)
        0.07047977 = weight(_text_:techniques in 2944) [ClassicSimilarity], result of:
          0.07047977 = score(doc=2944,freq=4.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.4129904 = fieldWeight in 2944, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=2944)
      0.22222222 = coord(2/9)
    
    Abstract
    The SALOMON project automatically summarizes Belgian criminal cases in order to improve access to the large number of existing and future court decisions. SALOMON extracts text units from the case text to form a case summary. Such a case summary facilitates the rapid determination of the relevance of the case or may be employed in text search. an important part of the research concerns the development of techniques for automatic recognition of representative text paragraphs (or sentences) in texts of unrestricted domains. these techniques are employed to eliminate redundant material in the case texts, and to identify informative text paragraphs which are relevant to include in the case summary. An evaluation of a test set of 700 criminal cases demonstrates that the algorithms have an application potential for automatic indexing, abstracting, and text linkage
    Source
    Journal of the American Society for Information Science. 50(1999) no.2, S.151-161
  8. Marcu, D.: Automatic abstracting and summarization (2009) 0.02
    0.016474472 = product of:
      0.074135125 = sum of:
        0.015992278 = weight(_text_:information in 3748) [ClassicSimilarity], result of:
          0.015992278 = score(doc=3748,freq=6.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.23515764 = fieldWeight in 3748, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3748)
        0.05814285 = weight(_text_:techniques in 3748) [ClassicSimilarity], result of:
          0.05814285 = score(doc=3748,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.3406997 = fieldWeight in 3748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3748)
      0.22222222 = coord(2/9)
    
    Abstract
    After lying dormant for a few decades, the field of automated text summarization has experienced a tremendous resurgence of interest. Recently, many new algorithms and techniques have been proposed for identifying important information in single documents and document collections, and for mapping this information into grammatical, cohesive, and coherent abstracts. Since 1997, annual workshops, conferences, and large-scale comparative evaluations have provided a rich environment for exchanging ideas between researchers in Asia, Europe, and North America. This entry reviews the main developments in the field and provides a guiding map to those interested in understanding the strengths and weaknesses of an increasingly ubiquitous technology.
    Source
    Encyclopedia of library and information sciences. 3rd ed. Ed.: M.J. Bates
  9. Jiang, Y.; Meng, R.; Huang, Y.; Lu, W.; Liu, J.: Generating keyphrases for readers : a controllable keyphrase generation framework (2023) 0.02
    0.015297981 = product of:
      0.04589394 = sum of:
        0.013190207 = weight(_text_:information in 1012) [ClassicSimilarity], result of:
          0.013190207 = score(doc=1012,freq=8.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.19395474 = fieldWeight in 1012, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1012)
        0.01958201 = weight(_text_:retrieval in 1012) [ClassicSimilarity], result of:
          0.01958201 = score(doc=1012,freq=2.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.16710453 = fieldWeight in 1012, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1012)
        0.013121725 = product of:
          0.02624345 = sum of:
            0.02624345 = weight(_text_:22 in 1012) [ClassicSimilarity], result of:
              0.02624345 = score(doc=1012,freq=2.0), product of:
                0.13565971 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038739666 = queryNorm
                0.19345059 = fieldWeight in 1012, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1012)
          0.5 = coord(1/2)
      0.33333334 = coord(3/9)
    
    Abstract
    With the wide application of keyphrases in many Information Retrieval (IR) and Natural Language Processing (NLP) tasks, automatic keyphrase prediction has been emerging. However, these statistically important phrases are contributing increasingly less to the related tasks because the end-to-end learning mechanism enables models to learn the important semantic information of the text directly. Similarly, keyphrases are of little help for readers to quickly grasp the paper's main idea because the relationship between the keyphrase and the paper is not explicit to readers. Therefore, we propose to generate keyphrases with specific functions for readers to bridge the semantic gap between them and the information producers, and verify the effectiveness of the keyphrase function for assisting users' comprehension with a user experiment. A controllable keyphrase generation framework (the CKPG) that uses the keyphrase function as a control code to generate categorized keyphrases is proposed and implemented based on Transformer, BART, and T5, respectively. For the Computer Science domain, the Macro-avgs of , , and on the Paper with Code dataset are up to 0.680, 0.535, and 0.558, respectively. Our experimental results indicate the effectiveness of the CKPG models.
    Date
    22. 6.2023 14:55:20
    Source
    Journal of the Association for Information Science and Technology. 74(2023) no.7, S.759-774
  10. Zajic, D.; Dorr, B.J.; Lin, J.; Schwartz, R.: Multi-candidate reduction : sentence compression as a tool for document summarization tasks (2007) 0.01
    0.014972444 = product of:
      0.067375995 = sum of:
        0.009233146 = weight(_text_:information in 944) [ClassicSimilarity], result of:
          0.009233146 = score(doc=944,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.13576832 = fieldWeight in 944, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=944)
        0.05814285 = weight(_text_:techniques in 944) [ClassicSimilarity], result of:
          0.05814285 = score(doc=944,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.3406997 = fieldWeight in 944, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0546875 = fieldNorm(doc=944)
      0.22222222 = coord(2/9)
    
    Abstract
    This article examines the application of two single-document sentence compression techniques to the problem of multi-document summarization-a "parse-and-trim" approach and a statistical noisy-channel approach. We introduce the multi-candidate reduction (MCR) framework for multi-document summarization, in which many compressed candidates are generated for each source sentence. These candidates are then selected for inclusion in the final summary based on a combination of static and dynamic features. Evaluations demonstrate that sentence compression is a valuable component of a larger multi-document summarization framework.
    Source
    Information processing and management. 43(2007) no.6, S.1549-1570
  11. Shen, D.; Yang, Q.; Chen, Z.: Noise reduction through summarization for Web-page classification (2007) 0.01
    0.013561998 = product of:
      0.061028987 = sum of:
        0.011192262 = weight(_text_:information in 953) [ClassicSimilarity], result of:
          0.011192262 = score(doc=953,freq=4.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.16457605 = fieldWeight in 953, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=953)
        0.049836725 = weight(_text_:techniques in 953) [ClassicSimilarity], result of:
          0.049836725 = score(doc=953,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.2920283 = fieldWeight in 953, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.046875 = fieldNorm(doc=953)
      0.22222222 = coord(2/9)
    
    Abstract
    Due to a large variety of noisy information embedded in Web pages, Web-page classification is much more difficult than pure-text classification. In this paper, we propose to improve the Web-page classification performance by removing the noise through summarization techniques. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then put forward a new Web-page summarization algorithm based on Web-page layout and evaluate it along with several other state-of-the-art text summarization algorithms on the LookSmart Web directory. Experimental results show that the classification algorithms (NB or SVM) augmented by any summarization approach can achieve an improvement by more than 5.0% as compared to pure-text-based classification algorithms. We further introduce an ensemble method to combine the different summarization algorithms. The ensemble summarization method achieves more than 12.0% improvement over pure-text based methods.
    Source
    Information processing and management. 43(2007) no.6, S.1735-1747
  12. Kim, H.H.; Kim, Y.H.: Generic speech summarization of transcribed lecture videos : using tags and their semantic relations (2016) 0.01
    0.013099614 = product of:
      0.03929884 = sum of:
        0.0065951035 = weight(_text_:information in 2640) [ClassicSimilarity], result of:
          0.0065951035 = score(doc=2640,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.09697737 = fieldWeight in 2640, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2640)
        0.01958201 = weight(_text_:retrieval in 2640) [ClassicSimilarity], result of:
          0.01958201 = score(doc=2640,freq=2.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.16710453 = fieldWeight in 2640, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2640)
        0.013121725 = product of:
          0.02624345 = sum of:
            0.02624345 = weight(_text_:22 in 2640) [ClassicSimilarity], result of:
              0.02624345 = score(doc=2640,freq=2.0), product of:
                0.13565971 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038739666 = queryNorm
                0.19345059 = fieldWeight in 2640, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2640)
          0.5 = coord(1/2)
      0.33333334 = coord(3/9)
    
    Abstract
    We propose a tag-based framework that simulates human abstractors' ability to select significant sentences based on key concepts in a sentence as well as the semantic relations between key concepts to create generic summaries of transcribed lecture videos. The proposed extractive summarization method uses tags (viewer- and author-assigned terms) as key concepts. Our method employs Flickr tag clusters and WordNet synonyms to expand tags and detect the semantic relations between tags. This method helps select sentences that have a greater number of semantically related key concepts. To investigate the effectiveness and uniqueness of the proposed method, we compare it with an existing technique, latent semantic analysis (LSA), using intrinsic and extrinsic evaluations. The results of intrinsic evaluation show that the tag-based method is as or more effective than the LSA method. We also observe that in the extrinsic evaluation, the grand mean accuracy score of the tag-based method is higher than that of the LSA method, with a statistically significant difference. Elaborating on our results, we discuss the theoretical and practical implications of our findings for speech video summarization and retrieval.
    Date
    22. 1.2016 12:29:41
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.2, S.366-379
  13. Rodríguez-Vidal, J.; Carrillo-de-Albornoz, J.; Gonzalo, J.; Plaza, L.: Authority and priority signals in automatic summary generation for online reputation management (2021) 0.01
    0.012506157 = product of:
      0.056277707 = sum of:
        0.0147471 = weight(_text_:information in 213) [ClassicSimilarity], result of:
          0.0147471 = score(doc=213,freq=10.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.21684799 = fieldWeight in 213, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=213)
        0.04153061 = weight(_text_:techniques in 213) [ClassicSimilarity], result of:
          0.04153061 = score(doc=213,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.24335694 = fieldWeight in 213, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0390625 = fieldNorm(doc=213)
      0.22222222 = coord(2/9)
    
    Abstract
    Online reputation management (ORM) comprises the collection of techniques that help monitoring and improving the public image of an entity (companies, products, institutions) on the Internet. The ORM experts try to minimize the negative impact of the information about an entity while maximizing the positive material for being more trustworthy to the customers. Due to the huge amount of information that is published on the Internet every day, there is a need to summarize the entire flow of information to obtain only those data that are relevant to the entities. Traditionally the automatic summarization task in the ORM scenario takes some in-domain signals into account such as popularity, polarity for reputation and novelty but exists other feature to be considered, the authority of the people. This authority depends on the ability to convince others and therefore to influence opinions. In this work, we propose the use of authority signals that measures the influence of a user jointly with (a) priority signals related to the ORM domain and (b) information regarding the different topics that influential people is talking about. Our results indicate that the use of authority signals may significantly improve the quality of the summaries that are automatically generated.
    Source
    Journal of the Association for Information Science and Technology. 72(2021) no.5, S.583-594
  14. Lam, W.; Chan, K.; Radev, D.; Saggion, H.; Teufel, S.: Context-based generic cross-lingual retrieval of documents and automated summaries (2005) 0.01
    0.012202433 = product of:
      0.05491095 = sum of:
        0.007914125 = weight(_text_:information in 1965) [ClassicSimilarity], result of:
          0.007914125 = score(doc=1965,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.116372846 = fieldWeight in 1965, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1965)
        0.046996824 = weight(_text_:retrieval in 1965) [ClassicSimilarity], result of:
          0.046996824 = score(doc=1965,freq=8.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.40105087 = fieldWeight in 1965, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=1965)
      0.22222222 = coord(2/9)
    
    Abstract
    We develop a context-based generic cross-lingual retrieval model that can deal with different language pairs. Our model considers contexts in the query translation process. Contexts in the query as weIl as in the documents based an co-occurrence statistics from different granularity of passages are exploited. We also investigate cross-lingual retrieval of automatic generic summaries. We have implemented our model for two different cross-lingual settings, namely, retrieving Chinese documents from English queries as weIl as retrieving English documents from Chinese queries. Extensive experiments have been conducted an a large-scale parallel corpus enabling studies an retrieval performance for two different cross-lingual settings of full-length documents as weIl as automated summaries.
    Source
    Journal of the American Society for Information Science and Technology. 56(2005) no.2, S.129-139
  15. Yang, C.C.; Wang, F.L.: Hierarchical summarization of large documents (2008) 0.01
    0.012160181 = product of:
      0.054720815 = sum of:
        0.013190207 = weight(_text_:information in 1719) [ClassicSimilarity], result of:
          0.013190207 = score(doc=1719,freq=8.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.19395474 = fieldWeight in 1719, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1719)
        0.04153061 = weight(_text_:techniques in 1719) [ClassicSimilarity], result of:
          0.04153061 = score(doc=1719,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.24335694 = fieldWeight in 1719, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1719)
      0.22222222 = coord(2/9)
    
    Abstract
    Many automatic text summarization models have been developed in the last decades. Related research in information science has shown that human abstractors extract sentences for summaries based on the hierarchical structure of documents; however, the existing automatic summarization models do not take into account the human abstractor's behavior of sentence extraction and only consider the document as a sequence of sentences during the process of extraction of sentences as a summary. In general, a document exhibits a well-defined hierarchical structure that can be described as fractals - mathematical objects with a high degree of redundancy. In this article, we introduce the fractal summarization model based on the fractal theory. The important information is captured from the source document by exploring the hierarchical structure and salient features of the document. A condensed version of the document that is informatively close to the source document is produced iteratively using the contractive transformation in the fractal theory. The fractal summarization model is the first attempt to apply fractal theory to document summarization. It significantly improves the divergence of information coverage of summary and the precision of summary. User evaluations have been conducted. Results have indicated that fractal summarization is promising and outperforms current summarization techniques that do not consider the hierarchical structure of documents.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.6, S.887-902
  16. Johnson, F.C.; Paice, C.D.; Black, W.J.; Neal, A.P.: ¬The application of linguistic processing to automatic abstract generation (1993) 0.01
    0.0116342725 = product of:
      0.052354228 = sum of:
        0.013190207 = weight(_text_:information in 2290) [ClassicSimilarity], result of:
          0.013190207 = score(doc=2290,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.19395474 = fieldWeight in 2290, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=2290)
        0.03916402 = weight(_text_:retrieval in 2290) [ClassicSimilarity], result of:
          0.03916402 = score(doc=2290,freq=2.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.33420905 = fieldWeight in 2290, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.078125 = fieldNorm(doc=2290)
      0.22222222 = coord(2/9)
    
    Footnote
    Wiederabgedruckt in: Readings in information retrieval. Ed.: K. Sparck Jones u. P. Willett. San Francisco: Morgan Kaufmann 1997. S.538-552.
  17. Salton, G.; Allan, J.; Buckley, C.; Singhal, A.: Automatic analysis, theme generation, and summarization of machine readable texts (1994) 0.01
    0.0116342725 = product of:
      0.052354228 = sum of:
        0.013190207 = weight(_text_:information in 1949) [ClassicSimilarity], result of:
          0.013190207 = score(doc=1949,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.19395474 = fieldWeight in 1949, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=1949)
        0.03916402 = weight(_text_:retrieval in 1949) [ClassicSimilarity], result of:
          0.03916402 = score(doc=1949,freq=2.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.33420905 = fieldWeight in 1949, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.078125 = fieldNorm(doc=1949)
      0.22222222 = coord(2/9)
    
    Footnote
    Wiederabgedruckt in: Readings in information retrieval. Ed.: K. Sparck Jones u. P. Willett. San Francisco: Morgan Kaufmann 1997. S.478-483.
  18. Marsh, E.: ¬A production rule system for message summarisation (1984) 0.01
    0.0116342725 = product of:
      0.052354228 = sum of:
        0.013190207 = weight(_text_:information in 1956) [ClassicSimilarity], result of:
          0.013190207 = score(doc=1956,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.19395474 = fieldWeight in 1956, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=1956)
        0.03916402 = weight(_text_:retrieval in 1956) [ClassicSimilarity], result of:
          0.03916402 = score(doc=1956,freq=2.0), product of:
            0.1171842 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.038739666 = queryNorm
            0.33420905 = fieldWeight in 1956, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.078125 = fieldNorm(doc=1956)
      0.22222222 = coord(2/9)
    
    Footnote
    Wiederabgedruckt in: Readings in information retrieval. Ed.: K. Sparck Jones u. P. Willett. San Francisco: Morgan Kaufmann 1997. S.534-537.
  19. Galgani, F.; Compton, P.; Hoffmann, A.: Summarization based on bi-directional citation analysis (2015) 0.01
    0.010694603 = product of:
      0.048125714 = sum of:
        0.0065951035 = weight(_text_:information in 2685) [ClassicSimilarity], result of:
          0.0065951035 = score(doc=2685,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.09697737 = fieldWeight in 2685, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2685)
        0.04153061 = weight(_text_:techniques in 2685) [ClassicSimilarity], result of:
          0.04153061 = score(doc=2685,freq=2.0), product of:
            0.17065717 = queryWeight, product of:
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.038739666 = queryNorm
            0.24335694 = fieldWeight in 2685, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.405231 = idf(docFreq=1467, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2685)
      0.22222222 = coord(2/9)
    
    Abstract
    Automatic document summarization using citations is based on summarizing what others explicitly say about the document, by extracting a summary from text around the citations (citances). While this technique works quite well for summarizing the impact of scientific articles, other genres of documents as well as other types of summaries require different approaches. In this paper, we introduce a new family of methods that we developed for legal documents summarization to generate catchphrases for legal cases (where catchphrases are a form of legal summary). Our methods use both incoming and outgoing citations, and we show how citances can be combined with other elements of cited and citing documents, including the full text of the target document, and catchphrases of cited and citing cases. On a legal summarization corpus, our methods outperform competitive baselines. The combination of full text sentences and catchphrases from cited and citing cases is particularly successful. We also apply and evaluate the methods on scientific paper summarization, where they perform at the level of state-of-the-art techniques. Our family of citation-based summarization methods is powerful and flexible enough to target successfully a range of different domains and summarization tasks.
    Source
    Information processing and management. 51(2015) no.1, S.1-24
  20. Dorr, B.J.; Gaasterland, T.: Exploiting aspectual features and connecting words for summarization-inspired temporal-relation extraction (2007) 0.01
    0.010272992 = product of:
      0.046228465 = sum of:
        0.007914125 = weight(_text_:information in 950) [ClassicSimilarity], result of:
          0.007914125 = score(doc=950,freq=2.0), product of:
            0.06800663 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.038739666 = queryNorm
            0.116372846 = fieldWeight in 950, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=950)
        0.03831434 = product of:
          0.07662868 = sum of:
            0.07662868 = weight(_text_:theories in 950) [ClassicSimilarity], result of:
              0.07662868 = score(doc=950,freq=2.0), product of:
                0.21161452 = queryWeight, product of:
                  5.4624767 = idf(docFreq=509, maxDocs=44218)
                  0.038739666 = queryNorm
                0.36211446 = fieldWeight in 950, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4624767 = idf(docFreq=509, maxDocs=44218)
                  0.046875 = fieldNorm(doc=950)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    This paper presents a model that incorporates contemporary theories of tense and aspect and develops a new framework for extracting temporal relations between two sentence-internal events, given their tense, aspect, and a temporal connecting word relating the two events. A linguistic constraint on event combination has been implemented to detect incorrect parser analyses and potentially apply syntactic reanalysis or semantic reinterpretation - in preparation for subsequent processing for multi-document summarization. An important contribution of this work is the extension of two different existing theoretical frameworks - Hornstein's 1990 theory of tense analysis and Allen's 1984 theory on event ordering - and the combination of both into a unified system for representing and constraining combinations of different event types (points, closed intervals, and open-ended intervals). We show that our theoretical results have been verified in a large-scale corpus analysis. The framework is designed to inform a temporally motivated sentence-ordering module in an implemented multi-document summarization system.
    Source
    Information processing and management. 43(2007) no.6, S.1681-1704

Years

Languages

  • e 87
  • d 11
  • chi 1
  • More… Less…

Types

  • a 95
  • m 2
  • el 1
  • r 1
  • s 1
  • More… Less…