Search (69 results, page 1 of 4)

  • theme_ss:"Automatisches Abstracting"
  1. Goh, A.; Hui, S.C.: TES: a text extraction system (1996) 0.11
    0.11395815 = product of:
      0.17093723 = sum of:
        0.051577676 = weight(_text_:management in 6599) [ClassicSimilarity], result of:
          0.051577676 = score(doc=6599,freq=2.0), product of:
            0.17312427 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.051362853 = queryNorm
            0.29792285 = fieldWeight in 6599, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0625 = fieldNorm(doc=6599)
        0.11935955 = sum of:
          0.0636879 = weight(_text_:system in 6599) [ClassicSimilarity], result of:
            0.0636879 = score(doc=6599,freq=4.0), product of:
              0.16177002 = queryWeight, product of:
                3.1495528 = idf(docFreq=5152, maxDocs=44218)
                0.051362853 = queryNorm
              0.3936941 = fieldWeight in 6599, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.1495528 = idf(docFreq=5152, maxDocs=44218)
                0.0625 = fieldNorm(doc=6599)
          0.05567166 = weight(_text_:22 in 6599) [ClassicSimilarity], result of:
            0.05567166 = score(doc=6599,freq=2.0), product of:
              0.17986396 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051362853 = queryNorm
              0.30952093 = fieldWeight in 6599, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=6599)
      0.6666667 = coord(2/3)
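      The breakdown above is Lucene ClassicSimilarity "explain" output. As a reading aid, here is a minimal Python sketch of how the reported quantities combine (tf = sqrt(termFreq), each term contributes queryWeight * fieldWeight, and coord scales by the fraction of matched query clauses); the function and variable names are ours, not Lucene's:

      import math

      def term_score(freq, idf, query_norm, field_norm):
          """One term's contribution: (idf * queryNorm) * (sqrt(freq) * idf * fieldNorm)."""
          tf = math.sqrt(freq)                  # ClassicSimilarity: tf(freq) = freq ** 0.5
          query_weight = idf * query_norm       # query-side weight
          field_weight = tf * idf * field_norm  # document-side weight
          return query_weight * field_weight

      QUERY_NORM = 0.051362853  # shared normalisation factor for this query

      # Values copied from the explain tree for hit 1 (doc 6599)
      score = sum((
          term_score(2.0, 3.3706124, QUERY_NORM, 0.0625),  # _text_:management
          term_score(4.0, 3.1495528, QUERY_NORM, 0.0625),  # _text_:system
          term_score(2.0, 3.5018296, QUERY_NORM, 0.0625),  # _text_:22
      )) * (2 / 3)  # coord(2/3): two of three query clauses matched

      print(f"{score:.8f}")  # -> 0.11395815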
    
    Abstract
    With the onset of the information explosion arising from digital libraries and access to a wealth of information through the Internet, the need to efficiently determine the relevance of a document becomes even more urgent. Describes a text extraction system (TES), which retrieves a set of sentences from a document to form an indicative abstract. Such an automated process enables information to be filtered more quickly. Discusses the combination of various text extraction techniques. Compares results with manually produced abstracts
    Date
    26. 2.1997 10:22:43
    Source
    Microcomputers for information management. 13(1996) no.1, S.41-55
  2. Vanderwende, L.; Suzuki, H.; Brockett, J.M.; Nenkova, A.: Beyond SumBasic : task-focused summarization with sentence simplification and lexical expansion (2007) 0.10
    Abstract
    In recent years, there has been increased interest in topic-focused multi-document summarization. In this task, automatic summaries are produced in response to a specific information request, or topic, stated by the user. The system we have designed to accomplish this task comprises four main components: a generic extractive summarization system, a topic-focusing component, sentence simplification, and lexical expansion of topic words. This paper details each of these components, together with experiments designed to quantify their individual contributions. We include an analysis of our results on two large datasets commonly used to evaluate task-focused summarization, the DUC2005 and DUC2006 datasets, using automatic metrics. Additionally, we include an analysis of our results on the DUC2006 task according to human evaluation metrics. In the human evaluation of system summaries compared to human summaries, i.e., the Pyramid method, our system ranked first out of 22 systems in terms of overall mean Pyramid score; and in the human evaluation of summary responsiveness to the topic, our system ranked third out of 35 systems.
    Source
    Information processing and management. 43(2007) no.6, S.1606-1618
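    Entry 2 builds on SumBasic, a frequency-based extractor. As background, a compact sketch of SumBasic-style selection as commonly described (score sentences by average word probability, then square the probabilities of covered words to limit redundancy); tokenisation and the paper's task-focusing, simplification and expansion components are omitted:

      from collections import Counter

      def sumbasic(sentences, max_sents=3):
          """Greedy SumBasic-style extraction over tokenised, non-empty sentences."""
          words = [w for s in sentences for w in s]
          prob = {w: c / len(words) for w, c in Counter(words).items()}
          chosen = []
          while len(chosen) < min(max_sents, len(sentences)):
              best = max((i for i in range(len(sentences)) if i not in chosen),
                         key=lambda i: sum(prob[w] for w in sentences[i]) / len(sentences[i]))
              chosen.append(best)
              for w in sentences[best]:
                  prob[w] **= 2  # update step: p(w) -> p(w)^2 discourages repetition
          return chosen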
  3. Bateman, J.; Teich, E.: Selective information presentation in an integrated publication system : an application of genre-driven text generation (1995) 0.09
    Source
    Information processing and management. 31(1995) no.5, S.753-767
  4. Brandow, R.; Mitze, K.; Rau, L.F.: Automatic condensation of electronic publications by sentence selection (1995) 0.06
    Abstract
    Description of a system that performs domain-independent automatic condensation of news from a large commercial news service encompassing 41 different publications. This system was evaluated against a system that condensed the same articles using only the first portions of the texts (the löead), up to the target length of the summaries. 3 lengths of articles were evaluated for 250 documents by both systems, totalling 1.500 suitability judgements in all. The lead-based summaries outperformed the 'intelligent' summaries significantly, achieving acceptability ratings of over 90%, compared to 74,7%
    Source
    Information processing and management. 31(1995) no.5, S.675-685
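    The lead baseline that outperformed the 'intelligent' extracts above is easy to state exactly; a sketch, assuming the target length is given as a word budget:

      def lead_summary(sentences, word_budget):
          """Take sentences from the start of the article until the budget is spent."""
          out, used = [], 0
          for s in sentences:
              n = len(s.split())
              if used + n > word_budget and out:
                  break
              out.append(s)
              used += n
          return out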
  5. Craven, T.C.: ¬A computer-aided abstracting tool kit (1993) 0.05
    Abstract
    Describes the abstracting assistance features being prototyped in the TEXNET text network management system. Sentence weighting methods include: weighting negatively or positively on the stems in a selected passage; weighting on general lists of cue words; adjusting the weights of selected segments; and weighting the occurrence of frequent stems. The user may adjust a number of parameters: the minimum strength of extracts; the threshold for frequent words/stems; and the amount by which sentence weight is to be adjusted for each weighting type
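    A rough illustration of the weighting types listed above; the cue lists, threshold and adjustment factor are placeholders, not TEXNET's:

      from collections import Counter

      BONUS_CUES = {"significant", "conclude", "results"}   # placeholder cue list
      STIGMA_CUES = {"perhaps", "possibly"}                 # placeholder cue list

      def weight_sentences(sentences, freq_threshold=3, cue_adjust=1.0):
          """Weight tokenised sentences by frequent stems plus cue-word bonuses/penalties."""
          stem_freq = Counter(w for s in sentences for w in s)
          frequent = {w for w, c in stem_freq.items() if c >= freq_threshold}
          weights = []
          for s in sentences:
              w = sum(1.0 for t in s if t in frequent)              # frequent-stem weight
              w += cue_adjust * sum(1.0 for t in s if t in BONUS_CUES)
              w -= cue_adjust * sum(1.0 for t in s if t in STIGMA_CUES)
              weights.append(w)
          return weights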
  6. Ahmad, K.: Text summarisation : the role of lexical cohesion analysis (1995) 0.05
    Abstract
    Work on automatic text summarisation focuses mainly on computational models of texts. The artificial-intelligence-related work in text summarisation deals mainly with narrative texts such as newspaper reports and stories. Presents a study on the summarisation of non-narrative texts such as those in scientific and technical communication. Discusses syntactic cohesion; lexical cohesion; complex lexical repetition; simple and complex paraphrase; bonds and links; and Tele-Pattan, an architecture for a cohesion-based text analysis and summarisation system working on SGML
    Source
    New review of document and text management. 1995, no.1, S.321-335
  7. Sjöbergh, J.: Older versions of the ROUGEeval summarization evaluation system were easier to fool (2007) 0.05
    Source
    Information processing and management. 43(2007) no.6, S.1500-1505
  8. Plaza, L.; Stevenson, M.; Díaz, A.: Resolving ambiguity in biomedical text to improve summarization (2012) 0.05
    Abstract
    Access to the vast body of research literature that is now available on biomedicine and related fields can be improved with automatic summarization. This paper describes a summarization system for the biomedical domain that represents documents as graphs formed from concepts and relations in the UMLS Metathesaurus. This system has to deal with the ambiguities that occur in biomedical documents. We describe a variety of strategies that make use of MetaMap and Word Sense Disambiguation (WSD) to accurately map biomedical documents onto UMLS Metathesaurus concepts. Evaluation is carried out using a collection of 150 biomedical scientific articles from the BioMed Central corpus. We find that using WSD improves the quality of the summaries generated.
    Source
    Information processing and management. 48(2012) no.4, S.755-766
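    Not the authors' algorithm, but the flavour of ranking over a concept graph fits in a few lines: treat concepts co-occurring in a sentence as linked, and score each sentence by the connectivity of the concepts it mentions (real annotation would come from MetaMap plus WSD):

      from collections import Counter
      from itertools import combinations

      def rank_sentences(concepts_per_sentence):
          """Toy ranking: edge weight = co-mention count; sentence salience = summed concept degree."""
          edge = Counter()
          for cs in concepts_per_sentence:
              for a, b in combinations(sorted(cs), 2):
                  edge[(a, b)] += 1
          degree = Counter()
          for (a, b), w in edge.items():
              degree[a] += w
              degree[b] += w
          scores = [sum(degree[c] for c in cs) for cs in concepts_per_sentence]
          return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)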
  9. Robin, J.; McKeown, K.: Empirically designing and evaluating a new revision-based model for summary generation (1996) 0.04
    Abstract
    Presents a system for summarizing quantitative data in natural language, focusing on the use of a corpus of basketball game summaries, drawn from online news services, to empirically shape the system design and to evaluate the approach. Initial corpus analysis revealed characteristics of textual summaries that challenge the capabilities of current language generation systems. A revision-based corpus analysis was used to identify and encode the revision rules of the system. Presents a quantitative evaluation, using several test corpora, to measure the robustness of the new revision-based model
    Date
    6. 3.1997 16:22:15
  10. Endres-Niggemeyer, B.; Maier, E.; Sigel, A.: How to implement a naturalistic model of abstracting : four core working steps of an expert abstractor (1995) 0.04
    Abstract
    4 working steps taken from a comprehensive empirical model of expert abstracting are studied in order to prepare an explorative implementation of a simulation model. It aims at explaining the knowledge processing activities during professional summarizing. Following the case-based and holistic strategy of qualitative empirical research, the main features of the simulation system were developed by investigating in detail a small but central test case - 4 working steps where an expert abstractor discovers what the paper is about and drafts the topic sentence of the abstract
    Source
    Information processing and management. 31(1995) no.5, S.631-674
  11. Steinberger, J.; Poesio, M.; Kabadjov, M.A.; Jezek, K.: Two uses of anaphora resolution in summarization (2007) 0.04
    Abstract
    We propose a new method for using anaphoric information in Latent Semantic Analysis (LSA), and discuss its application to develop an LSA-based summarizer which achieves a significantly better performance than a system not using anaphoric information, and a better performance by the ROUGE measure than all but one of the single-document summarizers participating in DUC-2002. Anaphoric information is automatically extracted using a new release of our own anaphora resolution system, GuiTAR, which incorporates proper noun resolution. Our summarizer also includes a new approach for automatically identifying the dimensionality reduction of a document on the basis of the desired summarization percentage. Anaphoric information is also used to check the coherence of the summary produced by our summarizer, by a reference checker module which identifies anaphoric resolution errors caused by sentence extraction.
    Source
    Information processing and management. 43(2007) no.6, S.1663-1680
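    The LSA machinery behind such summarizers can be sketched with an SVD of a term-by-sentence matrix, scoring each sentence by the length of its singular-value-weighted latent vector (the Steinberger-Jezek style of scoring); anaphora handling is omitted:

      import numpy as np

      def lsa_rank(A, r=2):
          """A: term-by-sentence matrix. Rank sentences by salience in the top-r latent space."""
          U, s, Vt = np.linalg.svd(A, full_matrices=False)
          r = min(r, len(s))
          scores = np.sqrt(((s[:r, None] * Vt[:r, :]) ** 2).sum(axis=0))
          return np.argsort(-scores)  # sentence indices, most salient first

      A = np.array([[1, 0, 1],   # toy counts: 4 terms x 3 sentences
                    [2, 1, 0],
                    [0, 1, 0],
                    [1, 1, 1]], dtype=float)
      print(lsa_rank(A))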
  12. Dorr, B.J.; Gaasterland, T.: Exploiting aspectual features and connecting words for summarization-inspired temporal-relation extraction (2007) 0.04
    Abstract
    This paper presents a model that incorporates contemporary theories of tense and aspect and develops a new framework for extracting temporal relations between two sentence-internal events, given their tense, aspect, and a temporal connecting word relating the two events. A linguistic constraint on event combination has been implemented to detect incorrect parser analyses and potentially apply syntactic reanalysis or semantic reinterpretation - in preparation for subsequent processing for multi-document summarization. An important contribution of this work is the extension of two different existing theoretical frameworks - Hornstein's 1990 theory of tense analysis and Allen's 1984 theory on event ordering - and the combination of both into a unified system for representing and constraining combinations of different event types (points, closed intervals, and open-ended intervals). We show that our theoretical results have been verified in a large-scale corpus analysis. The framework is designed to inform a temporally motivated sentence-ordering module in an implemented multi-document summarization system.
    Source
    Information processing and management. 43(2007) no.6, S.1681-1704
  13. Dunlavy, D.M.; O'Leary, D.P.; Conroy, J.M.; Schlesinger, J.D.: QCS: A system for querying, clustering and summarizing documents (2007) 0.04
    Abstract
    Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel integrated information retrieval system-the Query, Cluster, Summarize (QCS) system-which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of methods in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) as measured by the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence "trimming" and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.
    Source
    Information processing and management. 43(2007) no.6, S.1588-1605
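    The modularity the abstract emphasises amounts to three swappable stages; a skeleton with deliberately trivial stand-ins (QCS itself plugs in LSI retrieval, generalized spherical k-means, and an HMM with pivoted QR for extraction):

      def qcs_pipeline(query, corpus, retrieve, cluster, summarize):
          """Query -> Cluster -> Summarize: one summary per topic cluster."""
          hits = retrieve(query, corpus)
          return [summarize(c) for c in cluster(hits)]

      # Trivial stand-ins so the skeleton runs
      def retrieve(query, docs):
          terms = query.lower().split()
          return [d for d in docs if any(t in d.lower() for t in terms)]

      def cluster(docs):
          return [docs] if docs else []          # one cluster; real code groups by topic

      def summarize(docs):
          return docs[0].split(".")[0] + "." if docs else ""   # first sentence of first doc

      print(qcs_pipeline("summarization",
                         ["Automatic summarization condenses text. It selects sentences.",
                          "Clustering groups documents by topic."],
                         retrieve, cluster, summarize))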
  14. Maybury, M.T.: Generating summaries from event data (1995) 0.04
    Abstract
    Summarization entails analysis of source material, selection of key information, condensation of this, and generation of a compact summary form. While there have been many investigations into the automatic summarization of text, relatively little attention has been given to the summarization of information from structured information sources such as data or knowledge bases, despite this being a desirable capability for a number of application areas including report generation from databases (e.g. weather, financial, medical) and simulation (e.g. military, manufacturing, economic). After a brief introduction indicating the main elements of summarization and referring to some illustrative approaches to it, considers specific issues in the generation of text summaries of event data, describes a system, SumGen, which selects key information from an event database by reasoning about event frequencies, frequencies of relations between events, and domain-specific importance measures. Describes how SumGen then aggregates similar information and plans summary presentations tailored to stereotypical users
    Source
    Information processing and management. 31(1995) no.5, S.735-751
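    A toy version of the selection step: rank event types by rarity, scaled by a domain-specific importance measure (the scheme is invented for illustration; SumGen also reasons about relations between events):

      from collections import Counter

      def select_key_events(events, importance, top_n=5):
          """Prefer rare event types, scaled by a domain-specific importance weight."""
          if not events:
              return []
          freq = Counter(events)
          total = len(events)
          def salience(e):
              return importance.get(e, 1.0) * (1 - freq[e] / total)  # rarer -> higher
          return sorted(freq, key=salience, reverse=True)[:top_n]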
  15. Moens, M.-F.; Uyttendaele, C.: Automatic text structuring and categorization as a first step in summarizing legal cases (1997) 0.04
    Abstract
    The SALOMON system automatically summarizes Belgian criminal cases in order to improve access to the large number of existing and future court decisions. SALOMON extracts relevant text units from the case text to form a case summary. Such a case profile facilitates the rapid determination of the relevance of the case or may be employed in text search. In a first important abstracting step SALOMON performs an initial categorization of legal criminal cases and structures the case text into separate legally relevant and irrelevant components. A text grammar represented as a semantic network is used to automatically determine the category of the case and its components. The system then extracts general data from the case and identifies text portions relevant for further abstracting. Prior knowledge of the text structure and its indicative cues may support automatic abstracting. A text grammar is a promising form for representing the knowledge involved
    Source
    Information processing and management. 33(1997) no.6, S.727-737
  16. Liang, S.-F.; Devlin, S.; Tait, J.: Investigating sentence weighting components for automatic summarisation (2007) 0.04
    Abstract
    The work described here initially formed part of a triangulation exercise to establish the effectiveness of the Query Term Order algorithm. It subsequently proved to be a reliable indicator for summarising English web documents. We utilised the human summaries from the Document Understanding Conference data, and generated queries automatically for testing the QTO algorithm. Six sentence weighting schemes that made use of Query Term Frequency and QTO were constructed to produce system summaries, and this paper explains the process of combining and balancing the weighting components. The summaries produced were evaluated by the ROUGE-1 metric, and the results showed that using QTO in a weighting combination resulted in the best performance. We also found that using a combination of more weighting components always produced improved performance compared to any single weighting component.
    Source
    Information processing and management. 43(2007) no.1, S.146-153
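    The combining and balancing of components can be pictured as a convex mix of normalised per-sentence weights; the values below are stand-ins for the paper's QTF- and QTO-derived components:

      def combine_weights(components, mix):
          """components[k][i]: raw weight of sentence i under component k; mix: blend weights."""
          def norm(xs):
              hi = max(xs) or 1.0
              return [x / hi for x in xs]
          normed = [norm(c) for c in components]
          return [sum(m * c[i] for m, c in zip(mix, normed))
                  for i in range(len(components[0]))]

      # e.g. 0.7 * query-term-frequency component + 0.3 * query-term-order component
      print(combine_weights([[3, 1, 0], [0.2, 0.9, 0.1]], mix=[0.7, 0.3]))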
  17. Hirao, T.; Okumura, M.; Yasuda, N.; Isozaki, H.: Supervised automatic evaluation for summarization with voted regression model (2007) 0.04
    Abstract
    High-quality evaluation of generated summaries is needed if we are to improve automatic summarization systems. Although human evaluation provides better results than automatic evaluation methods, its cost is huge and it is difficult to reproduce the results. Therefore, we need an automatic method that simulates human evaluation if we are to improve our summarization system efficiently. Although automatic evaluation methods have been proposed, they are unreliable when used for individual summaries. To solve this problem, we propose a supervised automatic evaluation method based on a new regression model called the voted regression model (VRM). VRM has two characteristics: (1) model selection based on 'corrected AIC' to avoid multicollinearity, (2) voting by the selected models to alleviate the problem of overfitting. Evaluation results obtained for TSC3 and DUC2004 show that our method achieved error reductions of about 17-51% compared with conventional automatic evaluation methods. Moreover, our method obtained the highest correlation coefficients in several different experiments.
    Source
    Information processing and management. 43(2007) no.6, S.1521-1535
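    The voting half of VRM in miniature: fit one least-squares model per candidate feature subset and average the predictions; the corrected-AIC selection step is omitted and the subsets here are arbitrary:

      import numpy as np

      def voted_regression(X, y, subsets, x_new):
          """Average the predictions of one least-squares model per feature subset."""
          preds = []
          for cols in subsets:
              A = np.column_stack([np.ones(len(X)), X[:, cols]])  # intercept + subset
              coef, *_ = np.linalg.lstsq(A, y, rcond=None)
              preds.append(coef[0] + x_new[cols] @ coef[1:])
          return float(np.mean(preds))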
  18. Nomoto, T.: Discriminative sentence compression with conditional random fields (2007) 0.04
    Abstract
    The paper focuses on a particular approach to automatic sentence compression which makes use of a discriminative sequence classifier known as Conditional Random Fields (CRF). We devise several features for CRF that allow it to incorporate information on nonlinear relations among words. Along with that, we address the issue of data paucity by collecting data from RSS feeds available on the Internet, and turning them into training data for use with CRF, drawing on techniques from biology and information retrieval. We also discuss a recursive application of CRF on the syntactic structure of a sentence as a way of improving the readability of the compression it generates. Experiments found that our approach works reasonably well compared to the state-of-the-art system [Knight, K., & Marcu, D. (2002). Summarization beyond sentence extraction: A probabilistic approach to sentence compression. Artificial Intelligence 139, 91-107.].
    Source
    Information processing and management. 43(2007) no.6, S.1571-1587
  19. Díaz, A.; Gervás, P.: User-model based personalized summarization (2007) 0.04
    Abstract
    The potential of summary personalization is high, because a summary that would be useless for deciding the relevance of a document if produced in a generic manner may be useful if the sentences selected match the user's interests. In this paper we defend the use of a personalized summarization facility to maximize the density of relevance of selections sent by a personalized information system to a given user. The personalization is applied to the digital newspaper domain and uses a user model that stores long- and short-term interests using four reference systems: sections, categories, keywords and feedback terms. On the other hand, it is crucial to measure how much information is lost during the summarization process, and how this information loss may affect the ability of the user to judge the relevance of a given document. The results obtained in two personalization systems show that personalized summaries perform better than generic and generic-personalized summaries in terms of identifying documents that satisfy user preferences. We also considered a user-centred direct evaluation that showed a high level of user satisfaction with the summaries.
    Source
    Information processing and management. 43(2007) no.6, S.1715-1734
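    A minimal sketch of the ranking idea, assuming the user model reduces to weighted term lists for long- and short-term interests (names and the mixing weight are ours):

      def personalized_scores(sentences, long_term, short_term, mix=0.5):
          """Score tokenised sentences by overlap with the user model's weighted terms."""
          def score(s):
              lt = sum(long_term.get(w, 0.0) for w in s)
              st = sum(short_term.get(w, 0.0) for w in s)
              return mix * lt + (1 - mix) * st
          return [score(s) for s in sentences]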
  20. Abdi, A.; Idris, N.; Alguliev, R.M.; Aliguliyev, R.M.: Automatic summarization assessment through a combination of semantic and syntactic information for intelligent educational systems (2015) 0.04
    Abstract
    Summary writing is a process for creating a short version of a source text. It can be used as a measure of understanding. As grading students' summaries is a very time-consuming task, computer-assisted assessment can help teachers perform the grading more effectively. Several techniques, such as BLEU, ROUGE, N-gram co-occurrence, Latent Semantic Analysis (LSA), LSA_Ngram and LSA_ERB, have been proposed to support the automatic assessment of students' summaries. Since these techniques are more suitable for long texts, their performance is not satisfactory for the evaluation of short summaries. This paper proposes a specialized method that works well in assessing short summaries. Our proposed method integrates the semantic relations between words and their syntactic composition. As a result, the proposed method is able to obtain high accuracy and improve the performance compared with the current techniques. Experiments show that it is preferable to the existing techniques. A summary evaluation system based on the proposed method has also been developed.
    Source
    Information processing and management. 51(2015) no.4, S.340-358
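    The combination described above can be approximated as a weighted blend of a semantic similarity and a word-order similarity between student and reference summaries; both components below are crude stand-ins (set overlap instead of the paper's semantic relations, Li-style order vectors):

      import math

      def word_order_sim(s1, s2):
          """Li-style word-order similarity over tokenised sentences."""
          joint = sorted(set(s1) | set(s2))
          r1 = [s1.index(w) + 1 if w in s1 else 0 for w in joint]
          r2 = [s2.index(w) + 1 if w in s2 else 0 for w in joint]
          diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
          total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
          return 1 - diff / total if total else 1.0

      def semantic_sim(s1, s2):
          """Crude stand-in: Dice overlap of word sets."""
          if not s1 or not s2:
              return 0.0
          inter = len(set(s1) & set(s2))
          return 2 * inter / (len(set(s1)) + len(set(s2)))

      def combined_sim(s1, s2, delta=0.8):
          # delta is a tunable blend weight, not a value from the paper
          return delta * semantic_sim(s1, s2) + (1 - delta) * word_order_sim(s1, s2)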

Languages

  • e 68
  • d 1

Types

  • a 67
  • r 1
  • s 1