Search (23 results, page 1 of 2)

  • × theme_ss:"Automatisches Abstracting"
  1. Ouyang, Y.; Li, W.; Li, S.; Lu, Q.: Intertopic information mining for query-based summarization (2010) 0.01
    0.0061221616 = product of:
      0.030610807 = sum of:
        0.010176711 = product of:
          0.03053013 = sum of:
            0.03053013 = weight(_text_:problem in 3459) [ClassicSimilarity], result of:
              0.03053013 = score(doc=3459,freq=2.0), product of:
                0.1302053 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03067635 = queryNorm
                0.23447686 = fieldWeight in 3459, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3459)
          0.33333334 = coord(1/3)
        0.020434096 = product of:
          0.06130229 = sum of:
            0.06130229 = weight(_text_:2010 in 3459) [ClassicSimilarity], result of:
              0.06130229 = score(doc=3459,freq=5.0), product of:
                0.14672957 = queryWeight, product of:
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.03067635 = queryNorm
                0.41779095 = fieldWeight in 3459, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3459)
          0.33333334 = coord(1/3)
      0.2 = coord(2/10)
    
    Abstract
    In this article, the authors address the problem of sentence ranking in summarization. Although most existing summarization approaches are concerned with the information embodied in a particular topic (including a set of documents and an associated query) for sentence ranking, they propose a novel ranking approach that incorporates intertopic information mining. Intertopic information, in contrast to intratopic information, is able to reveal pairwise topic relationships and thus can be considered as the bridge across different topics. In this article, the intertopic information is used for transferring word importance learned from known topics to unknown topics under a learning-based summarization framework. To mine this information, the authors model the topic relationship by clustering all the words in both known and unknown topics according to various kinds of word conceptual labels, which indicate the roles of the words in the topic. Based on the mined relationships, we develop a probabilistic model using manually generated summaries provided for known topics to predict ranking scores for sentences in unknown topics. A series of experiments have been conducted on the Document Understanding Conference (DUC) 2006 data set. The evaluation results show that intertopic information is indeed effective for sentence ranking and the resultant summarization system performs comparably well to the best-performing DUC participating systems on the same data set.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.5, S.1062-1072
    Year
    2010
  2. Kuhlen, R.: Abstracts, abstracting : intellektuelle und maschinelle Verfahren (1990) 0.00
    0.003934862 = product of:
      0.03934862 = sum of:
        0.03934862 = product of:
          0.11804586 = sum of:
            0.11804586 = weight(_text_:1990 in 2333) [ClassicSimilarity], result of:
              0.11804586 = score(doc=2333,freq=3.0), product of:
                0.13825724 = queryWeight, product of:
                  4.506965 = idf(docFreq=1325, maxDocs=44218)
                  0.03067635 = queryNorm
                0.85381323 = fieldWeight in 2333, product of:
                  1.7320508 = tf(freq=3.0), with freq of:
                    3.0 = termFreq=3.0
                  4.506965 = idf(docFreq=1325, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2333)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Year
    1990
  3. Endres-Niggemeyer, B.: Kognitive Modellierung des Abstracting (1991) 0.00
    0.0027538298 = product of:
      0.027538298 = sum of:
        0.027538298 = product of:
          0.08261489 = sum of:
            0.08261489 = weight(_text_:1990 in 23) [ClassicSimilarity], result of:
              0.08261489 = score(doc=23,freq=2.0), product of:
                0.13825724 = queryWeight, product of:
                  4.506965 = idf(docFreq=1325, maxDocs=44218)
                  0.03067635 = queryNorm
                0.5975448 = fieldWeight in 23, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.506965 = idf(docFreq=1325, maxDocs=44218)
                  0.09375 = fieldNorm(doc=23)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Source
    Deutscher Dokumentartag 1990. 1. Deutsch-deutscher Dokumentartag, 25.-27.9.90, Fulda. Proceedings. Hrsg. W. Neubauer u. U. Schneider-Briehn
  4. Wang, W.; Hwang, D.: Abstraction Assistant : an automatic text abstraction system (2010) 0.00
    0.0024520915 = product of:
      0.024520915 = sum of:
        0.024520915 = product of:
          0.07356274 = sum of:
            0.07356274 = weight(_text_:2010 in 3981) [ClassicSimilarity], result of:
              0.07356274 = score(doc=3981,freq=5.0), product of:
                0.14672957 = queryWeight, product of:
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.03067635 = queryNorm
                0.5013491 = fieldWeight in 3981, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3981)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.9, S.1790-1799
    Year
    2010
  5. Paice, C.D.: Automatic abstracting (1994) 0.00
    0.0022948582 = product of:
      0.022948582 = sum of:
        0.022948582 = product of:
          0.06884574 = sum of:
            0.06884574 = weight(_text_:1990 in 917) [ClassicSimilarity], result of:
              0.06884574 = score(doc=917,freq=2.0), product of:
                0.13825724 = queryWeight, product of:
                  4.506965 = idf(docFreq=1325, maxDocs=44218)
                  0.03067635 = queryNorm
                0.497954 = fieldWeight in 917, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.506965 = idf(docFreq=1325, maxDocs=44218)
                  0.078125 = fieldNorm(doc=917)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Abstract
    The final report of the 2nd British Library abstracting project (the BLAB project), 1990-1992, which was carried out partly at the Computing Department of Lancaster University, and partly at the Centre for Computational Linguistics, UMIST. This project built on the results of the first project, of 1985-1987, to build a system designed create abstracts automatically from given texts
  6. Ercan, G.; Cicekli, I.: Using lexical chains for keyword extraction (2007) 0.00
    0.0020148864 = product of:
      0.020148862 = sum of:
        0.020148862 = product of:
          0.060446583 = sum of:
            0.060446583 = weight(_text_:problem in 951) [ClassicSimilarity], result of:
              0.060446583 = score(doc=951,freq=4.0), product of:
                0.1302053 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03067635 = queryNorm
                0.46424055 = fieldWeight in 951, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=951)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Abstract
    Keywords can be considered as condensed versions of documents and short forms of their summaries. In this paper, the problem of automatic extraction of keywords from documents is treated as a supervised learning task. A lexical chain holds a set of semantically related words of a text and it can be said that a lexical chain represents the semantic content of a portion of the text. Although lexical chains have been extensively used in text summarization, their usage for keyword extraction problem has not been fully investigated. In this paper, a keyword extraction technique that uses lexical chains is described, and encouraging results are obtained.
  7. Hirao, T.; Okumura, M.; Yasuda, N.; Isozaki, H.: Supervised automatic evaluation for summarization with voted regression model (2007) 0.00
    0.0017270452 = product of:
      0.017270451 = sum of:
        0.017270451 = product of:
          0.051811352 = sum of:
            0.051811352 = weight(_text_:problem in 942) [ClassicSimilarity], result of:
              0.051811352 = score(doc=942,freq=4.0), product of:
                0.1302053 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03067635 = queryNorm
                0.39792046 = fieldWeight in 942, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.046875 = fieldNorm(doc=942)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Abstract
    The high quality evaluation of generated summaries is needed if we are to improve automatic summarization systems. Although human evaluation provides better results than automatic evaluation methods, its cost is huge and it is difficult to reproduce the results. Therefore, we need an automatic method that simulates human evaluation if we are to improve our summarization system efficiently. Although automatic evaluation methods have been proposed, they are unreliable when used for individual summaries. To solve this problem, we propose a supervised automatic evaluation method based on a new regression model called the voted regression model (VRM). VRM has two characteristics: (1) model selection based on 'corrected AIC' to avoid multicollinearity, (2) voting by the selected models to alleviate the problem of overfitting. Evaluation results obtained for TSC3 and DUC2004 show that our method achieved error reductions of about 17-51% compared with conventional automatic evaluation methods. Moreover, our method obtained the highest correlation coefficients in several different experiments.
  8. Martinez-Romo, J.; Araujo, L.; Fernandez, A.D.: SemGraph : extracting keyphrases following a novel semantic graph-based approach (2016) 0.00
    0.0015508389 = product of:
      0.015508389 = sum of:
        0.015508389 = product of:
          0.046525165 = sum of:
            0.046525165 = weight(_text_:2010 in 2832) [ClassicSimilarity], result of:
              0.046525165 = score(doc=2832,freq=2.0), product of:
                0.14672957 = queryWeight, product of:
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.03067635 = queryNorm
                0.31708103 = fieldWeight in 2832, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.7831497 = idf(docFreq=1005, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2832)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Abstract
    Keyphrases represent the main topics a text is about. In this article, we introduce SemGraph, an unsupervised algorithm for extracting keyphrases from a collection of texts based on a semantic relationship graph. The main novelty of this algorithm is its ability to identify semantic relationships between words whose presence is statistically significant. Our method constructs a co-occurrence graph in which words appearing in the same document are linked, provided their presence in the collection is statistically significant with respect to a null model. Furthermore, the graph obtained is enriched with information from WordNet. We have used the most recent and standardized benchmark to evaluate the system ability to detect the keyphrases that are part of the text. The result is a method that achieves an improvement of 5.3% and 7.28% in F measure over the two labeled sets of keyphrases used in the evaluation of SemEval-2010.
  9. Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.00
    0.0014392043 = product of:
      0.0143920425 = sum of:
        0.0143920425 = product of:
          0.043176126 = sum of:
            0.043176126 = weight(_text_:problem in 5400) [ClassicSimilarity], result of:
              0.043176126 = score(doc=5400,freq=4.0), product of:
                0.1302053 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03067635 = queryNorm
                0.33160037 = fieldWeight in 5400, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5400)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Abstract
    Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. It is also desirable to be able to identify a small set of entities (e.g., authors, citations, bibliographic records) which are most relevant to a query. This gets more difficult when the amount of data increases dramatically. Data sparsity and model scalability are the major challenges to solving this type of extreme multilabel classification problem automatically. In this paper, we propose to address this problem in two steps: we first embed different types of entities into the same semantic space, where similarity could be computed easily; second, we propose a novel non-parametric method to identify the most relevant entities in addition to direct semantic similarities. We show how effectively this approach predicts even very specialised subjects, which are associated with few documents in the training set and are more problematic for a classifier.
  10. Ruda, S.: Abstracting: eine Auswahlbibliographie (1992) 0.00
    0.0014247395 = product of:
      0.014247394 = sum of:
        0.014247394 = product of:
          0.04274218 = sum of:
            0.04274218 = weight(_text_:problem in 6603) [ClassicSimilarity], result of:
              0.04274218 = score(doc=6603,freq=2.0), product of:
                0.1302053 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03067635 = queryNorm
                0.3282676 = fieldWeight in 6603, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=6603)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Abstract
    Die vorliegende Auswahlbibliographie ist in 9 Themenbereiche unterteilt. Der erste Abschnitt enthält Literatur, in der auf Abstracts und Abstracting-Verfahren allgemein eingegangen und ein Überblick über den Stand der Forschung gegeben wird. Im nächsten Abschnitt werden solche Aufsätze referiert, die die historische Entwicklung des Abstracting beschreiben. Im dritten Teil sind Abstracting-Richtlinien verschiedener Institutionen aufgelistet. Lexikalische, syntaktische und semantische Textkondensierungsverfahren sind das Thema der in Abschnitt 4 präsentierten Arbeiten. Textstrukturen von Abstracts werden unter Punkt 5 betrachtet, und die Arbeiten des nächsten Themenbereiches befassen sich mit dem Problem des Schreibens von Abstracts. Der siebte Abschnitt listet sog. 'maschinelle' und maschinen-unterstützte Abstracting-Methoden auf. Anschließend werden 'maschinelle' und maschinenunterstützte Abstracting-Verfahren, Abstracts im Vergleich zu ihren Primärtexten sowie Abstracts im allgemeien bewertet. Den Abschluß bilden Bibliographien
  11. Zajic, D.; Dorr, B.J.; Lin, J.; Schwartz, R.: Multi-candidate reduction : sentence compression as a tool for document summarization tasks (2007) 0.00
    0.0014247395 = product of:
      0.014247394 = sum of:
        0.014247394 = product of:
          0.04274218 = sum of:
            0.04274218 = weight(_text_:problem in 944) [ClassicSimilarity], result of:
              0.04274218 = score(doc=944,freq=2.0), product of:
                0.1302053 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03067635 = queryNorm
                0.3282676 = fieldWeight in 944, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=944)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Abstract
    This article examines the application of two single-document sentence compression techniques to the problem of multi-document summarization-a "parse-and-trim" approach and a statistical noisy-channel approach. We introduce the multi-candidate reduction (MCR) framework for multi-document summarization, in which many compressed candidates are generated for each source sentence. These candidates are then selected for inclusion in the final summary based on a combination of static and dynamic features. Evaluations demonstrate that sentence compression is a valuable component of a larger multi-document summarization framework.
  12. Dorr, B.J.; Gaasterland, T.: Exploiting aspectual features and connecting words for summarization-inspired temporal-relation extraction (2007) 0.00
    0.0013769149 = product of:
      0.013769149 = sum of:
        0.013769149 = product of:
          0.041307446 = sum of:
            0.041307446 = weight(_text_:1990 in 950) [ClassicSimilarity], result of:
              0.041307446 = score(doc=950,freq=2.0), product of:
                0.13825724 = queryWeight, product of:
                  4.506965 = idf(docFreq=1325, maxDocs=44218)
                  0.03067635 = queryNorm
                0.2987724 = fieldWeight in 950, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.506965 = idf(docFreq=1325, maxDocs=44218)
                  0.046875 = fieldNorm(doc=950)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Abstract
    This paper presents a model that incorporates contemporary theories of tense and aspect and develops a new framework for extracting temporal relations between two sentence-internal events, given their tense, aspect, and a temporal connecting word relating the two events. A linguistic constraint on event combination has been implemented to detect incorrect parser analyses and potentially apply syntactic reanalysis or semantic reinterpretation - in preparation for subsequent processing for multi-document summarization. An important contribution of this work is the extension of two different existing theoretical frameworks - Hornstein's 1990 theory of tense analysis and Allen's 1984 theory on event ordering - and the combination of both into a unified system for representing and constraining combinations of different event types (points, closed intervals, and open-ended intervals). We show that our theoretical results have been verified in a large-scale corpus analysis. The framework is designed to inform a temporally motivated sentence-ordering module in an implemented multi-document summarization system.
  13. Ye, S.; Chua, T.-S.; Kan, M.-Y.; Qiu, L.: Document concept lattice for text understanding and summarization (2007) 0.00
    0.0012212053 = product of:
      0.012212053 = sum of:
        0.012212053 = product of:
          0.03663616 = sum of:
            0.03663616 = weight(_text_:problem in 941) [ClassicSimilarity], result of:
              0.03663616 = score(doc=941,freq=2.0), product of:
                0.1302053 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03067635 = queryNorm
                0.28137225 = fieldWeight in 941, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.046875 = fieldNorm(doc=941)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Abstract
    We argue that the quality of a summary can be evaluated based on how many concepts in the original document(s) that can be preserved after summarization. Here, a concept refers to an abstract or concrete entity or its action often expressed by diverse terms in text. Summary generation can thus be considered as an optimization problem of selecting a set of sentences with minimal answer loss. In this paper, we propose a document concept lattice that indexes the hierarchy of local topics tied to a set of frequent concepts and the corresponding sentences containing these topics. The local topics will specify the promising sub-spaces related to the selected concepts and sentences. Based on this lattice, the summary is an optimized selection of a set of distinct and salient local topics that lead to maximal coverage of concepts with the given number of sentences. Our summarizer based on the concept lattice has demonstrated competitive performance in Document Understanding Conference 2005 and 2006 evaluations as well as follow-on tests.
  14. Goh, A.; Hui, S.C.: TES: a text extraction system (1996) 0.00
    0.0011083259 = product of:
      0.011083259 = sum of:
        0.011083259 = product of:
          0.033249777 = sum of:
            0.033249777 = weight(_text_:22 in 6599) [ClassicSimilarity], result of:
              0.033249777 = score(doc=6599,freq=2.0), product of:
                0.10742335 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03067635 = queryNorm
                0.30952093 = fieldWeight in 6599, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6599)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Date
    26. 2.1997 10:22:43
  15. Robin, J.; McKeown, K.: Empirically designing and evaluating a new revision-based model for summary generation (1996) 0.00
    0.0011083259 = product of:
      0.011083259 = sum of:
        0.011083259 = product of:
          0.033249777 = sum of:
            0.033249777 = weight(_text_:22 in 6751) [ClassicSimilarity], result of:
              0.033249777 = score(doc=6751,freq=2.0), product of:
                0.10742335 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03067635 = queryNorm
                0.30952093 = fieldWeight in 6751, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6751)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Date
    6. 3.1997 16:22:15
  16. Jones, P.A.; Bradbeer, P.V.G.: Discovery of optimal weights in a concept selection system (1996) 0.00
    0.0011083259 = product of:
      0.011083259 = sum of:
        0.011083259 = product of:
          0.033249777 = sum of:
            0.033249777 = weight(_text_:22 in 6974) [ClassicSimilarity], result of:
              0.033249777 = score(doc=6974,freq=2.0), product of:
                0.10742335 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03067635 = queryNorm
                0.30952093 = fieldWeight in 6974, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6974)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Source
    Information retrieval: new systems and current research. Proceedings of the 16th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Drymen, Scotland, 22-23 Mar 94. Ed.: R. Leon
  17. Cai, X.; Li, W.: Enhancing sentence-level clustering with integrated and interactive frameworks for theme-based summarization (2011) 0.00
    0.0010176711 = product of:
      0.010176711 = sum of:
        0.010176711 = product of:
          0.03053013 = sum of:
            0.03053013 = weight(_text_:problem in 4770) [ClassicSimilarity], result of:
              0.03053013 = score(doc=4770,freq=2.0), product of:
                0.1302053 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03067635 = queryNorm
                0.23447686 = fieldWeight in 4770, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4770)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Abstract
    Sentence clustering plays a pivotal role in theme-based summarization, which discovers topic themes defined as the clusters of highly related sentences to avoid redundancy and cover more diverse information. As the length of sentences is short and the content it contains is limited, the bag-of-words cosine similarity traditionally used for document clustering is no longer suitable. Special treatment for measuring sentence similarity is necessary. In this article, we study the sentence-level clustering problem. After exploiting concept- and context-enriched sentence vector representations, we develop two co-clustering frameworks to enhance sentence-level clustering for theme-based summarization-integrated clustering and interactive clustering-both allowing word and document to play an explicit role in sentence clustering as independent text objects rather than using word or concept as features of a sentence in a document set. In each framework, we experiment with two-level co-clustering (i.e., sentence-word co-clustering or sentence-document co-clustering) and three-level co-clustering (i.e., document-sentence-word co-clustering). Compared against concept- and context-oriented sentence-representation reformation, co-clustering shows a clear advantage in both intrinsic clustering quality evaluation and extrinsic summarization evaluation conducted on the Document Understanding Conferences (DUC) datasets.
  18. Vanderwende, L.; Suzuki, H.; Brockett, J.M.; Nenkova, A.: Beyond SumBasic : task-focused summarization with sentence simplification and lexical expansion (2007) 0.00
    8.3124434E-4 = product of:
      0.008312443 = sum of:
        0.008312443 = product of:
          0.02493733 = sum of:
            0.02493733 = weight(_text_:22 in 948) [ClassicSimilarity], result of:
              0.02493733 = score(doc=948,freq=2.0), product of:
                0.10742335 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03067635 = queryNorm
                0.23214069 = fieldWeight in 948, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=948)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Abstract
    In recent years, there has been increased interest in topic-focused multi-document summarization. In this task, automatic summaries are produced in response to a specific information request, or topic, stated by the user. The system we have designed to accomplish this task comprises four main components: a generic extractive summarization system, a topic-focusing component, sentence simplification, and lexical expansion of topic words. This paper details each of these components, together with experiments designed to quantify their individual contributions. We include an analysis of our results on two large datasets commonly used to evaluate task-focused summarization, the DUC2005 and DUC2006 datasets, using automatic metrics. Additionally, we include an analysis of our results on the DUC2006 task according to human evaluation metrics. In the human evaluation of system summaries compared to human summaries, i.e., the Pyramid method, our system ranked first out of 22 systems in terms of overall mean Pyramid score; and in the human evaluation of summary responsiveness to the topic, our system ranked third out of 35 systems.
  19. Abdi, A.; Shamsuddin, S.M.; Aliguliyev, R.M.: QMOS: Query-based multi-documents opinion-oriented summarization (2018) 0.00
    8.141369E-4 = product of:
      0.008141369 = sum of:
        0.008141369 = product of:
          0.024424106 = sum of:
            0.024424106 = weight(_text_:problem in 5089) [ClassicSimilarity], result of:
              0.024424106 = score(doc=5089,freq=2.0), product of:
                0.1302053 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03067635 = queryNorm
                0.1875815 = fieldWeight in 5089, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5089)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Abstract
    Sentiment analysis concerns the study of opinions expressed in a text. This paper presents the QMOS method, which employs a combination of sentiment analysis and summarization approaches. It is a lexicon-based method to query-based multi-documents summarization of opinion expressed in reviews. QMOS combines multiple sentiment dictionaries to improve word coverage limit of the individual lexicon. A major problem for a dictionary-based approach is the semantic gap between the prior polarity of a word presented by a lexicon and the word polarity in a specific context. This is due to the fact that, the polarity of a word depends on the context in which it is being used. Furthermore, the type of a sentence can also affect the performance of a sentiment analysis approach. Therefore, to tackle the aforementioned challenges, QMOS integrates multiple strategies to adjust word prior sentiment orientation while also considers the type of sentence. QMOS also employs the Semantic Sentiment Approach to determine the sentiment score of a word if it is not included in a sentiment lexicon. On the other hand, the most of the existing methods fail to distinguish the meaning of a review sentence and user's query when both of them share the similar bag-of-words; hence there is often a conflict between the extracted opinionated sentences and users' needs. However, the summarization phase of QMOS is able to avoid extracting a review sentence whose similarity with the user's query is high but whose meaning is different. The method also employs the greedy algorithm and query expansion approach to reduce redundancy and bridge the lexical gaps for similar contexts that are expressed using different wording, respectively. Our experiment shows that the QMOS method can significantly improve the performance and make QMOS comparable to other existing methods.
  20. Wu, Y.-f.B.; Li, Q.; Bot, R.S.; Chen, X.: Finding nuggets in documents : a machine learning approach (2006) 0.00
    6.927037E-4 = product of:
      0.0069270367 = sum of:
        0.0069270367 = product of:
          0.02078111 = sum of:
            0.02078111 = weight(_text_:22 in 5290) [ClassicSimilarity], result of:
              0.02078111 = score(doc=5290,freq=2.0), product of:
                0.10742335 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03067635 = queryNorm
                0.19345059 = fieldWeight in 5290, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5290)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Date
    22. 7.2006 17:25:48