Search (37 results, page 1 of 2)

Kim, H.H.; Kim, Y.H.: Generic speech summarization of transcribed lecture videos : using tags and their semantic relations (2016) 0.08

0.08275341 = product of:
  0.13792235 = sum of:
    0.023397226 = weight(_text_:retrieval in 2640) [ClassicSimilarity], result of:
      0.023397226 = score(doc=2640,freq=2.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.16710453 = fieldWeight in 2640, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2640)
    0.09884685 = weight(_text_:semantic in 2640) [ClassicSimilarity], result of:
      0.09884685 = score(doc=2640,freq=10.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.51360583 = fieldWeight in 2640, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2640)
    0.015678266 = product of:
      0.031356532 = sum of:
        0.031356532 = weight(_text_:22 in 2640) [ClassicSimilarity], result of:
          0.031356532 = score(doc=2640,freq=2.0), product of:
            0.16209066 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04628742 = queryNorm
            0.19345059 = fieldWeight in 2640, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2640)
      0.5 = coord(1/2)
  0.6 = coord(3/5)

Abstract: We propose a tag-based framework that simulates human abstractors' ability to select significant sentences based on key concepts in a sentence as well as the semantic relations between key concepts to create generic summaries of transcribed lecture videos. The proposed extractive summarization method uses tags (viewer- and author-assigned terms) as key concepts. Our method employs Flickr tag clusters and WordNet synonyms to expand tags and detect the semantic relations between tags. This method helps select sentences that have a greater number of semantically related key concepts. To investigate the effectiveness and uniqueness of the proposed method, we compare it with an existing technique, latent semantic analysis (LSA), using intrinsic and extrinsic evaluations. The results of intrinsic evaluation show that the tag-based method is as or more effective than the LSA method. We also observe that in the extrinsic evaluation, the grand mean accuracy score of the tag-based method is higher than that of the LSA method, with a statistically significant difference. Elaborating on our results, we discuss the theoretical and practical implications of our findings for speech video summarization and retrieval.
Date: 22. 1.2016 12:29:41
Object: Latent Semantic Analysis

Jiang, Y.; Meng, R.; Huang, Y.; Lu, W.; Liu, J.: Generating keyphrases for readers : a controllable keyphrase generation framework (2023) 0.06

0.06095504 = product of:
  0.10159173 = sum of:
    0.023397226 = weight(_text_:retrieval in 1012) [ClassicSimilarity], result of:
      0.023397226 = score(doc=1012,freq=2.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.16710453 = fieldWeight in 1012, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1012)
    0.062516235 = weight(_text_:semantic in 1012) [ClassicSimilarity], result of:
      0.062516235 = score(doc=1012,freq=4.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.32483283 = fieldWeight in 1012, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1012)
    0.015678266 = product of:
      0.031356532 = sum of:
        0.031356532 = weight(_text_:22 in 1012) [ClassicSimilarity], result of:
          0.031356532 = score(doc=1012,freq=2.0), product of:
            0.16209066 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04628742 = queryNorm
            0.19345059 = fieldWeight in 1012, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1012)
      0.5 = coord(1/2)
  0.6 = coord(3/5)

Abstract: With the wide application of keyphrases in many Information Retrieval (IR) and Natural Language Processing (NLP) tasks, automatic keyphrase prediction has been emerging. However, these statistically important phrases are contributing increasingly less to the related tasks because the end-to-end learning mechanism enables models to learn the important semantic information of the text directly. Similarly, keyphrases are of little help for readers to quickly grasp the paper's main idea because the relationship between the keyphrase and the paper is not explicit to readers. Therefore, we propose to generate keyphrases with specific functions for readers to bridge the semantic gap between them and the information producers, and verify the effectiveness of the keyphrase function for assisting users' comprehension with a user experiment. A controllable keyphrase generation framework (the CKPG) that uses the keyphrase function as a control code to generate categorized keyphrases is proposed and implemented based on Transformer, BART, and T5, respectively. For the Computer Science domain, the Macro-avgs of , , and on the Paper with Code dataset are up to 0.680, 0.535, and 0.558, respectively. Our experimental results indicate the effectiveness of the CKPG models.
Date: 22. 6.2023 14:55:20

Jones, P.A.; Bradbeer, P.V.G.: Discovery of optimal weights in a concept selection system (1996) 0.03

0.03121084 = product of:
  0.0780271 = sum of:
    0.052941877 = weight(_text_:retrieval in 6974) [ClassicSimilarity], result of:
      0.052941877 = score(doc=6974,freq=4.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.37811437 = fieldWeight in 6974, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0625 = fieldNorm(doc=6974)
    0.025085226 = product of:
      0.05017045 = sum of:
        0.05017045 = weight(_text_:22 in 6974) [ClassicSimilarity], result of:
          0.05017045 = score(doc=6974,freq=2.0), product of:
            0.16209066 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04628742 = queryNorm
            0.30952093 = fieldWeight in 6974, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=6974)
      0.5 = coord(1/2)
  0.4 = coord(2/5)

Source: Information retrieval: new systems and current research. Proceedings of the 16th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Drymen, Scotland, 22-23 Mar 94. Ed.: R. Leon

Dunlavy, D.M.; O'Leary, D.P.; Conroy, J.M.; Schlesinger, J.D.: QCS: A system for querying, clustering and summarizing documents (2007) 0.03
```
0.027113868 = product of:
  0.06778467 = sum of:
    0.032420147 = weight(_text_:retrieval in 947) [ClassicSimilarity], result of:
      0.032420147 = score(doc=947,freq=6.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.23154683 = fieldWeight in 947, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.03125 = fieldNorm(doc=947)
    0.03536452 = weight(_text_:semantic in 947) [ClassicSimilarity], result of:
      0.03536452 = score(doc=947,freq=2.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.18375319 = fieldWeight in 947, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.03125 = fieldNorm(doc=947)
  0.4 = coord(2/5)
```
Abstract

Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel integrated information retrieval system-the Query, Cluster, Summarize (QCS) system-which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of methods in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) as measured by the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence "trimming" and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.
Wu, Y.-f.B.; Li, Q.; Bot, R.S.; Chen, X.: Finding nuggets in documents : a machine learning approach (2006) 0.02
```
0.023953566 = product of:
  0.059883915 = sum of:
    0.04420565 = weight(_text_:semantic in 5290) [ClassicSimilarity], result of:
      0.04420565 = score(doc=5290,freq=2.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.22969149 = fieldWeight in 5290, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5290)
    0.015678266 = product of:
      0.031356532 = sum of:
        0.031356532 = weight(_text_:22 in 5290) [ClassicSimilarity], result of:
          0.031356532 = score(doc=5290,freq=2.0), product of:
            0.16209066 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04628742 = queryNorm
            0.19345059 = fieldWeight in 5290, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5290)
      0.5 = coord(1/2)
  0.4 = coord(2/5)
```
Abstract

Document keyphrases provide a concise summary of a document's content, offering semantic metadata summarizing a document. They can be used in many applications related to knowledge management and text mining, such as automatic text summarization, development of search engines, document clustering, document classification, thesaurus construction, and browsing interfaces. Because only a small portion of documents have keyphrases assigned by authors, and it is time-consuming and costly to manually assign keyphrases to documents, it is necessary to develop an algorithm to automatically generate keyphrases for documents. This paper describes a Keyphrase Identification Program (KIP), which extracts document keyphrases by using prior positive samples of human identified phrases to assign weights to the candidate keyphrases. The logic of our algorithm is: The more keywords a candidate keyphrase contains and the more significant these keywords are, the more likely this candidate phrase is a keyphrase. KIP's learning function can enrich the glossary database by automatically adding new identified keyphrases to the database. KIP's personalization feature will let the user build a glossary database specifically suitable for the area of his/her interest. The evaluation results show that KIP's performance is better than the systems we compared to and that the learning function is effective.

Date

22. 7.2006 17:25:48
Yeh, J.-Y.; Ke, H.-R.; Yang, W.-P.; Meng, I.-H.: Text summarization using a trainable summarizer and latent semantic analysis (2005) 0.02
```
0.021656258 = product of:
  0.10828129 = sum of:
    0.10828129 = weight(_text_:semantic in 1003) [ClassicSimilarity], result of:
      0.10828129 = score(doc=1003,freq=12.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.56262696 = fieldWeight in 1003, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1003)
  0.2 = coord(1/5)
```
Abstract

This paper proposes two approaches to address text summarization: modified corpus-based approach (MCBA) and LSA-based T.R.M. approach (LSA + T.R.M.). The first is a trainable summarizer, which takes into account several features, including position, positive keyword, negative keyword, centrality, and the resemblance to the title, to generate summaries. Two new ideas are exploited: (1) sentence positions are ranked to emphasize the significances of different sentence positions, and (2) the score function is trained by the genetic algorithm (GA) to obtain a suitable combination of feature weights. The second uses latent semantic analysis (LSA) to derive the semantic matrix of a document or a corpus and uses semantic sentence representation to construct a semantic text relationship map. We evaluate LSA + T.R.M. both with single documents and at the corpus level to investigate the competence of LSA in text summarization. The two novel approaches were measured at several compression rates on a data corpus composed of 100 political articles. When the compression rate was 30%, an average f-measure of 49% for MCBA, 52% for MCBA + GA, 44% and 40% for LSA + T.R.M. in single-document and corpus level were achieved respectively.

Object

Latent Semantic Indexing
Abdi, A.; Idris, N.; Alguliev, R.M.; Aliguliyev, R.M.: Automatic summarization assessment through a combination of semantic and syntactic information for intelligent educational systems (2015) 0.02
```
0.018375946 = product of:
  0.091879725 = sum of:
    0.091879725 = weight(_text_:semantic in 2681) [ClassicSimilarity], result of:
      0.091879725 = score(doc=2681,freq=6.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.47740483 = fieldWeight in 2681, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.046875 = fieldNorm(doc=2681)
  0.2 = coord(1/5)
```
Abstract

Summary writing is a process for creating a short version of a source text. It can be used as a measure of understanding. As grading students' summaries is a very time-consuming task, computer-assisted assessment can help teachers perform the grading more effectively. Several techniques, such as BLEU, ROUGE, N-gram co-occurrence, Latent Semantic Analysis (LSA), LSA_Ngram and LSA_ERB, have been proposed to support the automatic assessment of students' summaries. Since these techniques are more suitable for long texts, their performance is not satisfactory for the evaluation of short summaries. This paper proposes a specialized method that works well in assessing short summaries. Our proposed method integrates the semantic relations between words, and their syntactic composition. As a result, the proposed method is able to obtain high accuracy and improve the performance compared with the current techniques. Experiments have displayed that it is to be preferred over the existing techniques. A summary evaluation system based on the proposed method has also been developed.
Martinez-Romo, J.; Araujo, L.; Fernandez, A.D.: SemGraph : extracting keyphrases following a novel semantic graph-based approach (2016) 0.02
```
0.018375946 = product of:
  0.091879725 = sum of:
    0.091879725 = weight(_text_:semantic in 2832) [ClassicSimilarity], result of:
      0.091879725 = score(doc=2832,freq=6.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.47740483 = fieldWeight in 2832, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.046875 = fieldNorm(doc=2832)
  0.2 = coord(1/5)
```
Abstract

Keyphrases represent the main topics a text is about. In this article, we introduce SemGraph, an unsupervised algorithm for extracting keyphrases from a collection of texts based on a semantic relationship graph. The main novelty of this algorithm is its ability to identify semantic relationships between words whose presence is statistically significant. Our method constructs a co-occurrence graph in which words appearing in the same document are linked, provided their presence in the collection is statistically significant with respect to a null model. Furthermore, the graph obtained is enriched with information from WordNet. We have used the most recent and standardized benchmark to evaluate the system ability to detect the keyphrases that are part of the text. The result is a method that achieves an improvement of 5.3% and 7.28% in F measure over the two labeled sets of keyphrases used in the evaluation of SemEval-2010.
Lee, J.-H.; Park, S.; Ahn, C.-M.; Kim, D.: Automatic generic document summarization based on non-negative matrix factorization (2009) 0.02
```
0.017504547 = product of:
  0.08752273 = sum of:
    0.08752273 = weight(_text_:semantic in 2448) [ClassicSimilarity], result of:
      0.08752273 = score(doc=2448,freq=4.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.45476598 = fieldWeight in 2448, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2448)
  0.2 = coord(1/5)
```
Abstract

In existing unsupervised methods, Latent Semantic Analysis (LSA) is used for sentence selection. However, the obtained results are less meaningful, because singular vectors are used as the bases for sentence selection from given documents, and singular vector components can have negative values. We propose a new unsupervised method using Non-negative Matrix Factorization (NMF) to select sentences for automatic generic document summarization. The proposed method uses non-negative constraints, which are more similar to the human cognition process. As a result, the method selects more meaningful sentences for generic document summarization than those selected using LSA.

Object

Latent semantic analysis
Ling, X.; Jiang, J.; He, X.; Mei, Q.; Zhai, C.; Schatz, B.: Generating gene summaries from biomedical literature : a study of semi-structured summarization (2007) 0.01
```
0.012503247 = product of:
  0.062516235 = sum of:
    0.062516235 = weight(_text_:semantic in 946) [ClassicSimilarity], result of:
      0.062516235 = score(doc=946,freq=4.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.32483283 = fieldWeight in 946, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0390625 = fieldNorm(doc=946)
  0.2 = coord(1/5)
```
Abstract

Most knowledge accumulated through scientific discoveries in genomics and related biomedical disciplines is buried in the vast amount of biomedical literature. Since understanding gene regulations is fundamental to biomedical research, summarizing all the existing knowledge about a gene based on literature is highly desirable to help biologists digest the literature. In this paper, we present a study of methods for automatically generating gene summaries from biomedical literature. Unlike most existing work on automatic text summarization, in which the generated summary is often a list of extracted sentences, we propose to generate a semi-structured summary which consists of sentences covering specific semantic aspects of a gene. Such a semi-structured summary is more appropriate for describing genes and poses special challenges for automatic text summarization. We propose a two-stage approach to generate such a summary for a given gene - first retrieving articles about a gene and then extracting sentences for each specified semantic aspect. We address the issue of gene name variation in the first stage and propose several different methods for sentence extraction in the second stage. We evaluate the proposed methods using a test set with 20 genes. Experiment results show that the proposed methods can generate useful semi-structured gene summaries automatically from biomedical literature, and our proposed methods outperform general purpose summarization methods. Among all the proposed methods for sentence extraction, a probabilistic language modeling approach that models gene context performs the best.
Kim, H.H.; Kim, Y.H.: Video summarization using event-related potential responses to shot boundaries in real-time video watching (2019) 0.01
```
0.012503247 = product of:
  0.062516235 = sum of:
    0.062516235 = weight(_text_:semantic in 4685) [ClassicSimilarity], result of:
      0.062516235 = score(doc=4685,freq=4.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.32483283 = fieldWeight in 4685, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4685)
  0.2 = coord(1/5)
```
Abstract

Our aim was to develop an event-related potential (ERP)-based method to construct a video skim consisting of key shots to bridge the semantic gap between the topic inferred from a whole video and that from its summary. Mayer's cognitive model was examined, wherein the topic integration process of a user evoked by a visual stimulus can be associated with long-latency ERP components. We determined that long-latency ERP components are suitable for measuring a user's neuronal response through a literature review. We hypothesized that N300 is specific to the categorization of all shots regardless of topic relevance, N400 is specific for the semantic mismatching process for topic-irrelevant shots, and P600 is specific for the context updating process for topic-relevant shots. In our experiment, the N400 component led to more negative ERP signals in response to topic-irrelevant shots than to topic-relevant shots and showed a fronto-central scalp pattern. P600 elicited more positive ERP signals for topic-relevant shots than for topic-irrelevant shots and showed a fronto-central scalp pattern. We used discriminant and artificial neural network (ANN) analyses to decode video shot relevance and observed that the ANN produced particularly high success rates: 91.3% from the training set and 100% from the test set.
Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.01
```
0.012503247 = product of:
  0.062516235 = sum of:
    0.062516235 = weight(_text_:semantic in 5400) [ClassicSimilarity], result of:
      0.062516235 = score(doc=5400,freq=4.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.32483283 = fieldWeight in 5400, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5400)
  0.2 = coord(1/5)
```
Abstract

Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. It is also desirable to be able to identify a small set of entities (e.g., authors, citations, bibliographic records) which are most relevant to a query. This gets more difficult when the amount of data increases dramatically. Data sparsity and model scalability are the major challenges to solving this type of extreme multilabel classification problem automatically. In this paper, we propose to address this problem in two steps: we first embed different types of entities into the same semantic space, where similarity could be computed easily; second, we propose a novel non-parametric method to identify the most relevant entities in addition to direct semantic similarities. We show how effectively this approach predicts even very specialised subjects, which are associated with few documents in the training set and are more problematic for a classifier.
Ercan, G.; Cicekli, I.: Using lexical chains for keyword extraction (2007) 0.01
```
0.012377582 = product of:
  0.06188791 = sum of:
    0.06188791 = weight(_text_:semantic in 951) [ClassicSimilarity], result of:
      0.06188791 = score(doc=951,freq=2.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.32156807 = fieldWeight in 951, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.0546875 = fieldNorm(doc=951)
  0.2 = coord(1/5)
```
Abstract

Keywords can be considered as condensed versions of documents and short forms of their summaries. In this paper, the problem of automatic extraction of keywords from documents is treated as a supervised learning task. A lexical chain holds a set of semantically related words of a text and it can be said that a lexical chain represents the semantic content of a portion of the text. Although lexical chains have been extensively used in text summarization, their usage for keyword extraction problem has not been fully investigated. In this paper, a keyword extraction technique that uses lexical chains is described, and encouraging results are obtained.
Lam, W.; Chan, K.; Radev, D.; Saggion, H.; Teufel, S.: Context-based generic cross-lingual retrieval of documents and automated summaries (2005) 0.01
```
0.011230669 = product of:
  0.056153342 = sum of:
    0.056153342 = weight(_text_:retrieval in 1965) [ClassicSimilarity], result of:
      0.056153342 = score(doc=1965,freq=8.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.40105087 = fieldWeight in 1965, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=1965)
  0.2 = coord(1/5)
```
Abstract

We develop a context-based generic cross-lingual retrieval model that can deal with different language pairs. Our model considers contexts in the query translation process. Contexts in the query as weIl as in the documents based an co-occurrence statistics from different granularity of passages are exploited. We also investigate cross-lingual retrieval of automatic generic summaries. We have implemented our model for two different cross-lingual settings, namely, retrieving Chinese documents from English queries as weIl as retrieving English documents from Chinese queries. Extensive experiments have been conducted an a large-scale parallel corpus enabling studies an retrieval performance for two different cross-lingual settings of full-length documents as weIl as automated summaries.
Moens, M.-F.; Uyttendaele, C.: Automatic text structuring and categorization as a first step in summarizing legal cases (1997) 0.01
```
0.010609357 = product of:
  0.05304678 = sum of:
    0.05304678 = weight(_text_:semantic in 2256) [ClassicSimilarity], result of:
      0.05304678 = score(doc=2256,freq=2.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.2756298 = fieldWeight in 2256, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.046875 = fieldNorm(doc=2256)
  0.2 = coord(1/5)
```
Abstract

The SALOMON system automatically summarizes Belgian criminal cases in order to improve access to the large number of existing and future court decisions. SALOMON extracts relevant text units from the case text to form a case summary. Such a case profile facilitates the rapid determination of the relevance of the case or may be employed in text search. In a first important abstracting step SALOMON performs an initial categorization of legal criminal cases and structures the case text into separate legally relevant and irrelevant components. A text grammar represented as a semantic network is used to automatically determine the category of the case and its components. Extracts from the case general data and identifies text portions relevant for further abstracting. Prior knowledge of the text structure and its indicative cues may support automatic abstracting. A text grammar is a promising form for representing the knowledge involved
Steinberger, J.; Poesio, M.; Kabadjov, M.A.; Jezek, K.: Two uses of anaphora resolution in summarization (2007) 0.01
```
0.010609357 = product of:
  0.05304678 = sum of:
    0.05304678 = weight(_text_:semantic in 949) [ClassicSimilarity], result of:
      0.05304678 = score(doc=949,freq=2.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.2756298 = fieldWeight in 949, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.046875 = fieldNorm(doc=949)
  0.2 = coord(1/5)
```
Abstract

We propose a new method for using anaphoric information in Latent Semantic Analysis (lsa), and discuss its application to develop an lsa-based summarizer which achieves a significantly better performance than a system not using anaphoric information, and a better performance by the rouge measure than all but one of the single-document summarizers participating in DUC-2002. Anaphoric information is automatically extracted using a new release of our own anaphora resolution system, guitar, which incorporates proper noun resolution. Our summarizer also includes a new approach for automatically identifying the dimensionality reduction of a document on the basis of the desired summarization percentage. Anaphoric information is also used to check the coherence of the summary produced by our summarizer, by a reference checker module which identifies anaphoric resolution errors caused by sentence extraction.
Dorr, B.J.; Gaasterland, T.: Exploiting aspectual features and connecting words for summarization-inspired temporal-relation extraction (2007) 0.01
```
0.010609357 = product of:
  0.05304678 = sum of:
    0.05304678 = weight(_text_:semantic in 950) [ClassicSimilarity], result of:
      0.05304678 = score(doc=950,freq=2.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.2756298 = fieldWeight in 950, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.046875 = fieldNorm(doc=950)
  0.2 = coord(1/5)
```
Abstract

This paper presents a model that incorporates contemporary theories of tense and aspect and develops a new framework for extracting temporal relations between two sentence-internal events, given their tense, aspect, and a temporal connecting word relating the two events. A linguistic constraint on event combination has been implemented to detect incorrect parser analyses and potentially apply syntactic reanalysis or semantic reinterpretation - in preparation for subsequent processing for multi-document summarization. An important contribution of this work is the extension of two different existing theoretical frameworks - Hornstein's 1990 theory of tense analysis and Allen's 1984 theory on event ordering - and the combination of both into a unified system for representing and constraining combinations of different event types (points, closed intervals, and open-ended intervals). We show that our theoretical results have been verified in a large-scale corpus analysis. The framework is designed to inform a temporally motivated sentence-ordering module in an implemented multi-document summarization system.
Abdi, A.; Shamsuddin, S.M.; Aliguliyev, R.M.: QMOS: Query-based multi-documents opinion-oriented summarization (2018) 0.01
```
0.010002597 = product of:
  0.050012987 = sum of:
    0.050012987 = weight(_text_:semantic in 5089) [ClassicSimilarity], result of:
      0.050012987 = score(doc=5089,freq=4.0), product of:
        0.19245663 = queryWeight, product of:
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.04628742 = queryNorm
        0.25986627 = fieldWeight in 5089, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.1578603 = idf(docFreq=1879, maxDocs=44218)
          0.03125 = fieldNorm(doc=5089)
  0.2 = coord(1/5)
```
Abstract

Sentiment analysis concerns the study of opinions expressed in a text. This paper presents the QMOS method, which employs a combination of sentiment analysis and summarization approaches. It is a lexicon-based method to query-based multi-documents summarization of opinion expressed in reviews. QMOS combines multiple sentiment dictionaries to improve word coverage limit of the individual lexicon. A major problem for a dictionary-based approach is the semantic gap between the prior polarity of a word presented by a lexicon and the word polarity in a specific context. This is due to the fact that, the polarity of a word depends on the context in which it is being used. Furthermore, the type of a sentence can also affect the performance of a sentiment analysis approach. Therefore, to tackle the aforementioned challenges, QMOS integrates multiple strategies to adjust word prior sentiment orientation while also considers the type of sentence. QMOS also employs the Semantic Sentiment Approach to determine the sentiment score of a word if it is not included in a sentiment lexicon. On the other hand, the most of the existing methods fail to distinguish the meaning of a review sentence and user's query when both of them share the similar bag-of-words; hence there is often a conflict between the extracted opinionated sentences and users' needs. However, the summarization phase of QMOS is able to avoid extracting a review sentence whose similarity with the user's query is high but whose meaning is different. The method also employs the greedy algorithm and query expansion approach to reduce redundancy and bridge the lexical gaps for similar contexts that are expressed using different wording, respectively. Our experiment shows that the QMOS method can significantly improve the performance and make QMOS comparable to other existing methods.
Shen, D.; Yang, Q.; Chen, Z.: Noise reduction through summarization for Web-page classification (2007) 0.01
```
0.009804162 = product of:
  0.049020812 = sum of:
    0.049020812 = product of:
      0.098041624 = sum of:
        0.098041624 = weight(_text_:web in 953) [ClassicSimilarity], result of:
          0.098041624 = score(doc=953,freq=18.0), product of:
            0.15105948 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.04628742 = queryNorm
            0.64902663 = fieldWeight in 953, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=953)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```
Abstract

Due to a large variety of noisy information embedded in Web pages, Web-page classification is much more difficult than pure-text classification. In this paper, we propose to improve the Web-page classification performance by removing the noise through summarization techniques. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then put forward a new Web-page summarization algorithm based on Web-page layout and evaluate it along with several other state-of-the-art text summarization algorithms on the LookSmart Web directory. Experimental results show that the classification algorithms (NB or SVM) augmented by any summarization approach can achieve an improvement by more than 5.0% as compared to pure-text-based classification algorithms. We further introduce an ensemble method to combine the different summarization algorithms. The ensemble summarization method achieves more than 12.0% improvement over pure-text based methods.

Johnson, F.C.; Paice, C.D.; Black, W.J.; Neal, A.P.: ¬The application of linguistic processing to automatic abstract generation (1993) 0.01

0.00935889 = product of:
  0.04679445 = sum of:
    0.04679445 = weight(_text_:retrieval in 2290) [ClassicSimilarity], result of:
      0.04679445 = score(doc=2290,freq=2.0), product of:
        0.14001551 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.04628742 = queryNorm
        0.33420905 = fieldWeight in 2290, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.078125 = fieldNorm(doc=2290)
  0.2 = coord(1/5)

Footnote: Wiederabgedruckt in: Readings in information retrieval. Ed.: K. Sparck Jones u. P. Willett. San Francisco: Morgan Kaufmann 1997. S.538-552.

Search (37 results, page 1 of 2)

Authors

Years

Languages

Themes