Search (24 results, page 1 of 2)

  • theme_ss:"Automatisches Abstracting"
  1. Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.03
    0.0268396 = product of:
      0.0939386 = sum of:
        0.021217827 = weight(_text_:subject in 5400) [ClassicSimilarity], result of:
          0.021217827 = score(doc=5400,freq=2.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.19758089 = fieldWeight in 5400, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5400)
        0.023791125 = weight(_text_:classification in 5400) [ClassicSimilarity], result of:
          0.023791125 = score(doc=5400,freq=4.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24880521 = fieldWeight in 5400, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5400)
        0.02513852 = weight(_text_:bibliographic in 5400) [ClassicSimilarity], result of:
          0.02513852 = score(doc=5400,freq=2.0), product of:
            0.11688946 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.03002521 = queryNorm
            0.21506234 = fieldWeight in 5400, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5400)
        0.023791125 = weight(_text_:classification in 5400) [ClassicSimilarity], result of:
          0.023791125 = score(doc=5400,freq=4.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24880521 = fieldWeight in 5400, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5400)
      0.2857143 = coord(4/14)
    
    Abstract
    Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. It is also desirable to be able to identify a small set of entities (e.g., authors, citations, bibliographic records) which are most relevant to a query. This gets more difficult when the amount of data increases dramatically. Data sparsity and model scalability are the major challenges to solving this type of extreme multilabel classification problem automatically. In this paper, we propose to address this problem in two steps: we first embed different types of entities into the same semantic space, where similarity could be computed easily; second, we propose a novel non-parametric method to identify the most relevant entities in addition to direct semantic similarities. We show how effectively this approach predicts even very specialised subjects, which are associated with few documents in the training set and are more problematic for a classifier.
    Footnote
    Contribution to a special issue: Research Information Systems and Science Classifications; including papers from "Trajectories for Research: Fathoming the Promise of the NARCIS Classification," 27-28 September 2018, The Hague, The Netherlands.
  2. Shen, D.; Yang, Q.; Chen, Z.: Noise reduction through summarization for Web-page classification (2007) 0.02
    0.01526027 = product of:
      0.10682189 = sum of:
        0.053410944 = weight(_text_:classification in 953) [ClassicSimilarity], result of:
          0.053410944 = score(doc=953,freq=14.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.55856633 = fieldWeight in 953, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=953)
        0.053410944 = weight(_text_:classification in 953) [ClassicSimilarity], result of:
          0.053410944 = score(doc=953,freq=14.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.55856633 = fieldWeight in 953, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=953)
      0.14285715 = coord(2/14)
    
    Abstract
    Due to a large variety of noisy information embedded in Web pages, Web-page classification is much more difficult than pure-text classification. In this paper, we propose to improve the Web-page classification performance by removing the noise through summarization techniques. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then put forward a new Web-page summarization algorithm based on Web-page layout and evaluate it along with several other state-of-the-art text summarization algorithms on the LookSmart Web directory. Experimental results show that the classification algorithms (NB or SVM) augmented by any summarization approach can achieve an improvement by more than 5.0% as compared to pure-text-based classification algorithms. We further introduce an ensemble method to combine the different summarization algorithms. The ensemble summarization method achieves more than 12.0% improvement over pure-text based methods.
  3. Moens, M.F.; Dumortier, J.: Use of a text grammar for generating highlight abstracts of magazine articles (2000) 0.01
    0.011293716 = product of:
      0.07905601 = sum of:
        0.029704956 = weight(_text_:subject in 4540) [ClassicSimilarity], result of:
          0.029704956 = score(doc=4540,freq=2.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.27661324 = fieldWeight in 4540, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4540)
        0.04935105 = product of:
          0.0987021 = sum of:
            0.0987021 = weight(_text_:texts in 4540) [ClassicSimilarity], result of:
              0.0987021 = score(doc=4540,freq=4.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.5996243 = fieldWeight in 4540, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4540)
          0.5 = coord(1/2)
      0.14285715 = coord(2/14)
    
    Abstract
    Browsing a database of article abstracts is one way to select and buy relevant magazine articles online. Our research contributes to the design and development of text grammars for abstracting texts in unlimited subject domains. We developed a system that parses texts based on the text grammar of a specific text type and that extracts sentences and statements which are relevant for inclusion in the abstracts. The system employs knowledge of the discourse patterns that are typical of news stories. The results are encouraging and demonstrate the importance of discourse structures in text summarisation.
  4. Wu, Y.-f.B.; Li, Q.; Bot, R.S.; Chen, X.: Finding nuggets in documents : a machine learning approach (2006) 0.01
    0.009389086 = product of:
      0.043815732 = sum of:
        0.016822865 = weight(_text_:classification in 5290) [ClassicSimilarity], result of:
          0.016822865 = score(doc=5290,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.17593184 = fieldWeight in 5290, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5290)
        0.016822865 = weight(_text_:classification in 5290) [ClassicSimilarity], result of:
          0.016822865 = score(doc=5290,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.17593184 = fieldWeight in 5290, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5290)
        0.010170003 = product of:
          0.020340007 = sum of:
            0.020340007 = weight(_text_:22 in 5290) [ClassicSimilarity], result of:
              0.020340007 = score(doc=5290,freq=2.0), product of:
                0.10514317 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03002521 = queryNorm
                0.19345059 = fieldWeight in 5290, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5290)
          0.5 = coord(1/2)
      0.21428572 = coord(3/14)
    
    Abstract
    Document keyphrases provide a concise summary of a document's content, offering semantic metadata summarizing a document. They can be used in many applications related to knowledge management and text mining, such as automatic text summarization, development of search engines, document clustering, document classification, thesaurus construction, and browsing interfaces. Because only a small portion of documents have keyphrases assigned by authors, and it is time-consuming and costly to manually assign keyphrases to documents, it is necessary to develop an algorithm to automatically generate keyphrases for documents. This paper describes a Keyphrase Identification Program (KIP), which extracts document keyphrases by using prior positive samples of human identified phrases to assign weights to the candidate keyphrases. The logic of our algorithm is: The more keywords a candidate keyphrase contains and the more significant these keywords are, the more likely this candidate phrase is a keyphrase. KIP's learning function can enrich the glossary database by automatically adding new identified keyphrases to the database. KIP's personalization feature will let the user build a glossary database specifically suitable for the area of his/her interest. The evaluation results show that KIP's performance is better than the systems we compared to and that the learning function is effective.
    Date
    22. 7.2006 17:25:48
  5. Moens, M.F.: Automatic indexing and abstracting of document texts (2000) 0.01
    0.007121728 = product of:
      0.09970418 = sum of:
        0.09970418 = product of:
          0.19940837 = sum of:
            0.19940837 = weight(_text_:texts in 6892) [ClassicSimilarity], result of:
              0.19940837 = score(doc=6892,freq=8.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                1.211424 = fieldWeight in 6892, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.078125 = fieldNorm(doc=6892)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Content
    Need for indexing and abstracting texts; attributes of texts; text representations and their use; selection of natural language index terms; assignment of controlled language index texts; automatic abstracting; applications
  6. Ahmad, K.: Text summarisation : the role of lexical cohesion analysis (1995) 0.00
    0.004934078 = product of:
      0.06907709 = sum of:
        0.06907709 = product of:
          0.13815418 = sum of:
            0.13815418 = weight(_text_:texts in 5795) [ClassicSimilarity], result of:
              0.13815418 = score(doc=5795,freq=6.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.8392992 = fieldWeight in 5795, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5795)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Abstract
    The work in automatic text summary focuses mainly on computational models of texts. The artificial intelligence related work in text summary deals mainly with narrative texts such as newspaper reports and stories. Presents a study on the summary of non-narrative texts such as those in scientific and technical communication. Discusses syntactic cohesion; lexical cohesion; complex lexical repetition; simple and complex paraphrase; bonds and links; and Tele-pattan; an architecture for cohesion based text analysis and summarisation system working on SGML
  7. Paice, C.D.: Automatic abstracting (1994) 0.00
    0.003560864 = product of:
      0.04985209 = sum of:
        0.04985209 = product of:
          0.09970418 = sum of:
            0.09970418 = weight(_text_:texts in 917) [ClassicSimilarity], result of:
              0.09970418 = score(doc=917,freq=2.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.605712 = fieldWeight in 917, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.078125 = fieldNorm(doc=917)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Abstract
    The final report of the 2nd British Library abstracting project (the BLAB project), 1990-1992, which was carried out partly at the Computing Department of Lancaster University, and partly at the Centre for Computational Linguistics, UMIST. This project built on the results of the first project, of 1985-1987, to build a system designed to create abstracts automatically from given texts
  8. Su, H.: Automatic abstracting (1996) 0.00
    0.003560864 = product of:
      0.04985209 = sum of:
        0.04985209 = product of:
          0.09970418 = sum of:
            0.09970418 = weight(_text_:texts in 150) [ClassicSimilarity], result of:
              0.09970418 = score(doc=150,freq=2.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.605712 = fieldWeight in 150, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.078125 = fieldNorm(doc=150)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Abstract
    Presents an introductory overview of research into the automatic construction of abstracts from the texts of documents. Discusses the origin and definition of automatic abstracting; reasons for using automatic abstracting; methods of automatic abstracting; and evaluation problems
  9. Salton, G.; Allan, J.; Buckley, C.; Singhal, A.: Automatic analysis, theme generation, and summarization of machine readable texts (1994) 0.00
    0.003560864 = product of:
      0.04985209 = sum of:
        0.04985209 = product of:
          0.09970418 = sum of:
            0.09970418 = weight(_text_:texts in 1949) [ClassicSimilarity], result of:
              0.09970418 = score(doc=1949,freq=2.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.605712 = fieldWeight in 1949, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1949)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
  10. Moens, M.-F.; Uyttendaele, C.; Dumortier, J.: Abstracting of legal cases : the potential of clustering based on the selection of representative objects (1999) 0.00
    0.0030214933 = product of:
      0.042300906 = sum of:
        0.042300906 = product of:
          0.08460181 = sum of:
            0.08460181 = weight(_text_:texts in 2944) [ClassicSimilarity], result of:
              0.08460181 = score(doc=2944,freq=4.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.5139637 = fieldWeight in 2944, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2944)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Abstract
    The SALOMON project automatically summarizes Belgian criminal cases in order to improve access to the large number of existing and future court decisions. SALOMON extracts text units from the case text to form a case summary. Such a case summary facilitates the rapid determination of the relevance of the case or may be employed in text search. An important part of the research concerns the development of techniques for automatic recognition of representative text paragraphs (or sentences) in texts of unrestricted domains. These techniques are employed to eliminate redundant material in the case texts, and to identify informative text paragraphs which are relevant to include in the case summary. An evaluation of a test set of 700 criminal cases demonstrates that the algorithms have an application potential for automatic indexing, abstracting, and text linkage
  11. Brandow, R.; Mitze, K.; Rau, L.F.: Automatic condensation of electronic publications by sentence selection (1995) 0.00
    0.002848691 = product of:
      0.039881673 = sum of:
        0.039881673 = product of:
          0.079763345 = sum of:
            0.079763345 = weight(_text_:texts in 2929) [ClassicSimilarity], result of:
              0.079763345 = score(doc=2929,freq=2.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.4845696 = fieldWeight in 2929, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2929)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Abstract
    Description of a system that performs domain-independent automatic condensation of news from a large commercial news service encompassing 41 different publications. This system was evaluated against a system that condensed the same articles using only the first portions of the texts (the lead), up to the target length of the summaries. 3 lengths of articles were evaluated for 250 documents by both systems, totalling 1,500 suitability judgements in all. The lead-based summaries outperformed the 'intelligent' summaries significantly, achieving acceptability ratings of over 90%, compared to 74.7%
  12. Uyttendaele, C.; Moens, M.-F.; Dumortier, J.: SALOMON: automatic abstracting of legal cases for effective access to court decisions (1998) 0.00
    0.0024926048 = product of:
      0.034896467 = sum of:
        0.034896467 = product of:
          0.069792934 = sum of:
            0.069792934 = weight(_text_:texts in 495) [ClassicSimilarity], result of:
              0.069792934 = score(doc=495,freq=2.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.42399842 = fieldWeight in 495, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=495)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Abstract
    The SALOMON project summarises Belgian criminal cases in order to improve access to the large number of existing and future cases. A double methodology was used when developing SALOMON: the cases are processed by employing additional knowledge to interpret structural patterns and features on the one hand and by way of occurrence statistics of index terms on the other. SALOMON performs an initial categorisation and structuring of the cases and subsequently extracts the most relevant text units of the alleged offences and of the opinion of the court. The SALOMON techniques do not themselves solve any legal questions, but they do guide the user effectively towards relevant texts
  13. Moens, M.-F.: Summarizing court decisions (2007) 0.00
    0.0024926048 = product of:
      0.034896467 = sum of:
        0.034896467 = product of:
          0.069792934 = sum of:
            0.069792934 = weight(_text_:texts in 954) [ClassicSimilarity], result of:
              0.069792934 = score(doc=954,freq=2.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.42399842 = fieldWeight in 954, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=954)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Abstract
    In the field of law there is an absolute need for summarizing the texts of court decisions in order to make the content of the cases easily accessible for legal professionals. During the SALOMON and MOSAIC projects we investigated the summarization and retrieval of legal cases. This article presents some of the main findings while integrating the research results of experiments on legal document summarization by other research groups. In addition, we propose novel avenues of research for automatic text summarization, which we currently exploit when summarizing court decisions in the ACILA project. Techniques for automated concept learning and argument recognition are here the most challenging.
  14. Abdi, A.; Idris, N.; Alguliev, R.M.; Aliguliyev, R.M.: Automatic summarization assessment through a combination of semantic and syntactic information for intelligent educational systems (2015) 0.00
    0.0021365185 = product of:
      0.029911257 = sum of:
        0.029911257 = product of:
          0.059822515 = sum of:
            0.059822515 = weight(_text_:texts in 2681) [ClassicSimilarity], result of:
              0.059822515 = score(doc=2681,freq=2.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.36342722 = fieldWeight in 2681, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2681)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Abstract
    Summary writing is a process for creating a short version of a source text. It can be used as a measure of understanding. As grading students' summaries is a very time-consuming task, computer-assisted assessment can help teachers perform the grading more effectively. Several techniques, such as BLEU, ROUGE, N-gram co-occurrence, Latent Semantic Analysis (LSA), LSA_Ngram and LSA_ERB, have been proposed to support the automatic assessment of students' summaries. Since these techniques are more suitable for long texts, their performance is not satisfactory for the evaluation of short summaries. This paper proposes a specialized method that works well in assessing short summaries. Our proposed method integrates the semantic relations between words, and their syntactic composition. As a result, the proposed method is able to obtain high accuracy and improve the performance compared with the current techniques. Experiments have shown that it is preferable to the existing techniques. A summary evaluation system based on the proposed method has also been developed.
  15. Martinez-Romo, J.; Araujo, L.; Fernandez, A.D.: SemGraph : extracting keyphrases following a novel semantic graph-based approach (2016) 0.00
    0.0021365185 = product of:
      0.029911257 = sum of:
        0.029911257 = product of:
          0.059822515 = sum of:
            0.059822515 = weight(_text_:texts in 2832) [ClassicSimilarity], result of:
              0.059822515 = score(doc=2832,freq=2.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.36342722 = fieldWeight in 2832, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2832)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Abstract
    Keyphrases represent the main topics a text is about. In this article, we introduce SemGraph, an unsupervised algorithm for extracting keyphrases from a collection of texts based on a semantic relationship graph. The main novelty of this algorithm is its ability to identify semantic relationships between words whose presence is statistically significant. Our method constructs a co-occurrence graph in which words appearing in the same document are linked, provided their presence in the collection is statistically significant with respect to a null model. Furthermore, the graph obtained is enriched with information from WordNet. We have used the most recent and standardized benchmark to evaluate the system's ability to detect the keyphrases that are part of the text. The result is a method that achieves an improvement of 5.3% and 7.28% in F measure over the two labeled sets of keyphrases used in the evaluation of SemEval-2010.
  16. Liang, S.-F.; Devlin, S.; Tait, J.: Investigating sentence weighting components for automatic summarisation (2007) 0.00
    0.0020356115 = product of:
      0.02849856 = sum of:
        0.02849856 = product of:
          0.05699712 = sum of:
            0.05699712 = weight(_text_:schemes in 899) [ClassicSimilarity], result of:
              0.05699712 = score(doc=899,freq=2.0), product of:
                0.16067243 = queryWeight, product of:
                  5.3512506 = idf(docFreq=569, maxDocs=44218)
                  0.03002521 = queryNorm
                0.35474116 = fieldWeight in 899, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.3512506 = idf(docFreq=569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=899)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Abstract
    The work described here initially formed part of a triangulation exercise to establish the effectiveness of the Query Term Order algorithm. It subsequently proved to be a reliable indicator for summarising English web documents. We utilised the human summaries from the Document Understanding Conference data, and generated queries automatically for testing the QTO algorithm. Six sentence weighting schemes that made use of Query Term Frequency and QTO were constructed to produce system summaries, and this paper explains the process of combining and balancing the weighting components. The summaries produced were evaluated by the ROUGE-1 metric, and the results showed that using QTO in a weighting combination resulted in the best performance. We also found that using a combination of more weighting components always produced improved performance compared to any single weighting component.
  17. Reeve, L.H.; Han, H.; Brooks, A.D.: The use of domain-specific concepts in biomedical text summarization (2007) 0.00
    0.001780432 = product of:
      0.024926046 = sum of:
        0.024926046 = product of:
          0.04985209 = sum of:
            0.04985209 = weight(_text_:texts in 955) [ClassicSimilarity], result of:
              0.04985209 = score(doc=955,freq=2.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.302856 = fieldWeight in 955, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=955)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Abstract
    Text summarization is a method for data reduction. The use of text summarization enables users to reduce the amount of text that must be read while still assimilating the core information. The data reduction offered by text summarization is particularly useful in the biomedical domain, where physicians must continuously find clinical trial study information to incorporate into their patient treatment efforts. Such efforts are often hampered by the high volume of publications. This paper presents two independent methods (BioChain and FreqDist) for identifying salient sentences in biomedical texts using concepts derived from domain-specific resources. Our semantic-based method (BioChain) is effective at identifying thematic sentences, while our frequency-distribution method (FreqDist) removes information redundancy. The two methods are then combined to form a hybrid method (ChainFreq). An evaluation of each method is performed using the ROUGE system to compare system-generated summaries against a set of manually-generated summaries. The BioChain and FreqDist methods outperform some common summarization systems, while the ChainFreq method improves upon the base approaches. Our work shows that the best performance is achieved when the two methods are combined. The paper also presents a brief physician's evaluation of three randomly-selected papers from an evaluation corpus to show that the author's abstract does not always reflect the entire contents of the full text.
  18. Goh, A.; Hui, S.C.: TES: a text extraction system (1996) 0.00
    0.0011622861 = product of:
      0.016272005 = sum of:
        0.016272005 = product of:
          0.03254401 = sum of:
            0.03254401 = weight(_text_:22 in 6599) [ClassicSimilarity], result of:
              0.03254401 = score(doc=6599,freq=2.0), product of:
                0.10514317 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03002521 = queryNorm
                0.30952093 = fieldWeight in 6599, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6599)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Date
    26. 2.1997 10:22:43
  19. Robin, J.; McKeown, K.: Empirically designing and evaluating a new revision-based model for summary generation (1996) 0.00
    0.0011622861 = product of:
      0.016272005 = sum of:
        0.016272005 = product of:
          0.03254401 = sum of:
            0.03254401 = weight(_text_:22 in 6751) [ClassicSimilarity], result of:
              0.03254401 = score(doc=6751,freq=2.0), product of:
                0.10514317 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03002521 = queryNorm
                0.30952093 = fieldWeight in 6751, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6751)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Date
    6. 3.1997 16:22:15
  20. Jones, P.A.; Bradbeer, P.V.G.: Discovery of optimal weights in a concept selection system (1996) 0.00
    0.0011622861 = product of:
      0.016272005 = sum of:
        0.016272005 = product of:
          0.03254401 = sum of:
            0.03254401 = weight(_text_:22 in 6974) [ClassicSimilarity], result of:
              0.03254401 = score(doc=6974,freq=2.0), product of:
                0.10514317 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03002521 = queryNorm
                0.30952093 = fieldWeight in 6974, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6974)
          0.5 = coord(1/2)
      0.071428575 = coord(1/14)
    
    Source
    Information retrieval: new systems and current research. Proceedings of the 16th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Drymen, Scotland, 22-23 Mar 94. Ed.: R. Leon
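
The score trees listed with each result follow the TF-IDF pattern of Lucene's ClassicSimilarity: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), queryWeight is idf × queryNorm, fieldWeight is tf × idf × fieldNorm, and the term score is scaled by the coord factors. The sketch below recomputes the score of entry 7 (doc=917) from the numbers shown in its tree. The function names are hypothetical and chosen for illustration only; the code uses Python double precision, whereas the listing shows single-precision values, so the last digits may differ slightly.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as shown in the trees: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq: float, doc_freq: int, max_docs: int,
                       field_norm: float, query_norm: float) -> float:
    """Score of one term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                    # tf(freq=...) line in the tree
    idf = classic_idf(doc_freq, max_docs)   # idf(docFreq=..., maxDocs=...) line
    query_weight = idf * query_norm         # queryWeight line
    field_weight = tf * idf * field_norm    # fieldWeight line
    return query_weight * field_weight


# Inputs copied from the tree of entry 7 (term "texts" in doc 917).
term_score = classic_term_score(
    freq=2.0, doc_freq=499, max_docs=44218,
    field_norm=0.078125, query_norm=0.03002521,
)

# The tree applies two coord factors: coord(1/2) for the nested clause
# and coord(1/14) for the top-level query.
final_score = term_score * (1 / 2) * (1 / 14)

print(f"idf         = {classic_idf(499, 44218):.7f}")
print(f"term score  = {term_score:.8f}")
print(f"final score = {final_score:.9f}")
```

The printed values should land close to the listed 5.4822793, 0.09970418 and 0.003560864. The same function can be applied to each term of a multi-term entry; the term scores are summed before the outer coord factor is applied.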