Search (2 results, page 1 of 1)

Hahn, U.; Reimer, U.: Informationslinguistische Konzepte der Volltextverarbeitung in TOPIC (1983) 0.00

```
0.0023678814 = product of:
  0.0047357627 = sum of:
    0.0047357627 = product of:
      0.009471525 = sum of:
        0.009471525 = weight(_text_:a in 450) [ClassicSimilarity], result of:
          0.009471525 = score(doc=450,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.17835285 = fieldWeight in 450, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=450)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Type: a

Ou, S.; Khoo, C.; Goh, D.H.; Heng, H.-Y.: Automatic discourse parsing of sociology dissertation abstracts as sentence categorization (2004) 0.00
```
0.0022438213 = product of:
  0.0044876426 = sum of:
    0.0044876426 = product of:
      0.008975285 = sum of:
        0.008975285 = weight(_text_:a in 2676) [ClassicSimilarity], result of:
          0.008975285 = score(doc=2676,freq=22.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.16900843 = fieldWeight in 2676, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=2676)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
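The two score explanations can be checked by hand. The following is a minimal sketch, not the search engine's own code: it assumes `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`, formulas inferred from the fact that they reproduce the `tf` and `idf` values printed in the listings. The function names `classic_idf` and `classic_score` are hypothetical.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed formula; it reproduces idf = 1.153047 for docFreq=37942, maxDocs=44218.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coords=(0.5, 0.5)):
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                  # tf(freq=2.0) = 1.4142135 in the listing
    query_weight = idf * query_norm       # "queryWeight" line
    field_weight = tf * idf * field_norm  # "fieldWeight" line
    score = query_weight * field_weight
    for coord in coords:                  # the two coord(1/2) factors
        score *= coord
    return score


# Inputs copied from the two explanations (doc 450 and doc 2676).
doc_450 = classic_score(2.0, 37942, 44218, 0.046056706, 0.109375)
doc_2676 = classic_score(22.0, 37942, 44218, 0.046056706, 0.03125)
print(f"doc 450:  {doc_450:.10f}")
print(f"doc 2676: {doc_2676:.10f}")
```

Under these assumptions the results should agree with the listed totals (0.0023678814 and 0.0022438213) up to floating-point rounding. The sketch also shows why the record with 22 occurrences of the term ranks below the one with 2: its much smaller `fieldNorm` (0.03125 versus 0.109375) outweighs the higher term frequency.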
Abstract

We investigated an approach to automatic discourse parsing of sociology dissertation abstracts as a sentence categorization task. Decision tree induction was used for the automatic categorization. Three models were developed. Model 1 made use of word tokens found in the sentences. Model 2 made use of both word tokens and sentence position in the abstract. In addition to the attributes used in Model 2, Model 3 also considered information regarding the presence of indicator words in surrounding sentences. Model 3 obtained the highest accuracy rate of 74.5% when applied to a test sample, compared to 71.6% for Model 2 and 60.8% for Model 1. The results indicated that information about sentence position can substantially increase the accuracy of categorization, and that indicator words in earlier sentences (before the sentence being processed) also contribute to the categorization accuracy.
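The three attribute sets described in the abstract can be illustrated with a small feature-extraction sketch. This is not the authors' implementation: the indicator-word list, the feature names, and the function `sentence_features` are hypothetical, the position encoding is one possible choice, and no decision tree is trained here.

```python
import re

# Hypothetical indicator words; the abstract does not list the ones actually used.
INDICATOR_WORDS = {"purpose", "method", "results", "findings"}


def tokenize(sentence):
    # Lowercased alphabetic tokens only; a deliberately simple tokenizer.
    return re.findall(r"[a-z]+", sentence.lower())


def sentence_features(sentences, index, model):
    """Build a feature dict for one sentence under Model 1, 2, or 3."""
    # Model 1: word tokens found in the sentence.
    features = {f"tok={t}": 1 for t in tokenize(sentences[index])}
    if model >= 2:
        # Model 2: add the sentence's relative position in the abstract (0.0 to 1.0).
        features["position"] = index / max(len(sentences) - 1, 1)
    if model >= 3:
        # Model 3: add indicator words seen in the neighbouring sentences.
        for offset in (-1, 1):
            j = index + offset
            if 0 <= j < len(sentences):
                for word in INDICATOR_WORDS & set(tokenize(sentences[j])):
                    features[f"ctx{offset:+d}={word}"] = 1
    return features


abstract = [
    "The purpose was to study migration.",
    "We used a survey method.",
    "Results were mixed.",
]
print(sentence_features(abstract, 1, model=3))
```

Feature dicts of this kind could then be passed to any decision-tree learner; the study itself reports using C5.0.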

Content

1. Introduction

This paper reports our initial effort to develop an automatic method for parsing the discourse structure of sociology dissertation abstracts. This study is part of a broader study to develop a method for multi-document summarization. Accurate discourse parsing will make it easier to perform automatic multi-document summarization of dissertation abstracts. In a previous study, we determined that the macro-level structure of dissertation abstracts typically has five sections (Khoo et al., 2002). In this study, we treated discourse parsing as a text categorization problem: assigning each sentence in a dissertation abstract to one of the five predefined sections or categories. Decision tree induction, a machine-learning method, was applied to word tokens found in the abstracts to construct a decision tree model for the categorization purpose. Decision tree induction was selected primarily because decision tree models are easy to interpret and can be converted to rules that can be incorporated in other computer programs. A well-known decision-tree induction program, C5.0 (Quinlan, 1993), was used in this study.

Type: a
