Search (1 result, page 1 of 1)

  • author_ss:"Goh, D.H."
  • theme_ss:"Computerlinguistik"
  1. Ou, S.; Khoo, C.; Goh, D.H.; Heng, H.-Y.: Automatic discourse parsing of sociology dissertation abstracts as sentence categorization (2004) 0.02
    0.01973276 = product of:
      0.05919828 = sum of:
        0.02621591 = weight(_text_:computer in 2676) [ClassicSimilarity], result of:
          0.02621591 = score(doc=2676,freq=2.0), product of:
            0.16231956 = queryWeight, product of:
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.044416238 = queryNorm
            0.16150802 = fieldWeight in 2676, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.03125 = fieldNorm(doc=2676)
        0.032982368 = product of:
          0.065964736 = sum of:
            0.065964736 = weight(_text_:programs in 2676) [ClassicSimilarity], result of:
              0.065964736 = score(doc=2676,freq=2.0), product of:
                0.25748047 = queryWeight, product of:
                  5.79699 = idf(docFreq=364, maxDocs=44218)
                  0.044416238 = queryNorm
                0.25619316 = fieldWeight in 2676, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.79699 = idf(docFreq=364, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2676)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
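
The explain tree above can be checked by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names `idf` and `term_score` are illustrative and not part of any library; queryNorm, fieldNorm and the coord factors are copied from the listing rather than derived.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the explain output for doc 2676
MAX_DOCS = 44218
QUERY_NORM = 0.044416238
FIELD_NORM = 0.03125

computer = term_score(2.0, 3109, MAX_DOCS, QUERY_NORM, FIELD_NORM)
programs = term_score(2.0, 364, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# "programs" sits in a nested clause with coord(1/2);
# the outer sum is scaled by coord(2/6).
total = (computer + programs * 0.5) * (2 / 6)

print(f"computer={computer:.8f} programs={programs:.8f} total={total:.8f}")
```

Lucene computes these values in single precision, so the last digits may differ slightly from the listing.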
    
    Content
    1. Introduction
    This paper reports our initial effort to develop an automatic method for parsing the discourse structure of sociology dissertation abstracts. This study is part of a broader study to develop a method for multi-document summarization. Accurate discourse parsing will make it easier to perform automatic multi-document summarization of dissertation abstracts. In a previous study, we determined that the macro-level structure of dissertation abstracts typically has five sections (Khoo et al., 2002). In this study, we treated discourse parsing as a text categorization problem - assigning each sentence in a dissertation abstract to one of the five predefined sections or categories. Decision tree induction, a machine-learning method, was applied to word tokens found in the abstracts to construct a decision tree model for the categorization purpose. Decision tree induction was selected primarily because decision tree models are easy to interpret and can be converted to rules that can be incorporated in other computer programs. A well-known decision-tree induction program, C5.0 (Quinlan, 1993), was used in this study.
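
The idea of categorizing sentences with a decision tree built over word tokens can be illustrated with a small sketch. It is not C5.0: it uses a plain ID3-style information-gain split on word presence. The training sentences and the three category labels (`objectives`, `method`, `results`) are hypothetical stand-ins, since the abstract does not name its five categories.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_word(rows, vocab):
    # Pick the word whose presence/absence split has the highest information gain.
    base = entropy([label for _, label in rows])
    best, best_gain = None, 0.0
    for word in sorted(vocab):  # sorted so ties break deterministically
        has = [label for words, label in rows if word in words]
        lacks = [label for words, label in rows if word not in words]
        if not has or not lacks:
            continue
        remainder = (len(has) * entropy(has) + len(lacks) * entropy(lacks)) / len(rows)
        if base - remainder > best_gain + 1e-12:
            best, best_gain = word, base - remainder
    return best

def build_tree(rows, vocab):
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    word = best_word(rows, vocab) if len(set(labels)) > 1 else None
    if word is None:
        return majority  # leaf: a category label
    rest = vocab - {word}
    return {
        "word": word,
        "present": build_tree([r for r in rows if word in r[0]], rest),
        "absent": build_tree([r for r in rows if word not in r[0]], rest),
    }

def classify(tree, sentence):
    words = set(sentence.lower().split())
    while isinstance(tree, dict):
        tree = tree["present"] if tree["word"] in words else tree["absent"]
    return tree

# Hypothetical toy data, invented for illustration only
TRAINING = [
    ("this study examines family structure", "objectives"),
    ("the purpose of this study is to explore migration", "objectives"),
    ("data were collected through interviews", "method"),
    ("survey data were collected from students", "method"),
    ("results show a strong association", "results"),
    ("results indicate that income matters", "results"),
]

rows = [(set(text.split()), label) for text, label in TRAINING]
vocab = set().union(*(words for words, _ in rows))
tree = build_tree(rows, vocab)

print(classify(tree, "This study examines voting behaviour"))
```

Because each internal node tests a single word, every path from the root to a leaf reads as an if-then rule, which reflects the interpretability argument made in the abstract.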