Search (2 results, page 1 of 1)

Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.02
```
0.01756874 = product of:
  0.03513748 = sum of:
    0.03513748 = sum of:
      0.009593598 = weight(_text_:a in 1107) [ClassicSimilarity], result of:
        0.009593598 = score(doc=1107,freq=24.0), product of:
          0.043477926 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.037706986 = queryNorm
          0.22065444 = fieldWeight in 1107, product of:
            4.8989797 = tf(freq=24.0), with freq of:
              24.0 = termFreq=24.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.0390625 = fieldNorm(doc=1107)
      0.02554388 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
        0.02554388 = score(doc=1107,freq=2.0), product of:
          0.13204344 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.037706986 = queryNorm
          0.19345059 = fieldWeight in 1107, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=1107)
  0.5 = coord(1/2)
```
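The explain tree above can be recomputed from its leaf values. The following is a minimal sketch, assuming the classic TF-IDF formulas that the printed numbers are consistent with: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function name `classic_term_score` and the constant names are illustrative only and not part of any library API; queryNorm and fieldNorm are copied from the tree rather than derived.

```python
import math

def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Recompute one weight(...) node of the explain tree (hypothetical helper)."""
    tf = math.sqrt(freq)                              # tf(freq=...)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # idf(docFreq=..., maxDocs=...)
    query_weight = idf * query_norm                   # queryWeight
    field_weight = tf * idf * field_norm              # fieldWeight
    return query_weight * field_weight

# Values copied from the explain output for doc 1107.
QUERY_NORM = 0.037706986
FIELD_NORM = 0.0390625
MAX_DOCS = 44218

score_a = classic_term_score(24.0, 37942, MAX_DOCS, QUERY_NORM, FIELD_NORM)
score_22 = classic_term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(1/2): one of two top-level query clauses matched.
total = 0.5 * (score_a + score_22)

# Compare with the reported 0.009593598, 0.02554388 and 0.01756874.
print(round(score_a, 9), round(score_22, 8), round(total, 8))
```

Small differences in the last digits are expected, since the explain output appears to be printed from single-precision floats while Python uses double precision.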
Abstract

Retrieval of disease information is often based on several key aspects such as etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages (from medical texts) for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques, including two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is of significance to evidence-based medicine, health education, and healthcare decision support. PETC can be used in those application domains in which a text to be classified may have several parts about different categories.

Date

28.10.2013 19:22:57

Type

a
Liu, R.-L.: Context-based term frequency assessment for text classification (2010) 0.00
```
0.0014390396 = product of:
  0.0028780792 = sum of:
    0.0028780792 = product of:
      0.0057561584 = sum of:
        0.0057561584 = weight(_text_:a in 3331) [ClassicSimilarity], result of:
          0.0057561584 = score(doc=3331,freq=6.0), product of:
            0.043477926 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.037706986 = queryNorm
            0.13239266 = fieldWeight in 3331, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=3331)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
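In this second tree the only matching term weight is scaled by two nested coord(1/2) factors, so a quarter of it reaches the final score. A short sketch of that arithmetic, using the values printed above (the variable names are illustrative):

```python
# Leaf value copied from the explain output for doc 3331.
term_weight = 0.0057561584      # weight(_text_:a in 3331)

inner = term_weight * 0.5       # inner coord(1/2)
final_score = inner * 0.5       # outer coord(1/2)

# Compare with the reported 0.0028780792 and 0.0014390396.
print(inner, final_score)
```

With a score this small, the 0.00 shown next to the title is consistent with rounding to two decimal places.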
Abstract

Automatic text classification (TC) is essential for the management of information. To properly classify a document d, it is essential to identify the semantics of each term t in d, while the semantics heavily depend on context (neighboring terms) of t in d. Therefore, we present a technique CTFA (Context-based Term Frequency Assessment) that improves text classifiers by considering term contexts in test documents. The results of the term context recognition are used to assess term frequencies of terms, and hence CTFA may easily work with various kinds of text classifiers that base their TC decisions on term frequencies, without needing to modify the classifiers. Moreover, CTFA is efficient, and neither huge memory nor domain-specific knowledge is required. Empirical results show that CTFA successfully enhances performance of several kinds of text classifiers on different experimental data.

Type

a