Document (#38108)

Author
Liu, R.-L.
Title
A passage extractor for classification of disease aspect information
Source
Journal of the American Society for Information Science and Technology. 64(2013) no.11, pp. 2265-2277
Year
2013
Abstract
Retrieval of disease information is often based on several key aspects such as etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages (from medical texts) for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques, including two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is of significance to evidence-based medicine, health education, and healthcare decision support. PETC can be used in those application domains in which a text to be classified may have several parts about different categories.
Theme
Automatic classification
Field
Medicine

Similar documents (content)

  1. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.36
    0.36017683 = sum of:
      0.36017683 = product of:
        1.0004911 = sum of:
          0.023928642 = weight(abstract_txt:techniques in 2765) [ClassicSimilarity], result of:
            0.023928642 = score(doc=2765,freq=3.0), product of:
              0.055768065 = queryWeight, product of:
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.01231124 = queryNorm
              0.4290743 = fieldWeight in 2765, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.0197129 = weight(abstract_txt:thus in 2765) [ClassicSimilarity], result of:
            0.0197129 = score(doc=2765,freq=1.0), product of:
              0.07068289 = queryWeight, product of:
                1.1258081 = boost
                5.0997415 = idf(docFreq=732, maxDocs=44218)
                0.01231124 = queryNorm
              0.2788921 = fieldWeight in 2765, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0997415 = idf(docFreq=732, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.03573448 = weight(abstract_txt:categories in 2765) [ClassicSimilarity], result of:
            0.03573448 = score(doc=2765,freq=3.0), product of:
              0.07286157 = queryWeight, product of:
                1.143027 = boost
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.01231124 = queryNorm
              0.49044347 = fieldWeight in 2765, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.024519537 = weight(abstract_txt:often in 2765) [ClassicSimilarity], result of:
            0.024519537 = score(doc=2765,freq=1.0), product of:
              0.09358061 = queryWeight, product of:
                1.5865209 = boost
                4.791134 = idf(docFreq=997, maxDocs=44218)
                0.01231124 = queryNorm
              0.26201513 = fieldWeight in 2765, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.791134 = idf(docFreq=997, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.010958307 = weight(abstract_txt:information in 2765) [ClassicSimilarity], result of:
            0.010958307 = score(doc=2765,freq=3.0), product of:
              0.047786985 = queryWeight, product of:
                1.6033291 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.01231124 = queryNorm
              0.22931573 = fieldWeight in 2765, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.018911771 = weight(abstract_txt:classification in 2765) [ClassicSimilarity], result of:
            0.018911771 = score(doc=2765,freq=1.0), product of:
              0.086625434 = queryWeight, product of:
                1.7625642 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.01231124 = queryNorm
              0.21831661 = fieldWeight in 2765, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.31748822 = weight(abstract_txt:passages in 2765) [ClassicSimilarity], result of:
            0.31748822 = score(doc=2765,freq=14.0), product of:
              0.18703505 = queryWeight, product of:
                1.8313389 = boost
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.01231124 = queryNorm
              1.6974797 = fieldWeight in 2765, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.47060755 = weight(abstract_txt:passage in 2765) [ClassicSimilarity], result of:
            0.47060755 = score(doc=2765,freq=14.0), product of:
              0.27833915 = queryWeight, product of:
                2.7361505 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.01231124 = queryNorm
              1.6907703 = fieldWeight in 2765, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.07862969 = weight(abstract_txt:text in 2765) [ClassicSimilarity], result of:
            0.07862969 = score(doc=2765,freq=4.0), product of:
              0.1777754 = queryWeight, product of:
                3.5708618 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01231124 = queryNorm
              0.4422979 = fieldWeight in 2765, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
        0.36 = coord(9/25)
    
  2. Liu, R.-L.: Dynamic category profiling for text filtering and classification (2007) 0.29
    0.29215312 = sum of:
      0.29215312 = product of:
        0.91297853 = sum of:
          0.041681543 = weight(abstract_txt:categories in 900) [ClassicSimilarity], result of:
            0.041681543 = score(doc=900,freq=2.0), product of:
              0.07286157 = queryWeight, product of:
                1.143027 = boost
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.01231124 = queryNorm
              0.5720649 = fieldWeight in 900, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.078125 = fieldNorm(doc=900)
          0.03708616 = weight(abstract_txt:technique in 900) [ClassicSimilarity], result of:
            0.03708616 = score(doc=900,freq=1.0), product of:
              0.08492207 = queryWeight, product of:
                1.2340066 = boost
                5.5898643 = idf(docFreq=448, maxDocs=44218)
                0.01231124 = queryNorm
              0.43670815 = fieldWeight in 900, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5898643 = idf(docFreq=448, maxDocs=44218)
                0.078125 = fieldNorm(doc=900)
          0.02826597 = weight(abstract_txt:problem in 900) [ClassicSimilarity], result of:
            0.02826597 = score(doc=900,freq=1.0), product of:
              0.0811121 = queryWeight, product of:
                1.4770516 = boost
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.01231124 = queryNorm
              0.3484803 = fieldWeight in 900, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.078125 = fieldNorm(doc=900)
          0.049536943 = weight(abstract_txt:often in 900) [ClassicSimilarity], result of:
            0.049536943 = score(doc=900,freq=2.0), product of:
              0.09358061 = queryWeight, product of:
                1.5865209 = boost
                4.791134 = idf(docFreq=997, maxDocs=44218)
                0.01231124 = queryNorm
              0.5293505 = fieldWeight in 900, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.791134 = idf(docFreq=997, maxDocs=44218)
                0.078125 = fieldNorm(doc=900)
          0.009038259 = weight(abstract_txt:information in 900) [ClassicSimilarity], result of:
            0.009038259 = score(doc=900,freq=1.0), product of:
              0.047786985 = queryWeight, product of:
                1.6033291 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.01231124 = queryNorm
              0.18913643 = fieldWeight in 900, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.078125 = fieldNorm(doc=900)
          0.027016817 = weight(abstract_txt:classification in 900) [ClassicSimilarity], result of:
            0.027016817 = score(doc=900,freq=1.0), product of:
              0.086625434 = queryWeight, product of:
                1.7625642 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.01231124 = queryNorm
              0.3118809 = fieldWeight in 900, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.078125 = fieldNorm(doc=900)
          0.056164064 = weight(abstract_txt:text in 900) [ClassicSimilarity], result of:
            0.056164064 = score(doc=900,freq=1.0), product of:
              0.1777754 = queryWeight, product of:
                3.5708618 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01231124 = queryNorm
              0.3159271 = fieldWeight in 900, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=900)
          0.66418874 = weight(abstract_txt:classifiers in 900) [ClassicSimilarity], result of:
            0.66418874 = score(doc=900,freq=6.0), product of:
              0.4613852 = queryWeight, product of:
                4.9819536 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.01231124 = queryNorm
              1.4395536 = fieldWeight in 900, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.078125 = fieldNorm(doc=900)
        0.32 = coord(8/25)
    
  3. Hofmann-Apitius, M.: Direct use of information extraction from scientific text for modeling and simulation in the life sciences (2009) 0.19
    0.19431835 = sum of:
      0.19431835 = product of:
        0.80965984 = sum of:
          0.08253692 = weight(abstract_txt:etiology in 2814) [ClassicSimilarity], result of:
            0.08253692 = score(doc=2814,freq=1.0), product of:
              0.13332395 = queryWeight, product of:
                1.0933175 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.01231124 = queryNorm
              0.6190705 = fieldWeight in 2814, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.0625 = fieldNorm(doc=2814)
          0.033914175 = weight(abstract_txt:identification in 2814) [ClassicSimilarity], result of:
            0.033914175 = score(doc=2814,freq=1.0), product of:
              0.092841074 = queryWeight, product of:
                1.2902602 = boost
                5.8446846 = idf(docFreq=347, maxDocs=44218)
                0.01231124 = queryNorm
              0.3652928 = fieldWeight in 2814, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8446846 = idf(docFreq=347, maxDocs=44218)
                0.0625 = fieldNorm(doc=2814)
          0.015361846 = weight(abstract_txt:about in 2814) [ClassicSimilarity], result of:
            0.015361846 = score(doc=2814,freq=1.0), product of:
              0.06268236 = queryWeight, product of:
                1.298451 = boost
                3.9211915 = idf(docFreq=2381, maxDocs=44218)
                0.01231124 = queryNorm
              0.24507447 = fieldWeight in 2814, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9211915 = idf(docFreq=2381, maxDocs=44218)
                0.0625 = fieldNorm(doc=2814)
          0.014461216 = weight(abstract_txt:information in 2814) [ClassicSimilarity], result of:
            0.014461216 = score(doc=2814,freq=4.0), product of:
              0.047786985 = queryWeight, product of:
                1.6033291 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.01231124 = queryNorm
              0.3026183 = fieldWeight in 2814, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.0625 = fieldNorm(doc=2814)
          0.11887691 = weight(abstract_txt:text in 2814) [ClassicSimilarity], result of:
            0.11887691 = score(doc=2814,freq=7.0), product of:
              0.1777754 = queryWeight, product of:
                3.5708618 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01231124 = queryNorm
              0.6686916 = fieldWeight in 2814, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=2814)
          0.54450876 = weight(abstract_txt:disease in 2814) [ClassicSimilarity], result of:
            0.54450876 = score(doc=2814,freq=4.0), product of:
              0.5651433 = queryWeight, product of:
                5.9555316 = boost
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.01231124 = queryNorm
              0.9634879 = fieldWeight in 2814, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.0625 = fieldNorm(doc=2814)
        0.24 = coord(6/25)
    
  4. Sebastiani, F.: Classification of text, automatic (2006) 0.19
    0.18925665 = sum of:
      0.18925665 = product of:
        0.6759166 = sum of:
          0.050017852 = weight(abstract_txt:categories in 5003) [ClassicSimilarity], result of:
            0.050017852 = score(doc=5003,freq=2.0), product of:
              0.07286157 = queryWeight, product of:
                1.143027 = boost
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.01231124 = queryNorm
              0.68647784 = fieldWeight in 5003, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.06513739 = weight(abstract_txt:texts in 5003) [ClassicSimilarity], result of:
            0.06513739 = score(doc=5003,freq=2.0), product of:
              0.08688978 = queryWeight, product of:
                1.2482213 = boost
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.01231124 = queryNorm
              0.7496553 = fieldWeight in 5003, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.09679701 = weight(abstract_txt:classifier in 5003) [ClassicSimilarity], result of:
            0.09679701 = score(doc=5003,freq=1.0), product of:
              0.14256069 = queryWeight, product of:
                1.5988477 = boost
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.01231124 = queryNorm
              0.6789881 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.010845913 = weight(abstract_txt:information in 5003) [ClassicSimilarity], result of:
            0.010845913 = score(doc=5003,freq=1.0), product of:
              0.047786985 = queryWeight, product of:
                1.6033291 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.01231124 = queryNorm
              0.22696373 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.03242018 = weight(abstract_txt:classification in 5003) [ClassicSimilarity], result of:
            0.03242018 = score(doc=5003,freq=1.0), product of:
              0.086625434 = queryWeight, product of:
                1.7625642 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.01231124 = queryNorm
              0.37425706 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.09531358 = weight(abstract_txt:text in 5003) [ClassicSimilarity], result of:
            0.09531358 = score(doc=5003,freq=2.0), product of:
              0.1777754 = queryWeight, product of:
                3.5708618 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01231124 = queryNorm
              0.53614604 = fieldWeight in 5003, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.3253847 = weight(abstract_txt:classifiers in 5003) [ClassicSimilarity], result of:
            0.3253847 = score(doc=5003,freq=1.0), product of:
              0.4613852 = queryWeight, product of:
                4.9819536 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.01231124 = queryNorm
              0.7052344 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
        0.28 = coord(7/25)
    
  5. Liu, R.-L.: Context-based term frequency assessment for text classification (2010) 0.18
    0.17945954 = sum of:
      0.17945954 = product of:
        0.74774814 = sum of:
          0.019970983 = weight(abstract_txt:several in 3331) [ClassicSimilarity], result of:
            0.019970983 = score(doc=3331,freq=1.0), product of:
              0.056209832 = queryWeight, product of:
                1.003953 = boost
                4.5477557 = idf(docFreq=1272, maxDocs=44218)
                0.01231124 = queryNorm
              0.35529342 = fieldWeight in 3331, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5477557 = idf(docFreq=1272, maxDocs=44218)
                0.078125 = fieldNorm(doc=3331)
          0.03708616 = weight(abstract_txt:technique in 3331) [ClassicSimilarity], result of:
            0.03708616 = score(doc=3331,freq=1.0), product of:
              0.08492207 = queryWeight, product of:
                1.2340066 = boost
                5.5898643 = idf(docFreq=448, maxDocs=44218)
                0.01231124 = queryNorm
              0.43670815 = fieldWeight in 3331, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5898643 = idf(docFreq=448, maxDocs=44218)
                0.078125 = fieldNorm(doc=3331)
          0.009038259 = weight(abstract_txt:information in 3331) [ClassicSimilarity], result of:
            0.009038259 = score(doc=3331,freq=1.0), product of:
              0.047786985 = queryWeight, product of:
                1.6033291 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.01231124 = queryNorm
              0.18913643 = fieldWeight in 3331, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.078125 = fieldNorm(doc=3331)
          0.027016817 = weight(abstract_txt:classification in 3331) [ClassicSimilarity], result of:
            0.027016817 = score(doc=3331,freq=1.0), product of:
              0.086625434 = queryWeight, product of:
                1.7625642 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.01231124 = queryNorm
              0.3118809 = fieldWeight in 3331, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.078125 = fieldNorm(doc=3331)
          0.11232813 = weight(abstract_txt:text in 3331) [ClassicSimilarity], result of:
            0.11232813 = score(doc=3331,freq=4.0), product of:
              0.1777754 = queryWeight, product of:
                3.5708618 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01231124 = queryNorm
              0.6318542 = fieldWeight in 3331, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=3331)
          0.5423078 = weight(abstract_txt:classifiers in 3331) [ClassicSimilarity], result of:
            0.5423078 = score(doc=3331,freq=4.0), product of:
              0.4613852 = queryWeight, product of:
                4.9819536 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.01231124 = queryNorm
              1.1753906 = fieldWeight in 3331, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.078125 = fieldNorm(doc=3331)
        0.24 = coord(6/25)
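
The score breakdowns above follow Lucene's ClassicSimilarity (TF-IDF) arithmetic: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), each term score is queryWeight times fieldWeight, and the summed term scores are multiplied by the coord factor. The sketch below recomputes the score of entry 5 (doc 3331) from the values listed there. The function and variable names are hypothetical, chosen only for illustration; fieldNorm is taken from the listing as given because Lucene stores it in a lossy encoded form, and the query term count of 25 is read from the coord lines.

```python
import math

MAX_DOCS = 44218          # maxDocs as shown in the listing
QUERY_NORM = 0.01231124   # queryNorm as shown in the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def term_score(freq, doc_freq, field_norm, boost=1.0, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, query_term_count):
    """Sum of term scores, scaled by coord = matched terms / query terms."""
    coord = len(term_scores) / query_term_count
    return sum(term_scores) * coord


# (freq, docFreq, boost) copied from the entry for doc 3331.
FIELD_NORM_3331 = 0.078125
TERMS_3331 = {
    "several": (1.0, 1272, 1.003953),
    "technique": (1.0, 448, 1.2340066),
    "information": (1.0, 10677, 1.6033291),
    "classification": (1.0, 2218, 1.7625642),
    "text": (4.0, 2106, 3.5708618),
    "classifiers": (4.0, 64, 4.9819536),
}

scores = {
    term: term_score(freq, doc_freq, FIELD_NORM_3331, boost=boost)
    for term, (freq, doc_freq, boost) in TERMS_3331.items()
}
total = document_score(list(scores.values()), query_term_count=25)

for term, value in scores.items():
    print(f"{term:15s} {value:.6f}")
print(f"total           {total:.6f}")
```

The recomputed total should agree with the 0.17945954 shown for entry 5 up to single-precision rounding. The same functions apply to the other entries once their own fieldNorm, frequencies, and boosts are substituted.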