Document (#40930)

Author
Yim, W.-w.
Kwan, S.W.
Yetisgen, M.
Title
Classifying tumor event attributes in radiology reports
Source
Journal of the Association for Information Science and Technology. 68(2017) no.11, pp. 2662-2674
Year
2017
Abstract
Radiology reports contain vital diagnostic information that characterizes patient disease progression. However, information from reports is represented in free text, which is difficult to query against for secondary use. Automatic extraction of important information, such as tumor events using natural language processing, offers possibilities in improved clinical decision support, cohort identification, and retrospective evidence-based research for cancer patients. The goal of this work was to classify tumor event attributes: negation, temporality, and malignancy, using biomedical ontology and linguistically enriched features. We report our results on an annotated corpus of 101 hepatocellular carcinoma patient radiology reports, and show that the improved classification improves overall template structuring. Classification performances for negation identification, past temporality classification, and malignancy classification were at 0.94, 0.62, and 0.77 F1, respectively. Incorporating the attributes into full templates led to an improvement of 0.72 F1 for tumor-related events over a baseline of 0.65 F1. Improvement of negation, malignancy, and temporality classifications led to significant improvements in template extraction for the majority of categories. We present our machine-learning approach to identifying these several tumor event attributes from radiology reports, as well as highlight challenges and areas for improvement.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23937/full.
Footnote
Contribution to a special issue on biomedical information retrieval.
Field
Medicine

Similar documents (content)

  1. Pluye, P.; Grad, R.; Repchinsky, C.; Jovaisas, B.; Johnson-Lafleur, J.; Carrier, M.-E.; Granikov, V.; Farrell, B.; Rodriguez, C.; Bartlett, G.; Loiselle, C.; Légaré, F.: Four levels of outcomes of information-seeking : a mixed methods study in primary health care (2013) 0.16
    0.15893641 = sum of:
      0.15893641 = product of:
        0.56763005 = sum of:
          0.014499232 = weight(abstract_txt:information in 534) [ClassicSimilarity], result of:
            0.014499232 = score(doc=534,freq=9.0), product of:
              0.031941738 = queryWeight, product of:
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.013193906 = queryNorm
              0.45392746 = fieldWeight in 534, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.0625 = fieldNorm(doc=534)
          0.09739368 = weight(abstract_txt:clinical in 534) [ClassicSimilarity], result of:
            0.09739368 = score(doc=534,freq=5.0), product of:
              0.0959105 = queryWeight, product of:
                1.0004449 = boost
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.013193906 = queryNorm
              1.0154642 = fieldWeight in 534, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.0625 = fieldNorm(doc=534)
          0.0519943 = weight(abstract_txt:disease in 534) [ClassicSimilarity], result of:
            0.0519943 = score(doc=534,freq=1.0), product of:
              0.10792933 = queryWeight, product of:
                1.0612797 = boost
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.013193906 = queryNorm
              0.48174396 = fieldWeight in 534, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.0625 = fieldNorm(doc=534)
          0.053567346 = weight(abstract_txt:patients in 534) [ClassicSimilarity], result of:
            0.053567346 = score(doc=534,freq=1.0), product of:
              0.110095374 = queryWeight, product of:
                1.0718762 = boost
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.013193906 = queryNorm
              0.48655403 = fieldWeight in 534, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.0625 = fieldNorm(doc=534)
          0.21614668 = weight(abstract_txt:patient in 534) [ClassicSimilarity], result of:
            0.21614668 = score(doc=534,freq=5.0), product of:
              0.20559916 = queryWeight, product of:
                2.0715039 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.013193906 = queryNorm
              1.0513014 = fieldWeight in 534, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.0625 = fieldNorm(doc=534)
          0.079296805 = weight(abstract_txt:improvement in 534) [ClassicSimilarity], result of:
            0.079296805 = score(doc=534,freq=1.0), product of:
              0.2062433 = queryWeight, product of:
                2.541035 = boost
                6.1517096 = idf(docFreq=255, maxDocs=44218)
                0.013193906 = queryNorm
              0.38448185 = fieldWeight in 534, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1517096 = idf(docFreq=255, maxDocs=44218)
                0.0625 = fieldNorm(doc=534)
          0.054732043 = weight(abstract_txt:reports in 534) [ClassicSimilarity], result of:
            0.054732043 = score(doc=534,freq=1.0), product of:
              0.1909795 = queryWeight, product of:
                3.1567373 = boost
                4.5853753 = idf(docFreq=1225, maxDocs=44218)
                0.013193906 = queryNorm
              0.28658596 = fieldWeight in 534, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5853753 = idf(docFreq=1225, maxDocs=44218)
                0.0625 = fieldNorm(doc=534)
        0.28 = coord(7/25)
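
The per-term figures in the explain tree above can be reproduced by hand. The following is a minimal sketch, assuming the formulas idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq); these are inferred from the numbers shown and are not stated in the record itself. The function names are hypothetical; the input values are copied from the "information" and "clinical" entries for doc 534.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed formula: it reproduces idf(docFreq=10677, maxDocs=44218) = 2.4209464.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # Hypothetical helper mirroring the structure of one weight(...) block.
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                      # tf(freq=9.0) = 3.0
    query_weight = boost * idf * query_norm   # "queryWeight, product of:"
    field_weight = tf * idf * field_norm      # "fieldWeight in 534, product of:"
    return query_weight * field_weight        # "score(doc=534, ...), product of:"


# Values taken from the explain output for doc 534.
info_score = classic_term_score(
    freq=9.0, doc_freq=10677, max_docs=44218,
    query_norm=0.013193906, field_norm=0.0625,
)
clinical_score = classic_term_score(
    freq=5.0, doc_freq=83, max_docs=44218,
    query_norm=0.013193906, field_norm=0.0625, boost=1.0004449,
)
print(info_score, clinical_score)
```

The results should agree with the listed 0.014499232 and 0.09739368 up to single-precision rounding, since the explain output appears to be computed with 32-bit floats.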
    
  2. Lomax, E.C.; Lowe, H.J.; Logan, T.F.; Detlefsen, E.G.: An investigation of the information seeking behavior of medical oncologists in Metropolitan Pittsburgh using a multi-method approach (1999) 0.15
    0.15030445 = sum of:
      0.15030445 = product of:
        0.53680164 = sum of:
          0.008371135 = weight(abstract_txt:information in 289) [ClassicSimilarity], result of:
            0.008371135 = score(doc=289,freq=3.0), product of:
              0.031941738 = queryWeight, product of:
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.013193906 = queryNorm
              0.26207513 = fieldWeight in 289, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.0625 = fieldNorm(doc=289)
          0.06159717 = weight(abstract_txt:clinical in 289) [ClassicSimilarity], result of:
            0.06159717 = score(doc=289,freq=2.0), product of:
              0.0959105 = queryWeight, product of:
                1.0004449 = boost
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.013193906 = queryNorm
              0.64223593 = fieldWeight in 289, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.0625 = fieldNorm(doc=289)
          0.0519943 = weight(abstract_txt:disease in 289) [ClassicSimilarity], result of:
            0.0519943 = score(doc=289,freq=1.0), product of:
              0.10792933 = queryWeight, product of:
                1.0612797 = boost
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.013193906 = queryNorm
              0.48174396 = fieldWeight in 289, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.0625 = fieldNorm(doc=289)
          0.053567346 = weight(abstract_txt:patients in 289) [ClassicSimilarity], result of:
            0.053567346 = score(doc=289,freq=1.0), product of:
              0.110095374 = queryWeight, product of:
                1.0718762 = boost
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.013193906 = queryNorm
              0.48655403 = fieldWeight in 289, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.0625 = fieldNorm(doc=289)
          0.09166835 = weight(abstract_txt:diagnostic in 289) [ClassicSimilarity], result of:
            0.09166835 = score(doc=289,freq=2.0), product of:
              0.12501782 = queryWeight, product of:
                1.1422102 = boost
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.013193906 = queryNorm
              0.7332423 = fieldWeight in 289, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.0625 = fieldNorm(doc=289)
          0.13290016 = weight(abstract_txt:cancer in 289) [ClassicSimilarity], result of:
            0.13290016 = score(doc=289,freq=4.0), product of:
              0.12710598 = queryWeight, product of:
                1.1517098 = boost
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.013193906 = queryNorm
              1.0455854 = fieldWeight in 289, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.0625 = fieldNorm(doc=289)
          0.13670315 = weight(abstract_txt:patient in 289) [ClassicSimilarity], result of:
            0.13670315 = score(doc=289,freq=2.0), product of:
              0.20559916 = queryWeight, product of:
                2.0715039 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.013193906 = queryNorm
              0.6649013 = fieldWeight in 289, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.0625 = fieldNorm(doc=289)
        0.28 = coord(7/25)
    
  3. Cruz Díaz, N.P.; Maña López, M.J.; Mata Vázquez, J.; Pachón Álvarez, V.: A machine-learning approach to negation and speculation detection in clinical texts (2012) 0.11
    0.111107014 = sum of:
      0.111107014 = product of:
        0.69441885 = sum of:
          0.004833077 = weight(abstract_txt:information in 283) [ClassicSimilarity], result of:
            0.004833077 = score(doc=283,freq=1.0), product of:
              0.031941738 = queryWeight, product of:
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.013193906 = queryNorm
              0.15130915 = fieldWeight in 283, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.0625 = fieldNorm(doc=283)
          0.07544082 = weight(abstract_txt:clinical in 283) [ClassicSimilarity], result of:
            0.07544082 = score(doc=283,freq=3.0), product of:
              0.0959105 = queryWeight, product of:
                1.0004449 = boost
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.013193906 = queryNorm
              0.7865752 = fieldWeight in 283, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.0625 = fieldNorm(doc=283)
          0.054732043 = weight(abstract_txt:reports in 283) [ClassicSimilarity], result of:
            0.054732043 = score(doc=283,freq=1.0), product of:
              0.1909795 = queryWeight, product of:
                3.1567373 = boost
                4.5853753 = idf(docFreq=1225, maxDocs=44218)
                0.013193906 = queryNorm
              0.28658596 = fieldWeight in 283, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5853753 = idf(docFreq=1225, maxDocs=44218)
                0.0625 = fieldNorm(doc=283)
          0.5594129 = weight(abstract_txt:negation in 283) [ClassicSimilarity], result of:
            0.5594129 = score(doc=283,freq=6.0), product of:
              0.41749114 = queryWeight, product of:
                3.6153 = boost
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.013193906 = queryNorm
              1.3399396 = fieldWeight in 283, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.0625 = fieldNorm(doc=283)
        0.16 = coord(4/25)
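
The top-level score of an entry is the sum of its matching term weights multiplied by the coordination factor, that is, the share of query terms that the document matches. A short sketch using the four term weights listed for doc 283; the variable names are illustrative, and the count of 25 query terms is read off the coord(4/25) line.

```python
# Term weights copied from the explain tree for doc 283
# (information, clinical, reports, negation).
term_scores = [0.004833077, 0.07544082, 0.054732043, 0.5594129]

query_terms = 25                         # denominator of coord(4/25)
coord = len(term_scores) / query_terms   # 4 matching terms out of 25
total = sum(term_scores) * coord         # "product of:" sum and coord

print(coord, total)
```

This shows why entry 3 ranks below entries 1 and 2 despite having the largest raw sum (0.69441885): it matches only 4 of the 25 query terms, so its coordination factor is 0.16 rather than 0.28.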
    
  4. McNamara, M.; Arnold, C.; Sarma, K.; Aberle, D.; Garon, E.; Bui, A.A.T.: Patient portal preferences : perspectives on imaging information (2015) 0.10
    0.09696432 = sum of:
      0.09696432 = product of:
        0.48482162 = sum of:
          0.012082692 = weight(abstract_txt:information in 2134) [ClassicSimilarity], result of:
            0.012082692 = score(doc=2134,freq=4.0), product of:
              0.031941738 = queryWeight, product of:
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.013193906 = queryNorm
              0.37827286 = fieldWeight in 2134, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.078125 = fieldNorm(doc=2134)
          0.11597671 = weight(abstract_txt:patients in 2134) [ClassicSimilarity], result of:
            0.11597671 = score(doc=2134,freq=3.0), product of:
              0.110095374 = queryWeight, product of:
                1.0718762 = boost
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.013193906 = queryNorm
              1.0534204 = fieldWeight in 2134, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.078125 = fieldNorm(doc=2134)
          0.117468245 = weight(abstract_txt:cancer in 2134) [ClassicSimilarity], result of:
            0.117468245 = score(doc=2134,freq=2.0), product of:
              0.12710598 = queryWeight, product of:
                1.1517098 = boost
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.013193906 = queryNorm
              0.9241756 = fieldWeight in 2134, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.078125 = fieldNorm(doc=2134)
          0.17087893 = weight(abstract_txt:patient in 2134) [ClassicSimilarity], result of:
            0.17087893 = score(doc=2134,freq=2.0), product of:
              0.20559916 = queryWeight, product of:
                2.0715039 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.013193906 = queryNorm
              0.83112663 = fieldWeight in 2134, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.078125 = fieldNorm(doc=2134)
          0.06841505 = weight(abstract_txt:reports in 2134) [ClassicSimilarity], result of:
            0.06841505 = score(doc=2134,freq=1.0), product of:
              0.1909795 = queryWeight, product of:
                3.1567373 = boost
                4.5853753 = idf(docFreq=1225, maxDocs=44218)
                0.013193906 = queryNorm
              0.35823244 = fieldWeight in 2134, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5853753 = idf(docFreq=1225, maxDocs=44218)
                0.078125 = fieldNorm(doc=2134)
        0.2 = coord(5/25)
    
  5. Bean, C.A.: Representation of medical knowledge for automated semantic interpretation of clinical reports (2004) 0.09
    0.094745845 = sum of:
      0.094745845 = product of:
        0.59216154 = sum of:
          0.054444723 = weight(abstract_txt:clinical in 2660) [ClassicSimilarity], result of:
            0.054444723 = score(doc=2660,freq=1.0), product of:
              0.0959105 = queryWeight, product of:
                1.0004449 = boost
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.013193906 = queryNorm
              0.56766176 = fieldWeight in 2660, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.078125 = fieldNorm(doc=2660)
          0.18382761 = weight(abstract_txt:disease in 2660) [ClassicSimilarity], result of:
            0.18382761 = score(doc=2660,freq=8.0), product of:
              0.10792933 = queryWeight, product of:
                1.0612797 = boost
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.013193906 = queryNorm
              1.703222 = fieldWeight in 2660, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.078125 = fieldNorm(doc=2660)
          0.06841505 = weight(abstract_txt:reports in 2660) [ClassicSimilarity], result of:
            0.06841505 = score(doc=2660,freq=1.0), product of:
              0.1909795 = queryWeight, product of:
                3.1567373 = boost
                4.5853753 = idf(docFreq=1225, maxDocs=44218)
                0.013193906 = queryNorm
              0.35823244 = fieldWeight in 2660, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5853753 = idf(docFreq=1225, maxDocs=44218)
                0.078125 = fieldNorm(doc=2660)
          0.28547418 = weight(abstract_txt:negation in 2660) [ClassicSimilarity], result of:
            0.28547418 = score(doc=2660,freq=1.0), product of:
              0.41749114 = queryWeight, product of:
                3.6153 = boost
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.013193906 = queryNorm
              0.683785 = fieldWeight in 2660, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.078125 = fieldNorm(doc=2660)
        0.16 = coord(4/25)