Document (#40928)

Author
Wang, P.
Hao, T.
Yan, J.
Jin, L.
Title
Large-scale extraction of drug-disease pairs from the medical literature
Source
Journal of the Association for Information Science and Technology. 68(2017) no.11, pp.2649-2661
Year
2017
Abstract
Automatic, accurate, large-scale extraction of drug-disease pairs from the medical literature plays an important role in drug repurposing. However, most existing extraction methods are supervised, and manually labeling drug-disease pair datasets is costly and time-consuming, even though many drug-disease pairs remain buried in free text. In this work, we first use a pattern-based method to automatically extract drug-disease pairs with treatment and inducement relationships from free text. Then, to reflect the drug-disease relation, we propose a network embedding algorithm that calculates the degree of correlation of a drug-disease pair. In the experiments, we apply the method to 27 million medical abstracts and titles available on PubMed and extract 138,318 unique treatment pairs and 75,396 unique inducement pairs. Our algorithm achieves a precision of 0.912 and a recall of 0.898 in extracting the frequent treatment drug-disease pairs, and a precision of 0.923 and a recall of 0.833 in extracting the frequent inducement drug-disease pairs. In addition, the proposed information network embedding algorithm can efficiently reflect the degree of correlation of drug-disease pairs: it achieves a precision of 0.802 and a recall of 0.783 in the fine-grained evaluation of extracting frequent pairs.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23876/full.
Footnote
Contribution to a special issue on biomedical information retrieval.
Field
Medicine

Similar documents (author)

  1. Wang, H.; Wang, C.: Ontologies for universal information systems (1995) 4.64
    4.63939 = sum of:
      4.63939 = weight(author_txt:wang in 3194) [ClassicSimilarity], result of:
        4.63939 = fieldWeight in 3194, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          6.5610886 = idf(docFreq=169, maxDocs=44218)
          0.5 = fieldNorm(doc=3194)
    
  2. Wang, F.; Wang, X.: Tracing theory diffusion : a text mining and citation-based analysis of TAM (2020) 4.64
    4.63939 = sum of:
      4.63939 = weight(author_txt:wang in 5980) [ClassicSimilarity], result of:
        4.63939 = fieldWeight in 5980, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          6.5610886 = idf(docFreq=169, maxDocs=44218)
          0.5 = fieldNorm(doc=5980)
    
  3. Wang, C.: The online catalogue, subject access and user reactions : a review (1985) 4.10
    4.1006804 = sum of:
      4.1006804 = weight(author_txt:wang in 986) [ClassicSimilarity], result of:
        4.1006804 = fieldWeight in 986, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.5610886 = idf(docFreq=169, maxDocs=44218)
          0.625 = fieldNorm(doc=986)
    
  4. Wang, C.: Bibliometrics : a textbook (1990) 4.10
    4.1006804 = sum of:
      4.1006804 = weight(author_txt:wang in 5040) [ClassicSimilarity], result of:
        4.1006804 = fieldWeight in 5040, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.5610886 = idf(docFreq=169, maxDocs=44218)
          0.625 = fieldNorm(doc=5040)
    
  5. Wang, P.: Users' information needs at different stages of a research project : a cognitive view (1997) 4.10
    4.1006804 = sum of:
      4.1006804 = weight(author_txt:wang in 320) [ClassicSimilarity], result of:
        4.1006804 = fieldWeight in 320, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.5610886 = idf(docFreq=169, maxDocs=44218)
          0.625 = fieldNorm(doc=320)
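
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and not part of any Lucene API, and the fieldNorm values are simply copied from the explanations rather than recomputed.

```python
import math

def classic_tf(freq):
    # ClassicSimilarity term-frequency factor: square root of the raw count.
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as shown in the explain output.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values taken from the listing: author_txt:wang, docFreq=169, maxDocs=44218.
score_two_wangs = field_weight(2.0, 169, 44218, 0.5)    # hits 1 and 2
score_one_wang = field_weight(1.0, 169, 44218, 0.625)   # hits 3 to 5

print(round(score_two_wangs, 4), round(score_one_wang, 4))
```

Records with two authors named Wang score higher on tf, but their lower fieldNorm (0.5 versus 0.625, reflecting the longer author field) offsets part of that gain.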
    

Similar documents (content)

  1. Song, M.; Kang, K.; An, J.Y.: Investigating drug-disease interactions in drug-symptom-disease triples via citation relations (2018) 0.35
    0.35400295 = sum of:
      0.35400295 = product of:
        1.7700148 = sum of:
          0.0040243757 = weight(abstract_txt:from in 4545) [ClassicSimilarity], result of:
            0.0040243757 = score(doc=4545,freq=1.0), product of:
              0.023296941 = queryWeight, product of:
                1.2281331 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.006863314 = queryNorm
              0.17274266 = fieldWeight in 4545, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=4545)
          0.06742198 = weight(abstract_txt:extracting in 4545) [ClassicSimilarity], result of:
            0.06742198 = score(doc=4545,freq=2.0), product of:
              0.10999628 = queryWeight, product of:
                2.3110833 = boost
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.006863314 = queryNorm
              0.6129478 = fieldWeight in 4545, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.0625 = fieldNorm(doc=4545)
          0.35944727 = weight(abstract_txt:pairs in 4545) [ClassicSimilarity], result of:
            0.35944727 = score(doc=4545,freq=4.0), product of:
              0.42293838 = queryWeight, product of:
                9.063483 = boost
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.006863314 = queryNorm
              0.84988093 = fieldWeight in 4545, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.0625 = fieldNorm(doc=4545)
          0.5367427 = weight(abstract_txt:disease in 4545) [ClassicSimilarity], result of:
            0.5367427 = score(doc=4545,freq=5.0), product of:
              0.49827015 = queryWeight, product of:
                9.418782 = boost
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.006863314 = queryNorm
              1.0772122 = fieldWeight in 4545, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.0625 = fieldNorm(doc=4545)
          0.80237836 = weight(abstract_txt:drug in 4545) [ClassicSimilarity], result of:
            0.80237836 = score(doc=4545,freq=5.0), product of:
              0.6706096 = queryWeight, product of:
                11.412781 = boost
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.006863314 = queryNorm
              1.196491 = fieldWeight in 4545, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.0625 = fieldNorm(doc=4545)
        0.2 = coord(5/25)
    
  2. Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.18
    0.1844161 = sum of:
      0.1844161 = product of:
        0.76840043 = sum of:
          0.0040243757 = weight(abstract_txt:from in 1107) [ClassicSimilarity], result of:
            0.0040243757 = score(doc=1107,freq=1.0), product of:
              0.023296941 = queryWeight, product of:
                1.2281331 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.006863314 = queryNorm
              0.17274266 = fieldWeight in 1107, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=1107)
          0.04858091 = weight(abstract_txt:medical in 1107) [ClassicSimilarity], result of:
            0.04858091 = score(doc=1107,freq=3.0), product of:
              0.07723076 = queryWeight, product of:
                1.936519 = boost
                5.8107834 = idf(docFreq=359, maxDocs=44218)
                0.006863314 = queryNorm
              0.6290358 = fieldWeight in 1107, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.8107834 = idf(docFreq=359, maxDocs=44218)
                0.0625 = fieldNorm(doc=1107)
          0.03393128 = weight(abstract_txt:extraction in 1107) [ClassicSimilarity], result of:
            0.03393128 = score(doc=1107,freq=1.0), product of:
              0.08768403 = queryWeight, product of:
                2.0634162 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.006863314 = queryNorm
              0.38697222 = fieldWeight in 1107, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0625 = fieldNorm(doc=1107)
          0.041043513 = weight(abstract_txt:extract in 1107) [ClassicSimilarity], result of:
            0.041043513 = score(doc=1107,freq=1.0), product of:
              0.09954436 = queryWeight, product of:
                2.1985428 = boost
                6.5970206 = idf(docFreq=163, maxDocs=44218)
                0.006863314 = queryNorm
              0.4123138 = fieldWeight in 1107, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5970206 = idf(docFreq=163, maxDocs=44218)
                0.0625 = fieldNorm(doc=1107)
          0.052848164 = weight(abstract_txt:treatment in 1107) [ClassicSimilarity], result of:
            0.052848164 = score(doc=1107,freq=1.0), product of:
              0.12967408 = queryWeight, product of:
                2.897494 = boost
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.006863314 = queryNorm
              0.4075461 = fieldWeight in 1107, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.0625 = fieldNorm(doc=1107)
          0.58797216 = weight(abstract_txt:disease in 1107) [ClassicSimilarity], result of:
            0.58797216 = score(doc=1107,freq=6.0), product of:
              0.49827015 = queryWeight, product of:
                9.418782 = boost
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.006863314 = queryNorm
              1.1800269 = fieldWeight in 1107, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.0625 = fieldNorm(doc=1107)
        0.24 = coord(6/25)
    
  3. Lee, C.-H.; Khoo, C.; Na, J.-C.: Automatic identification of treatment relations for medical ontology learning : an exploratory study (2004) 0.18
    0.18307659 = sum of:
      0.18307659 = product of:
        0.7628192 = sum of:
          0.012289586 = weight(abstract_txt:method in 2661) [ClassicSimilarity], result of:
            0.012289586 = score(doc=2661,freq=2.0), product of:
              0.030891433 = queryWeight, product of:
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.006863314 = queryNorm
              0.3978315 = fieldWeight in 2661, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=2661)
          0.0040243757 = weight(abstract_txt:from in 2661) [ClassicSimilarity], result of:
            0.0040243757 = score(doc=2661,freq=1.0), product of:
              0.023296941 = queryWeight, product of:
                1.2281331 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.006863314 = queryNorm
              0.17274266 = fieldWeight in 2661, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=2661)
          0.0560964 = weight(abstract_txt:medical in 2661) [ClassicSimilarity], result of:
            0.0560964 = score(doc=2661,freq=4.0), product of:
              0.07723076 = queryWeight, product of:
                1.936519 = boost
                5.8107834 = idf(docFreq=359, maxDocs=44218)
                0.006863314 = queryNorm
              0.7263479 = fieldWeight in 2661, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.8107834 = idf(docFreq=359, maxDocs=44218)
                0.0625 = fieldNorm(doc=2661)
          0.0915357 = weight(abstract_txt:treatment in 2661) [ClassicSimilarity], result of:
            0.0915357 = score(doc=2661,freq=3.0), product of:
              0.12967408 = queryWeight, product of:
                2.897494 = boost
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.006863314 = queryNorm
              0.70589054 = fieldWeight in 2661, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.0625 = fieldNorm(doc=2661)
          0.24003863 = weight(abstract_txt:disease in 2661) [ClassicSimilarity], result of:
            0.24003863 = score(doc=2661,freq=1.0), product of:
              0.49827015 = queryWeight, product of:
                9.418782 = boost
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.006863314 = queryNorm
              0.48174396 = fieldWeight in 2661, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.0625 = fieldNorm(doc=2661)
          0.3588345 = weight(abstract_txt:drug in 2661) [ClassicSimilarity], result of:
            0.3588345 = score(doc=2661,freq=1.0), product of:
              0.6706096 = queryWeight, product of:
                11.412781 = boost
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.006863314 = queryNorm
              0.53508705 = fieldWeight in 2661, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.0625 = fieldNorm(doc=2661)
        0.24 = coord(6/25)
    
  4. Tsuji, K.; Kageura, K.: Automatic generation of Japanese-English bilingual thesauri based on bilingual corpora (2006) 0.13
    0.12539108 = sum of:
      0.12539108 = product of:
        0.52246284 = sum of:
          0.0173801 = weight(abstract_txt:method in 5061) [ClassicSimilarity], result of:
            0.0173801 = score(doc=5061,freq=4.0), product of:
              0.030891433 = queryWeight, product of:
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.006863314 = queryNorm
              0.56261873 = fieldWeight in 5061, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=5061)
          0.013198771 = weight(abstract_txt:proposed in 5061) [ClassicSimilarity], result of:
            0.013198771 = score(doc=5061,freq=2.0), product of:
              0.03239681 = queryWeight, product of:
                1.0240757 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.006863314 = queryNorm
              0.4074096 = fieldWeight in 5061, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.0625 = fieldNorm(doc=5061)
          0.0069704233 = weight(abstract_txt:from in 5061) [ClassicSimilarity], result of:
            0.0069704233 = score(doc=5061,freq=3.0), product of:
              0.023296941 = queryWeight, product of:
                1.2281331 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.006863314 = queryNorm
              0.29919907 = fieldWeight in 5061, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=5061)
          0.05804429 = weight(abstract_txt:extract in 5061) [ClassicSimilarity], result of:
            0.05804429 = score(doc=5061,freq=2.0), product of:
              0.09954436 = queryWeight, product of:
                2.1985428 = boost
                6.5970206 = idf(docFreq=163, maxDocs=44218)
                0.006863314 = queryNorm
              0.5830997 = fieldWeight in 5061, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5970206 = idf(docFreq=163, maxDocs=44218)
                0.0625 = fieldNorm(doc=5061)
          0.06742198 = weight(abstract_txt:extracting in 5061) [ClassicSimilarity], result of:
            0.06742198 = score(doc=5061,freq=2.0), product of:
              0.10999628 = queryWeight, product of:
                2.3110833 = boost
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.006863314 = queryNorm
              0.6129478 = fieldWeight in 5061, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.0625 = fieldNorm(doc=5061)
          0.35944727 = weight(abstract_txt:pairs in 5061) [ClassicSimilarity], result of:
            0.35944727 = score(doc=5061,freq=4.0), product of:
              0.42293838 = queryWeight, product of:
                9.063483 = boost
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.006863314 = queryNorm
              0.84988093 = fieldWeight in 5061, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.0625 = fieldNorm(doc=5061)
        0.24 = coord(6/25)
    
  5. Naing, M.-M.; Lim, E.-P.; Chiang, R.H.L.: Extracting link chains of relationship instances from a Web site (2006) 0.11
    0.11357728 = sum of:
      0.11357728 = product of:
        0.40563312 = sum of:
          0.0153619815 = weight(abstract_txt:method in 6111) [ClassicSimilarity], result of:
            0.0153619815 = score(doc=6111,freq=2.0), product of:
              0.030891433 = queryWeight, product of:
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.006863314 = queryNorm
              0.49728936 = fieldWeight in 6111, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=6111)
          0.011666176 = weight(abstract_txt:proposed in 6111) [ClassicSimilarity], result of:
            0.011666176 = score(doc=6111,freq=1.0), product of:
              0.03239681 = queryWeight, product of:
                1.0240757 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.006863314 = queryNorm
              0.36010262 = fieldWeight in 6111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.078125 = fieldNorm(doc=6111)
          0.0050304695 = weight(abstract_txt:from in 6111) [ClassicSimilarity], result of:
            0.0050304695 = score(doc=6111,freq=1.0), product of:
              0.023296941 = queryWeight, product of:
                1.2281331 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.006863314 = queryNorm
              0.21592833 = fieldWeight in 6111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.078125 = fieldNorm(doc=6111)
          0.030140596 = weight(abstract_txt:precision in 6111) [ClassicSimilarity], result of:
            0.030140596 = score(doc=6111,freq=1.0), product of:
              0.06982563 = queryWeight, product of:
                1.8413402 = boost
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.006863314 = queryNorm
              0.4316552 = fieldWeight in 6111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.078125 = fieldNorm(doc=6111)
          0.033951145 = weight(abstract_txt:recall in 6111) [ClassicSimilarity], result of:
            0.033951145 = score(doc=6111,freq=1.0), product of:
              0.07559329 = queryWeight, product of:
                1.9158797 = boost
                5.7488523 = idf(docFreq=382, maxDocs=44218)
                0.006863314 = queryNorm
              0.44912907 = fieldWeight in 6111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7488523 = idf(docFreq=382, maxDocs=44218)
                0.078125 = fieldNorm(doc=6111)
          0.084828205 = weight(abstract_txt:extraction in 6111) [ClassicSimilarity], result of:
            0.084828205 = score(doc=6111,freq=4.0), product of:
              0.08768403 = queryWeight, product of:
                2.0634162 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.006863314 = queryNorm
              0.96743053 = fieldWeight in 6111, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.078125 = fieldNorm(doc=6111)
          0.22465456 = weight(abstract_txt:pairs in 6111) [ClassicSimilarity], result of:
            0.22465456 = score(doc=6111,freq=1.0), product of:
              0.42293838 = queryWeight, product of:
                9.063483 = boost
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.006863314 = queryNorm
              0.5311756 = fieldWeight in 6111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.078125 = fieldNorm(doc=6111)
        0.28 = coord(7/25)
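
The content-similarity scores combine several term scores. As a rough sketch, again assuming the standard ClassicSimilarity formulas, the score of the first hit (doc 4545) can be recomputed from the boosts, document frequencies and norms printed in its explanation; the helper names and the tuple layout are illustrative assumptions.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.006863314  # copied from the explain output

def classic_idf(doc_freq, max_docs=MAX_DOCS):
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def term_score(freq, boost, doc_freq, field_norm):
    # score = queryWeight * fieldWeight
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

# (term, freq, boost, docFreq) for doc 4545, read off the listing.
matched_terms = [
    ("from", 1.0, 1.2281331, 7577),
    ("extracting", 2.0, 2.3110833, 116),
    ("pairs", 4.0, 9.063483, 133),
    ("disease", 5.0, 9.418782, 53),
    ("drug", 5.0, 11.412781, 22),
]

FIELD_NORM = 0.0625
coord = len(matched_terms) / 25  # 5 of the 25 query terms matched

total = coord * sum(
    term_score(freq, boost, doc_freq, FIELD_NORM)
    for _, freq, boost, doc_freq in matched_terms
)
print(round(total, 4))
```

The coord factor scales the summed term scores by the fraction of query terms that matched, which is why a hit matching 7 of 25 terms can still rank below one matching 5 when its matched terms are less rare or less heavily boosted.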