Wang, P.; Hao, T.; Yan, J.; Jin, L.: Large-scale extraction of drug-disease pairs from the medical literature (2017)
0.00
0.0029294936 = product of:
0.005858987 = sum of:
0.005858987 = product of:
0.011717974 = sum of:
0.011717974 = weight(_text_:a in 3927) [ClassicSimilarity], result of:
0.011717974 = score(doc=3927,freq=24.0), product of:
0.053105544 = queryWeight, product of:
1.153047 = idf(docFreq=37942, maxDocs=44218)
0.046056706 = queryNorm
0.22065444 = fieldWeight in 3927, product of:
4.8989797 = tf(freq=24.0), with freq of:
24.0 = termFreq=24.0
1.153047 = idf(docFreq=37942, maxDocs=44218)
0.0390625 = fieldNorm(doc=3927)
0.5 = coord(1/2)
0.5 = coord(1/2)
- Abstract
- Automatic extraction of large-scale and accurate drug-disease pairs from the medical literature plays an important role for drug repurposing. However, many existing extraction methods are mainly in a supervised manner. It is costly and time-consuming to manually label drug-disease pairs datasets. There are many drug-disease pairs buried in free text. In this work, we first leverage a pattern-based method to automatically extract drug-disease pairs with treatment and inducement relationships from free text. Then, to reflect a drug-disease relation, a network embedding algorithm is proposed to calculate the degree of correlation of a drug-disease pair. In the experiments, we use the method to extract treatment and inducement drug-disease pairs from 27 million medical abstracts and titles available on PubMed. We extract 138,318 unique treatment pairs and 75,396 unique inducement pairs. Our algorithm achieves a precision of 0.912 and a recall of 0.898 in extracting the frequent treatment drug-disease pairs, and a precision of 0.923 and a recall of 0.833 in extracting the frequent inducement drug-disease pairs. Besides, our proposed information network embedding algorithm can efficiently reflect the degree of correlation of drug-disease pairs. Our algorithm can achieve a precision of 0.802, a recall of 0.783 in the fine-grained evaluation of extracting frequent pairs.
- Type
- a