Prediction of Potentially Inappropriate Medication Based on Knowledge Graph
Abstract:
Objective To improve the accuracy of potentially inappropriate medication (PIM) prediction, a PIM prediction model that combines a knowledge graph with machine learning is proposed.
Methods First, based on the 2019 Beers criteria and using the knowledge graph as the basic structure, a PIM knowledge representation framework with logical expression capabilities was constructed, and a PIM inference process from patient information nodes to PIM nodes was implemented. Second, a machine learning prediction model for each PIM label was established with the classifier chain algorithm to learn potential feature associations from the data. Finally, according to a sample size threshold, a portion of the reasoning results from the knowledge graph was used as output labels on the classifier chain to make the predictions for low-frequency PIMs more reliable.
Results A total of 11741 prescriptions from 9 medical institutions in Chengdu were used to evaluate the effectiveness of the model. For predicting the number of PIMs, the model achieved an accuracy of 98.10% and an F1 of 93.66%. For multi-label PIM prediction, it achieved a Hamming loss of 0.06% and a macro-F1 of 66.09%. These results are better than those of the existing models used for comparison.
Conclusion The proposed model predicts potentially inappropriate medication more accurately than the compared models and significantly improves the recognition of low-frequency PIM labels.

Keywords:
- Potentially inappropriate medication
- Machine learning
- Knowledge graph
- Multi-label classification
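The third step described in the abstract, where knowledge graph reasoning results stand in for some outputs on the classifier chain, can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the function `predict_chain`, the toy predictors, and the label names are hypothetical, and the set of labels routed to the knowledge graph (`kg_labels`) is simply passed in.

```python
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical type: a per-label predictor receives the augmented feature
# vector (original features plus earlier chain outputs) and returns 0 or 1.
Predictor = Callable[[Sequence[float]], int]


def predict_chain(
    features: Sequence[float],
    label_order: List[str],
    predictors: Dict[str, Predictor],
    kg_results: Dict[str, int],
    kg_labels: Set[str],
) -> Dict[str, int]:
    """Predict labels one by one along the chain.

    For labels in kg_labels (assumed to be the low-frequency ones), the
    knowledge graph inference result is used as the output. For all other
    labels the learned predictor is called. In both cases the output is
    appended to the features seen by later links of the chain.
    """
    augmented = list(features)
    outputs: Dict[str, int] = {}
    for label in label_order:
        if label in kg_labels:
            value = int(kg_results.get(label, 0))
        else:
            value = int(predictors[label](augmented))
        outputs[label] = value
        augmented.append(value)
    return outputs


# Toy usage with made-up rules in place of trained models.
toy_predictors = {
    "PIM1": lambda x: 1 if x[0] > 0.5 else 0,
    # PIM3 looks at the two previous chain outputs (last two entries).
    "PIM3": lambda x: 1 if x[-1] == 1 and x[-2] == 1 else 0,
}
result = predict_chain(
    features=[0.9, 0.1],
    label_order=["PIM1", "PIM24", "PIM3"],
    predictors=toy_predictors,
    kg_results={"PIM24": 1},
    kg_labels={"PIM24"},
)
print(result)
```

Because the knowledge graph output is fed forward like any other chain output, later classifiers can still exploit correlations with the low-frequency label even though no classifier was trained for it.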
Figure 2. PIM inference case
The knowledge graph is constructed based on a Chinese corpus and is presented in the original Chinese form.

Figure 6. The path finding process of PIM34
The knowledge graph is constructed based on a Chinese corpus and is presented in the original Chinese form.

Table 1. Definition of three types of logical nodes

| Type | Definition | Case |
| --- | --- | --- |
| Necessary condition node | All the constituent conditions exist. | + Rivaroxaban; + ≥75 yr.; + Atrial fibrillation |
| Positive and negative condition node | All the constituent conditions exist and all the exclusion conditions do not exist. | + Short- or rapid-acting insulin; - Basal or long-acting insulin |
| Counting condition node | Any constituent condition occurs at least a specified number of times. | Anticholinergic ≥ 2 |

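The three logical node types in Table 1 can be read as simple predicates over a patient's facts. The sketch below is one possible encoding, assuming facts are plain strings and that each drug contributes its drug-class tag once (so two anticholinergic drugs yield the tag twice). The class names and fact strings are hypothetical and do not come from the study.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class NecessaryNode:
    """Fires when every constituent condition is present."""
    required: Set[str]

    def holds(self, facts: List[str]) -> bool:
        return self.required <= set(facts)


@dataclass
class PosNegNode:
    """Fires when all required conditions are present and no excluded one is."""
    required: Set[str]
    excluded: Set[str] = field(default_factory=set)

    def holds(self, facts: List[str]) -> bool:
        present = set(facts)
        return self.required <= present and not (self.excluded & present)


@dataclass
class CountingNode:
    """Fires when any listed condition occurs at least `minimum` times."""
    conditions: Set[str]
    minimum: int

    def holds(self, facts: List[str]) -> bool:
        counts = Counter(facts)
        return any(counts[c] >= self.minimum for c in self.conditions)


# The three cases from Table 1, written with hypothetical fact strings.
anticoagulant = NecessaryNode({"rivaroxaban", "age>=75", "atrial fibrillation"})
insulin = PosNegNode({"short- or rapid-acting insulin"},
                     {"basal or long-acting insulin"})
anticholinergic = CountingNode({"anticholinergic"}, minimum=2)

patient = [
    "rivaroxaban", "age>=75", "atrial fibrillation",
    "short- or rapid-acting insulin",
    "anticholinergic", "anticholinergic",  # two anticholinergic drugs
]
print(anticoagulant.holds(patient),
      insulin.holds(patient),
      anticholinergic.holds(patient))
```

Adding `"basal or long-acting insulin"` to the patient's facts would switch the insulin node off, which is the exclusion behavior the second row of the table describes.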
Table 2. Field information in the dataset

| Number | Description | Range | Type |
| --- | --- | --- | --- |
| 1 | Sex | 0, 1 | Categorical |
| 2 | Age | 65-119 | Numeric |
| 3 | Number of diseases | 1-19 | Numeric |
| 4 | Number of drugs | 1-23 | Numeric |
| 5-2261 | Taking a certain kind of medicine | 0, 1 | Categorical |
| 2262-2787 | Suffering from a certain disease | 0, 1 | Categorical |
| 2788-2828 | A certain PIM exists (Target) | 0, 1 | Categorical |
| 2829 | Number of existing PIMs (Target) | 0-10 | Categorical |

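To make the column layout of Table 2 concrete, the sketch below encodes one prescription as a multi-hot input row. Only the column ranges come from the table. The function `encode_prescription`, the 0-based vocabulary indices, and the choice to count distinct drugs and diseases for columns 3 and 4 are assumptions made for illustration.

```python
from typing import Iterable, List

# Column ranges taken from Table 2 (1-based, inclusive).
N_DRUGS = 2261 - 5 + 1          # 2257 drug indicator columns
N_DISEASES = 2787 - 2262 + 1    # 526 disease indicator columns
N_FEATURES = 4 + N_DRUGS + N_DISEASES  # 2787 input columns in total


def encode_prescription(sex: int, age: int,
                        drug_ids: Iterable[int],
                        disease_ids: Iterable[int]) -> List[int]:
    """Build one input row in the column order of Table 2.

    drug_ids and disease_ids are 0-based positions in hypothetical drug
    and disease vocabularies; duplicates are ignored.
    """
    drugs = set(drug_ids)
    diseases = set(disease_ids)
    drug_flags = [0] * N_DRUGS
    disease_flags = [0] * N_DISEASES
    for i in drugs:
        drug_flags[i] = 1
    for i in diseases:
        disease_flags[i] = 1
    return [sex, age, len(diseases), len(drugs)] + drug_flags + disease_flags


row = encode_prescription(sex=1, age=78,
                          drug_ids=[0, 12, 2256],
                          disease_ids=[3, 525])
print(len(row), row[:4])
```

The 41 PIM indicator columns (2788-2828) and the PIM count (2829) are targets rather than inputs, so they are left out of the encoded row.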
Table 3. Comparison of prediction performance of the models

Acc, Pre, Rec, and F1 are reported for the number of PIMs; SA, HL, Macro-F1, and Micro-F1 are reported for the PIM labels.

| Model | Acc/% | Pre/% | Rec/% | F1/% | SA/% | HL/% | Macro-F1/% | Micro-F1/% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Random Forest | 92.96 | 84.90 | 60.94 | 68.68 | 92.93 | 0.25 | 38.49 | 88.76 |
| XGBoost | 96.57 | 92.51 | 84.67 | 88.28 | 96.48 | 0.10 | 51.96 | 95.72 |
| CatBoost | 97.76 | 92.97 | 87.37 | 89.93 | 97.62 | 0.07 | 64.73 | 97.27 |
| AutoInt | 86.74 | 72.52 | 72.99 | 71.62 | 86.46 | 0.37 | 45.43 | 85.44 |
| DANets | 92.59 | 77.45 | 65.62 | 69.54 | 92.17 | 0.25 | 34.95 | 89.52 |
| FT-Transformer | 94.32 | 87.53 | 72.82 | 78.85 | 94.18 | 0.18 | 40.23 | 92.06 |
| T2G-FORMER | 95.32 | 81.03 | 73.50 | 76.83 | 95.23 | 0.14 | 46.72 | 93.97 |
| Model proposed in the study | 98.10 | 94.60 | 92.83 | 93.66 | 97.98 | 0.06 | 66.09 | 97.69 |

Acc: accuracy; Pre: precision; Rec: recall; SA: subset accuracy; HL: hamming loss. The top performances are marked in bold, and the second best results are underlined.

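The multi-label metrics reported for the PIM labels (SA, HL, Macro-F1, Micro-F1) can be computed as in the sketch below. It uses the standard textbook definitions and treats the F1 of a label with no true or predicted positives as 0, which is an assumed convention; the source does not say how such labels were handled. The toy matrices are made up.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # rows are samples, columns are binary labels


def subset_accuracy(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of samples whose whole label vector is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of individual label slots that are predicted wrongly."""
    wrong = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return wrong / (len(y_true) * len(y_true[0]))


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # assumed convention for 0/0


def macro_micro_f1(y_true: Matrix, y_pred: Matrix) -> Tuple[float, float]:
    """Macro-F1 averages per-label F1; micro-F1 pools counts over labels."""
    n_labels = len(y_true[0])
    per_label = []
    tot_tp = tot_fp = tot_fn = 0
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        per_label.append(f1_from_counts(tp, fp, fn))
        tot_tp, tot_fp, tot_fn = tot_tp + tp, tot_fp + fp, tot_fn + fn
    macro = sum(per_label) / n_labels
    micro = f1_from_counts(tot_tp, tot_fp, tot_fn)
    return macro, micro


# Toy data: label 1 is missed once and label 2 never occurs.
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
y_pred = [[1, 0, 0], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
sa = subset_accuracy(y_true, y_pred)
hl = hamming_loss(y_true, y_pred)
macro_f1, micro_f1 = macro_micro_f1(y_true, y_pred)
print(sa, hl, macro_f1, micro_f1)
```

In the toy data the macro-F1 is well below the micro-F1 because every label carries equal weight in the macro average, so rare labels pull it down. This is one plausible reading of why the Macro-F1 column in the table is much lower than the Micro-F1 column.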
Table 4. Comparison of the prediction results for partial labels

| PIM | Model | Acc/% | Pre/% | Rec/% | F1/% |
| --- | --- | --- | --- | --- | --- |
| 24 | CatBoost | 99.80 | 87.50 | 53.85 | 66.67 |
| 24 | CatBoost+KG | 99.94 | 86.67 | 100.00 | 92.86 |
| 34 | CatBoost | 99.86 | 100.00 | 64.29 | 78.26 |
| 34 | CatBoost+KG | 99.89 | 100.00 | 71.43 | 83.33 |
| 37 | CatBoost | 99.69 | 100.00 | 52.17 | 68.57 |
| 37 | CatBoost+KG | 99.91 | 100.00 | 86.96 | 93.02 |

Acc: accuracy; Pre: precision; Rec: recall.

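As a quick consistency check on the table, the F1 column can be recovered from the precision and recall columns, assuming F1 is the usual harmonic mean of the two. For PIM 24:

```latex
F_1 = \frac{2PR}{P + R}
\qquad
F_1^{\text{CatBoost}} = \frac{2 \times 0.8750 \times 0.5385}{0.8750 + 0.5385} \approx 0.6667
\qquad
F_1^{\text{CatBoost+KG}} = \frac{2 \times 0.8667 \times 1.0000}{0.8667 + 1.0000} \approx 0.9286
```

Both values match the reported 66.67% and 92.86%. The gain for this label comes almost entirely from recall, which rises from 53.85% to 100.00% while precision stays nearly unchanged.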
Table 5. Comparison of model performance with different multi-label classification strategies

Acc, Pre, Rec, and F1 are reported for the number of PIMs; SA, HL, Macro-F1, and Micro-F1 are reported for the PIM labels.

| Strategy | Acc/% | Pre/% | Rec/% | F1/% | SA/% | HL/% | Macro-F1/% | Micro-F1/% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LP | 96.96 | 93.74 | 87.79 | 90.48 | 96.74 | 0.10 | 54.15 | 95.72 |
| BR | 97.90 | 90.77 | 86.91 | 88.71 | 97.76 | 0.06 | 61.60 | 97.47 |
| CC | 98.10 | 94.60 | 92.83 | 93.66 | 97.98 | 0.06 | 66.09 | 97.69 |

LP: label powerset; BR: binary relevance; CC: classifier chains; Acc: accuracy; Pre: precision; Rec: recall; SA: subset accuracy; HL: hamming loss. The top performances are marked in bold, and the second best results are underlined.

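The three strategies compared in the table differ in how the multi-label problem is turned into ordinary classification tasks. The sketch below shows only that transformation step on toy data, following the standard definitions of label powerset, binary relevance, and classifier chains. The function names are hypothetical, and no classifier is trained here.

```python
from typing import Dict, List, Tuple

Features = List[List[float]]
Labels = List[List[int]]
Task = Tuple[List[List[float]], List[int]]


def label_powerset_targets(Y: Labels) -> Tuple[List[int], Dict[Tuple[int, ...], int]]:
    """LP: each distinct label combination becomes one multi-class target."""
    classes: Dict[Tuple[int, ...], int] = {}
    targets = [classes.setdefault(tuple(row), len(classes)) for row in Y]
    return targets, classes


def binary_relevance_tasks(X: Features, Y: Labels) -> List[Task]:
    """BR: one independent binary task per label, all on the same features."""
    return [([list(x) for x in X], [row[j] for row in Y])
            for j in range(len(Y[0]))]


def classifier_chain_tasks(X: Features, Y: Labels) -> List[Task]:
    """CC: task j also receives the true values of labels 0..j-1 as features."""
    return [([list(x) + list(row[:j]) for x, row in zip(X, Y)],
             [row[j] for row in Y])
            for j in range(len(Y[0]))]


X = [[0.1], [0.2], [0.3]]
Y = [[1, 0], [1, 0], [0, 1]]

lp_targets, lp_classes = label_powerset_targets(Y)
br_tasks = binary_relevance_tasks(X, Y)
cc_tasks = classifier_chain_tasks(X, Y)
print(lp_targets)
print(br_tasks[1][0])
print(cc_tasks[1][0])
```

Binary relevance ignores label correlations, label powerset captures them but produces many sparsely populated classes when label combinations are rare, and classifier chains sit in between by passing earlier labels along as features.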
Table 6. Comparison of model performance at different sample size thresholds

Acc, Pre, Rec, and F1 are reported for the number of PIMs; SA, HL, Macro-F1, and Micro-F1 are reported for the PIM labels.

| $\lambda$ | Acc/% | Pre/% | Rec/% | F1/% | SA/% | HL/% | Macro-F1/% | Micro-F1/% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 97.76 | 92.97 | 87.37 | 89.93 | 97.62 | 0.07 | 64.73 | 97.27 |
| 10 | 97.81 | 94.07 | 87.50 | 90.44 | 97.76 | 0.06 | 64.94 | 97.32 |
| 30 | 98.10 | 94.60 | 92.83 | 93.66 | 97.98 | 0.06 | 66.09 | 97.69 |
| 50 | 98.07 | 92.70 | 89.80 | 90.93 | 97.96 | 0.05 | 64.71 | 97.75 |
| 100 | 97.28 | 88.25 | 82.54 | 84.98 | 97.19 | 0.07 | 61.67 | 96.99 |

Acc: accuracy; Pre: precision; Rec: recall; SA: subset accuracy; HL: hamming loss. The top performances are marked in bold, and the second best results are underlined.

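One way to read the threshold $\lambda$ is as a cut on the number of positive training samples per PIM label: labels below the cut take their output from the knowledge graph, and the rest are learned from data. The sketch below illustrates that reading. The strict "fewer than $\lambda$" comparison, the function name, and the toy counts are all assumptions rather than details given in the source.

```python
from typing import Dict, List, Tuple


def split_labels_by_threshold(
    positive_counts: Dict[str, int], lam: int
) -> Tuple[List[str], List[str]]:
    """Return (labels routed to the knowledge graph, labels left to the model).

    A label is routed to the knowledge graph when it has fewer than `lam`
    positive training samples (assumed rule).
    """
    kg_labels = sorted(l for l, n in positive_counts.items() if n < lam)
    ml_labels = sorted(l for l, n in positive_counts.items() if n >= lam)
    return kg_labels, ml_labels


# Hypothetical positive-sample counts; not taken from the study's dataset.
toy_counts = {"PIM_A": 4, "PIM_B": 27, "PIM_C": 45, "PIM_D": 800}
for lam in (0, 10, 30, 50, 100):
    kg_labels, ml_labels = split_labels_by_threshold(toy_counts, lam)
    print(lam, kg_labels, ml_labels)
```

Under this reading, $\lambda = 0$ routes no label to the knowledge graph, which is consistent with the $\lambda = 0$ row of the table matching the CatBoost row in Table 3. Larger thresholds hand more labels to the knowledge graph, and the drop at $\lambda = 100$ suggests that labels with enough samples are better left to the learned model.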
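Open Access This article is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). Third parties may freely share (copy and redistribute the original in any medium or format) and adapt (modify, transform, or build upon the original) the papers published in this journal, provided that they give appropriate credit, provide a link to the license, and indicate whether changes were made. The article may not be used for commercial purposes. For details of the CC BY-NC 4.0 license, visit //creativecommons.org/licenses/by-nc/4.0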