[1] ZHANG Heng, HE Wenbin, HE Jun, et al. Multi-task tumor stage learning model with medical knowledge enhancement[J]. CAAI Transactions on Intelligent Systems, 2021, 16(4): 739-745. [doi:10.11992/tis.202010005]

Multi-task tumor stage learning model with medical knowledge enhancement

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
16
Issue:
No. 4, 2021
Pages:
739-745
Column:
Academic Papers: Knowledge Engineering
Publication date:
2021-07-05

Article Info

Title:
Multi-task tumor stage learning model with medical knowledge enhancement
Author(s):
ZHANG Heng (1), HE Wenbin (2), HE Jun (1), JIAO Zengtao (2), LIU Hongyan (3)
1. School of Information, Renmin University of China, Beijing 100872, China;
2. Yidu Cloud (Beijing) Technology Co., Ltd, Beijing 100191, China;
3. Department of Management Science and Engineering, Tsinghua University, Beijing 100084, China
Keywords:
tumor staging; text classification; machine reading comprehension; multi-task learning; unbalanced classification; smart healthcare; knowledge representation; attention mechanism
CLC number:
TP391
DOI:
10.11992/tis.202010005
Abstract:
Tumor staging is the process of inferring the stage of a tumor from the text of a patient's electronic health records (EHRs). EHR data exhibit severe class imbalance, which makes tumor staging with deep learning challenging. This paper proposes a knowledge-enhanced multi-task (KEMT) model that treats tumor staging as a text classification task over EHRs. It also introduces the medical attributes that doctors consult when staging tumors manually, formulates a machine reading comprehension task based on medical questions, and learns the two tasks jointly. In cooperation with medical institutions, we built a real-world tumor staging dataset. Experimental results show that the KEMT model combines medical knowledge with a neural network and achieves higher prediction accuracy than traditional text classification models. Under imbalanced data distribution, accuracy on small-sample classes improves by 4.2 percentage points, and the model also offers a degree of interpretability.
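
The joint learning of the two tasks can be illustrated with a minimal sketch. The abstract does not give the loss function, so everything below is an assumption made for illustration: the function names, the weighting factor `alpha`, and the choice to sum a stage-classification cross-entropy with an averaged cross-entropy over the answers to the medical questions. It is not the KEMT implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-likelihood of the target class under softmax(logits)."""
    return -math.log(softmax(logits)[target])


def joint_loss(stage_logits, stage_label, attr_logits, attr_labels, alpha=0.5):
    """Hypothetical joint objective for one record.

    stage_logits / stage_label: scores and gold label for the tumor stage.
    attr_logits / attr_labels: one score list and gold answer per medical
    question (the auxiliary reading comprehension task).
    alpha: assumed weight of the auxiliary task; not taken from the paper.
    """
    stage_term = cross_entropy(stage_logits, stage_label)
    attr_term = sum(
        cross_entropy(logits, label)
        for logits, label in zip(attr_logits, attr_labels)
    ) / len(attr_labels)
    return stage_term + alpha * attr_term


if __name__ == "__main__":
    # One record: 4 candidate stages, 2 yes/no medical questions.
    loss = joint_loss(
        stage_logits=[2.0, 0.5, 0.1, -1.0],
        stage_label=0,
        attr_logits=[[1.5, -0.5], [0.2, 0.9]],
        attr_labels=[0, 1],
    )
    print(f"joint loss: {loss:.4f}")
```

Setting `alpha` to 0 reduces the objective to plain stage classification, which makes the contribution of the auxiliary task easy to ablate in such a setup.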

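Memo

Received: 2020-10-09.
Funding: National Natural Science Foundation of China (U171126, 71771131).
About the authors: ZHANG Heng, master's student; research interests: natural language processing and medical data mining. HE Wenbin, master's student; research interests: sports rehabilitation, medical data analysis, and medical AI product design. LIU Hongyan, professor and doctoral supervisor, member of the CCF Technical Committee on Databases; research interests: big data management and analysis, data/text mining, business intelligence, personalized recommender systems, and medical data analysis; has published nearly 100 academic papers and 2 academic monographs.
Corresponding author: LIU Hongyan. E-mail: hyliu@tsinghua.edu.cn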