PAN Zhuqiang, ZHANG Lin, ZHANG Lei, et al. Research on classification of diseases of clinical imbalanced data in traditional Chinese medicine[J]. CAAI Transactions on Intelligent Systems, 2017(6): 848-856. [doi:10.11992/tis.201706046]

Research on classification of diseases of clinical imbalanced data in traditional Chinese medicine

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Issue: 2017, No. 6
Pages: 848-856
Publication date: 2017-12-25

Article information

Title:
Research on classification of diseases of clinical imbalanced data in traditional Chinese medicine
Author(s):
PAN Zhuqiang¹, ZHANG Lin¹, ZHANG Lei², LI Guozheng³, YAN Shixing⁴
1. School of Computer Science, Southwest Petroleum University, Chengdu 610500, China;
2. Institute of Basic Research in Clinical Medicine of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China;
3. Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China;
4. Shanghai Jindengtai Information Technology Co., Ltd., Shanghai 201800, China
Keywords:
clinical traditional Chinese medicine; imbalanced data classification; original data distribution; feature selection
CLC number:
TP391
DOI:
10.11992/tis.201706046
Abstract:
Imbalanced-data classification algorithms based on under-sampling are stochastic data optimization algorithms, but they cannot best reflect the distribution of the original clinical data in traditional Chinese medicine (TCM), nor do they address feature redundancy in the data. This paper proposes an imbalanced bagging algorithm based on prediction risk and farthest cases (PRFS-FPUSAB). The algorithm first introduces an improved sampling scheme, built on under-sampling, that reflects the original data distribution as closely as possible; it then combines ensemble learning with the prediction risk criterion to improve classification performance on imbalanced data and to select features. Experimental results on meridian resistance data collected in TCM clinical practice show that the algorithm improves the area under the curve (AUC) and that the selected features are consistent with relevant TCM theory.
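The abstract combines three generic ingredients: under-sampling of the majority class, a bagging ensemble, and a prediction-risk criterion for feature selection, with AUC as the evaluation measure. The Python sketch below shows one way these pieces can fit together. It is not the authors' PRFS-FPUSAB: it uses plain random under-sampling instead of the improved farthest-case scheme (which the abstract does not specify), a hypothetical nearest-centroid base learner, and it assumes prediction risk is measured as the drop in AUC when a feature is replaced by its mean. All function names are illustrative.

```python
import math
import random
from statistics import mean


def auc(scores, labels):
    """AUC via the Mann-Whitney statistic: P(random positive outscores random negative), ties = 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroids(X, y):
    """Hypothetical base learner: per-class feature means (nearest-centroid model)."""
    return {
        c: [mean(col) for col in zip(*[x for x, t in zip(X, y) if t == c])]
        for c in (0, 1)
    }


def centroid_score(cents, x):
    # Positive when x is closer to the minority (label 1) centroid than to the majority one.
    return math.dist(x, cents[0]) - math.dist(x, cents[1])


def asymmetric_bagging(X, y, n_bags=11, seed=0):
    """Each bag keeps every minority case plus an equal-sized random draw of majority cases."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    models = []
    for _ in range(n_bags):
        idx = pos + rng.sample(neg, len(pos))  # plain random under-sampling (assumption)
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models


def ensemble_scores(models, X):
    # Soft voting: average the member scores for each case.
    return [mean(centroid_score(m, x) for m in models) for x in X]


def prediction_risk(models, X, y):
    """Assumed prediction-risk criterion: AUC drop when feature j is replaced by its mean."""
    base = auc(ensemble_scores(models, X), y)
    risks = []
    for j in range(len(X[0])):
        mu = mean(x[j] for x in X)
        X_j = [x[:j] + [mu] + x[j + 1:] for x in X]  # rows must be lists
        risks.append(base - auc(ensemble_scores(models, X_j), y))
    return base, risks


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 does not; 6 majority vs 2 minority cases.
    X = [[0.0, 5.0], [0.1, 3.0], [0.2, 4.0], [0.3, 5.0],
         [0.4, 3.0], [0.5, 4.0], [2.0, 4.0], [2.1, 5.0]]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    models = asymmetric_bagging(X, y, n_bags=5, seed=1)
    base, risks = prediction_risk(models, X, y)
    ranked = sorted(range(len(risks)), key=lambda j: risks[j], reverse=True)
    print("AUC:", base, "risks:", risks, "feature ranking:", ranked)
```

On this toy data the ensemble separates the classes perfectly, feature 0 receives a positive risk, and feature 1 receives a risk of zero, so a feature-selection step that keeps the highest-risk features would retain feature 0. Balancing each bag, rather than the whole training set once, lets every majority case have a chance to contribute while each member still sees a balanced sample.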

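Memo:
Received: 2017-06-14.
Foundation items: National Natural Science Foundation of China (81503680); Fundamental Research Funds for the Central Public Welfare Research Institutes (ZZ0908032); TCM Research Project of the National Health Security Informatization Project (215005).
Biographies: PAN Zhuqiang, male, born in 1987, master's student, CCF member; main research interest: data mining. ZHANG Lin, male, born in 1963, professor, PhD; main research interests: computer image processing and computer network security. He has received one third prize of the National Science and Technology Progress Award and has published more than 10 academic papers. ZHANG Lei, male, born in 1981, assistant research fellow, PhD; main research interest: data mining of TCM clinical data.
Corresponding author: ZHANG Lei. E-mail: tcmxpzl@126.com.
Last update: 2018-01-03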