[1] YE Zhi-fei, WEN Yi-min, LU Bao-liang. A survey of imbalanced pattern classification problems[J]. CAAI Transactions on Intelligent Systems, 2009, 4(2): 148-156.
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
4
Issue:
No. 2, 2009
Pages:
148-156
Section:
Survey
Publication date:
2009-04-25
- Title:
-
A survey of imbalanced pattern classification problems
- Article ID:
-
1673-4785(2009)02-0148-09
- Author(s):
-
YE Zhi-fei (叶志飞) 1, WEN Yi-min (文益民) 2, LU Bao-liang (吕宝粮) 1,3
-
1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China;
2. Department of Information Engineering, Hunan Industry Polytechnic, Changsha 410208, China;
3. MOE-Microsoft Key Lab. for Intelligent Computing and Intelligent Systems, Shanghai Jiao Tong University, Shanghai 200240, China
-
- Keywords:
-
machine learning; imbalanced pattern classification; resampling; cost-sensitive learning; task decomposition (training set partitioning); classifier ensemble; evaluation metrics
- Classification number:
-
TP181
- Document code:
-
A
- Abstract:
-
Real-world pattern classification problems are often imbalanced, and conventional classification methods rarely perform satisfactorily on them. Although various approaches have been proposed over the past decade, many real-world imbalanced data sets still expose their limitations, and the problem remains an active research topic. In this paper, we provide an up-to-date survey of research on imbalanced pattern classification problems. We first examine the problems caused by data imbalance and then introduce the main kinds of solutions, along with their representative approaches. Using three real imbalanced data sets, we then compare the performance of representative methods: resampling, cost-sensitive learning, training set partitioning, and classifier ensembles. We find that training set partitioning and classifier ensembles handle imbalanced data sets comparatively well. Finally, we discuss evaluation metrics for classifiers on imbalanced problems and directions for future work.
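Memo
Received: 2008-04-23.
Funding: National Natural Science Foundation of China (60375022, 60473040).
About the authors:
YE Zhi-fei, male, born in 1983, master's degree. His main research interests are statistical machine learning and pattern classification.
WEN Yi-min, male, born in 1969, postdoctoral researcher, associate professor, and CCF senior member. His main research interests are statistical learning theory, bioinformatics, and image processing. He has published more than 20 academic papers.
LU Bao-liang, male, born in 1960, professor, doctoral supervisor, Ph.D., and IEEE senior member. His main research interests are brain-like computing theory and models, neural network theory and applications, machine learning, pattern recognition, brain-computer interfaces, bioinformatics, and computational biology. He has published more than 80 academic papers in international journals and conferences, including IEEE Trans. Neural Networks, IEEE Trans. Biomedical Engineering, Neural Networks, and ICCV.
Last Update: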
2009-05-04