[1]YE Zhi-fei, WEN Yi-min, LU Bao-liang. A survey of imbalanced pattern classification problems[J]. CAAI Transactions on Intelligent Systems, 2009, 4(2):148-156.

A survey of imbalanced pattern classification problems

References:
[1]KUBAT M, HOLTE R C, MATWIN S. Machine learning for the detection of oil spills in satellite radar images[J]. Machine Learning, 1998, 30(2): 195-215.
[2]CHAN P K, STOLFO S J. Toward scalable learning with non-uniform class and cost distributions: a case study in credit card fraud detection[C]//Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining. New York: AAAI Press, 1998: 164-168.
[3]CHOE W, ERSOY O K, BINA M. Neural network schemes for detecting rare events in human genomic DNA[J]. Bioinformatics, 2000, 16(12): 1062-1072.
[4]PLANT C, BÖHM C, BERNHARD T, et al. Enhancing instance-based classification with local density: a new algorithm for classifying unbalanced biomedical data[J]. Bioinformatics, 2006, 22(8): 981-988.
[5]WEISS G M. Learning with rare cases and small disjuncts[C]//Proceedings of the 12th International Conference on Machine Learning. San Francisco: Morgan Kaufmann, 1995: 558-565.
[6]WEISS G M, HIRSH H. A quantitative study of small disjuncts[C]//Proceedings of the 17th National Conference on Artificial Intelligence. Texas: AAAI Press, 2000: 665-670.
[7]WEISS G M. Mining with rarity: a unifying framework[J]. SIGKDD Explorations, 2004, 6(1): 7-19.
[8]JAPKOWICZ N, STEPHEN S. The class imbalance problem: a systematic study[J]. Intelligent Data Analysis Journal, 2002, 6(5): 429-450.
[9]ARUNASALAM B, CHAWLA S. CCCS: a top down associative classifier for imbalanced class distribution[C]//International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2006: 517-522.
[10]DRUMMOND C, HOLTE R. Explicitly representing expected cost: an alternative to ROC representation[C]//Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2000: 187-207.
[11]PROVOST F, FAWCETT T. Robust classification for imprecise environments[J]. Machine Learning, 2001, 42(3): 203-231.
[12]DRUMMOND C, HOLTE R C. C4.5, class imbalance, and cost sensitivity: why undersampling beats oversampling[C]//International Conference on Machine Learning. Washington DC, 2003: 152-154.
[13]LING C, LI C. Data mining for direct marketing problems and solutions[C]//Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining. New York: AAAI Press, 1998: 73-79.
[14]CHAWLA N V, BOWYER K W, HALL L O, et al. SMOTE: synthetic minority over-sampling technique[J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
[15]LEE S S. Noisy replication in skewed binary classification[J]. Computational Statistics and Data Analysis, 2000, 34(2): 165-191.
[16]KUBAT M, HOLTE R, MATWIN S. Learning when negative examples abound[C]//Proceedings of the 9th European Conference on Machine Learning. London: Springer-Verlag, 1997: 146-153.
[17]KUBAT M, MATWIN S. Addressing the curse of imbalanced training sets: one-sided selection[C]//Proceedings of the 14th International Conference on Machine Learning. San Francisco: Morgan Kaufmann, 1997: 179-186.
[18]CHEN X W, GERLACH B, CASASENT D. Pruning support vectors for imbalanced data classification[C]//Proceedings of 18th International Joint Conference on Neural Networks. Montreal, Quebec, Canada, 2005: 1883-1887.
[19]RASKUTTI B, KOWALCZYK A. Extreme rebalancing for SVM’s: a case study[C]//International Conference on Machine Learning. Washington DC, 2003: 65-71.
[20]ESTABROOKS A, JAPKOWICZ N. A mixture-of-experts framework for learning from unbalanced data sets[C]//Proceedings of the 4th Intelligent Data Analysis Conference. Lisbon, Portugal, 2001: 34-43.
[21]YAN R, LIU Y, JIN R, et al. On predicting rare classes with SVM ensembles in scene classification[C]//IEEE International Conference on Acoustics, Speech and Signal Processing. Hong Kong, 2003: 21-24.
[22]LU B L, ITO M. Task decomposition and module combination based on class relations: a modular neural network for pattern classification[J]. IEEE Transactions on Neural Networks, 1999, 10(5): 1244-1256.
[23]LU B L, WANG K A, UTIYAMA M, et al. A part-versus-part method for massively parallel training of support vector machines[C]//Proceedings of 17th International Joint Conference on Neural Networks. Budapest, Hungary, 2004: 735-740.
[24]YE Z F, LU B L. Learning imbalanced data sets with a min-max modular support vector machine[C]//Proceedings of the 20th International Joint Conference on Neural Networks. Orlando, USA, 2007: 1673-1678.
[25]KOTSIANTIS S B, PINTELAS P E. Mixture of expert agents for handling imbalanced data sets[J]. Annals of Mathematics, Computing & Teleinformatics, 2003, 1(1): 46-55.
[26]ESTABROOKS A, JO T, JAPKOWICZ N. A multiple resampling method for learning from imbalanced data sets[J]. Computational Intelligence, 2004, 20(1): 18-36.
[27]CHEN C, LIAW A, BREIMAN L. Using random forest to learn imbalanced data[R]. No.666, Statistics Department, University of California at Berkeley, 2004.
[28]CHAWLA N V, LAZAREVIC A, HALL L O, et al. SMOTEBoost: improving prediction of the minority class in boosting[C]//Proceedings of 7th European Conference on Principles and Practice of Knowledge Discovery in Databases. Cavtat-Dubrovnik, Croatia, 2003: 107-119.
[29]LIU X Y, WU J X, ZHOU Z H. A cascade-based classification method for class-imbalanced data[J]. Journal of Nanjing University: Natural Science, 2006, 42(2): 148-155.
[30]ZHOU Z H, LIU X Y. Training cost-sensitive neural networks with methods addressing the class imbalance problem[J]. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(1): 63-77.
[31]PAZZANI M, MERZ C, MURPHY P, et al. Reducing misclassification costs[C]//Proceedings of the 11th International Conference on Machine Learning. San Francisco, CA, USA, 1994: 217-225.
[32]DOMINGOS P. MetaCost: a general method for making classifiers cost sensitive[C]//Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining. San Diego, CA: ACM Press, 1999: 155-164.
[33]CHEW H G, BOGNER R E, LIM C C. Dual-nu-support vector machine with error rate and training size biasing[C]//Proceedings of the 25th IEEE International Conference on Acoustics, Speech and Signal Processing. Salt Lake City, USA: IEEE Press, 2001: 1269-1272.
[34]FAN W, STOLFO S J, ZHANG J X, et al. AdaCost: misclassification cost-sensitive boosting[C]//Proceedings of the 16th International Conference on Machine Learning. San Mateo, USA, 1999: 97-105.
[35]JOSHI M V, AGARWAL R C, KUMAR V. Predicting rare classes: can boosting make any weak learner strong?[C]//Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, Canada: ACM Press, 2002: 297-306.
[36]CHAWLA N V. C4.5 and imbalanced data sets: investigating the effect of sampling method, probabilistic estimate, and decision tree structure[C]//International Conference on Machine Learning. Washington DC, 2003: 125-130.
[37]ELKAN C. The foundations of cost-sensitive learning[C]//Proceedings of the 17th International Joint Conference on Artificial Intelligence. Seattle, Washington, 2001: 239-246.
[38]CARDIE C, HOWE N. Improving minority class prediction using case-specific feature weights[C]//Proceedings of the 14th International Conference on Machine Learning. San Francisco: Morgan Kaufmann, 1997: 57-65.
[39]ZHENG Z H, SRIHARI R. Optimally combining positive and negative features for text categorization[C]//International Conference on Machine Learning. Washington DC, 2003: 241-245.
[40]WU G, CHANG E Y. KBA: kernel boundary alignment considering imbalanced data distribution[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(6): 786-795.
[41]HONG X, CHEN S, HARRIS C J. A kernel-based two-class classifier for imbalanced data sets[J]. IEEE Transactions on Neural Networks, 2007, 18(1): 28-41.
[42]SCHÖLKOPF B, PLATT J C, SHAWE-TAYLOR J, et al. Estimating the support of a high-dimensional distribution[J]. Neural Computation, 2001, 13(7): 1443-1472.
[43]BRADLEY A. The use of the area under the ROC curve in the evaluation of machine learning algorithms[J]. Pattern Recognition, 1997, 30(7): 1145-1159.
[44]JOSHI M V. On evaluating performance of classifiers for rare classes[C]//Proceedings of the 2nd IEEE International Conference on Data Mining. Japan, 2002: 641-644.
[45]PARK K J, KANEHISA M. Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs[J]. Bioinformatics, 2003, 19(13): 1656-1663.
[46]MALOOF M A. Learning when data sets are imbalanced and when costs are unequal and unknown[C]//International Conference on Machine Learning. Washington DC, 2003: 154-160.
Similar articles:
[1]LIU Yi-qun, ZHANG Min, MA Shao-ping. Web key resource page selection based on non-content information[J]. CAAI Transactions on Intelligent Systems, 2007, 2(1):45.
[2]MA Shi-long, SUI Yue-fei, XU Ke. Limit behavior of prioritized inductive logic programs[J]. CAAI Transactions on Intelligent Systems, 2007, 2(4):9.
[3]YAO Futian, QIAN Yuntao. Gaussian process and its applications in hyperspectral image classification[J]. CAAI Transactions on Intelligent Systems, 2011, 6(5):396.
[4]WEN Yimin, QIANG Baohua, FAN Zhigang. A survey of the classification of data streams with concept drift[J]. CAAI Transactions on Intelligent Systems, 2013, 8(2):95. [doi:10.3969/j.issn.1673-4785.201208012]
[5]YANG Chengdong, DENG Tingquan. An approach to attribute reduction combining attribute selection and deletion[J]. CAAI Transactions on Intelligent Systems, 2013, 8(2):183. [doi:10.3969/j.issn.1673-4785.201209056]
[6]HU Xiaosheng, ZHONG Yong. Support vector machine imbalanced data classification based on weighted clustering centroid[J]. CAAI Transactions on Intelligent Systems, 2013, 8(3):261.
[7]DING Ke, TAN Ying. A review on general purpose computing on GPUs and its applications in computational intelligence[J]. CAAI Transactions on Intelligent Systems, 2015, 10(1):1. [doi:10.3969/j.issn.1673-4785.201403072]
[8]KONG Qingchao, MAO Wenji, ZHANG Yuhao. User comment behavior prediction in social networking sites[J]. CAAI Transactions on Intelligent Systems, 2015, 10(3):349. [doi:10.3969/j.issn.1673-4785.201403019]
[9]YAO Lin, LIU Yi, LI Xinxin, et al. Chinese named entity recognition via word boundary-based character embedding[J]. CAAI Transactions on Intelligent Systems, 2016, 11(1):37. [doi:10.11992/tis.201507065]
[10]QIAN Dong, WANG Bei, ZHANG Tao, et al. Classification algorithm based on Copula theory and Bayesian decision theory[J]. CAAI Transactions on Intelligent Systems, 2016, 11(1):78. [doi:10.11992/tis.201509011]

Memo

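Received: 2008-04-23.
Funding: Supported by the National Natural Science Foundation of China (60375022, 60473040).
About the authors:
YE Zhi-fei, male, born in 1983, holds a master's degree. His main research interests are statistical machine learning and pattern classification.
WEN Yi-min, male, born in 1969, is a postdoctoral researcher, associate professor, and senior member of the CCF. His main research interests are statistical learning theory, bioinformatics, and image processing. He has published more than 20 academic papers.
LU Bao-liang, male, born in 1960, is a professor, doctoral supervisor, Ph.D., and senior member of the IEEE. His main research interests are brain-like computing theory and models, neural network theory and applications, machine learning, pattern recognition, brain-computer interfaces, bioinformatics, and computational biology. He has published more than 80 academic papers in international journals and conferences, including IEEE Trans. Neural Networks, IEEE Trans. Biomedical Engineering, Neural Networks, and ICCV.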

Last Update: 2009-05-04