[1]石洪波,陈雨文,陈鑫.SMOTE过采样及其改进算法研究综述[J].智能系统学报,2019,14(6):1073-1083.[doi:10.11992/tis.201906052]
 SHI Hongbo,CHEN Yuwen,CHEN Xin.Summary of research on SMOTE oversampling and its improved algorithms[J].CAAI Transactions on Intelligent Systems,2019,14(6):1073-1083.[doi:10.11992/tis.201906052]

SMOTE过采样及其改进算法研究综述

参考文献/References:
[1] VASIGHIZAKER A, JALILI S. C-PUGP:a cluster-based positive unlabeled learning method for disease gene prediction and prioritization[J]. Computational biology and chemistry, 2018, 76:23-31.
[2] JURGOVSKY J, GRANITZER M, ZIEGLER K, et al. Sequence classification for credit-card fraud detection[J]. Expert systems with applications, 2018, 100:234-245.
[3] KIM J H. Time frequency image and artificial neural network based classification of impact noise for machine fault diagnosis[J]. International journal of precision engineering and manufacturing, 2018, 19(6):821-827.
[4] CHAWLA N V, BOWYER K W, HALL L O, et al. SMOTE:synthetic minority over-sampling technique[J]. Journal of artificial intelligence research, 2002, 16(1):321-357.
[5] FERNÁNDEZ A, GARCIA S, HERRERA F, et al. SMOTE for learning from imbalanced data:Progress and challenges, marking the 15-year anniversary[J]. Journal of artificial intelligence research, 2018, 61:863-905.
[6] HAN Hui, WANG Wenyuan, MAO Binghuan. Borderline-SMOTE:a new over-sampling method in imbalanced data sets learning[C]//Proceedings of International Conference on Intelligent Computing. Hefei, China, 2005:878-887.
[7] BUNKHUMPORNPAT C, SINAPIROMSARAN K, LURSINSAP C. Safe-level-SMOTE:safe-level-synthetic minority over-sampling TEchnique for handling the class imbalanced problem[C]//Proceedings of the 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Bangkok, Thailand, 2009:475-482.
[8] HE Haibo, BAI Yang, GARCIA E A, et al. ADASYN:adaptive synthetic sampling approach for imbalanced learning[C]//Proceedings of 2008 IEEE International Joint Conference on Neural Networks. Hong Kong, China, 2008:1322-1328.
[9] ZHU Tuanfei, LIN Yaping, LIU Yonghe. Synthetic minority oversampling technique for multiclass imbalance problems[J]. Pattern recognition, 2017, 72:327-340.
[10] DOUZAS G, BACAO F. Geometric SMOTE:a geometrically enhanced drop-in replacement for SMOTE[J]. Information sciences, 2019, 501:118-135.
[11] SEIFFERT C, KHOSHGOFTAAR T M, VAN HULSE J. Hybrid sampling for imbalanced data[J]. Integrated computer-aided engineering, 2009, 16(3):193-210.
[12] GAZZAH S, HECHKEL A, AMARA N E B. A hybrid sampling method for imbalanced data[C]//Proceedings of 2015 IEEE 12th International Multi-Conference on Systems, Signals & Devices. Mahdia, Tunisia, 2015:1-6.
[13] 古平, 欧阳源遊. 基于混合采样的非平衡数据集分类研究[J]. 计算机应用研究, 2015, 32(2):379-381, 418. GU Ping, OUYANG Yuanyou. Classification research for unbalanced data based on mixed-sampling[J]. Application research of computers, 2015, 32(2):379-381, 418.
[14] SONG Jia, HUANG Xianglin, QIN Sijun, et al. A bi-directional sampling based on k-means method for imbalance text classification[C]//Proceedings of 2016 IEEE/ACIS International Conference on Computer and Information Science. Okayama, Japan, 2016:1-5.
[15] 冯宏伟, 姚博, 高原, 等. 基于边界混合采样的非均衡数据处理算法[J]. 控制与决策, 2017, 32(10):1831-1836. FENG Hongwei, YAO Bo, GAO Yuan, et al. Imbalanced data processing algorithm based on boundary mixed sampling[J]. Control and decision, 2017, 32(10):1831-1836.
[16] 赵自翔, 王广亮, 李晓东. 基于支持向量机的不平衡数据分类的改进欠采样方法[J]. 中山大学学报(自然科学版), 2012, 51(6):10-16. ZHAO Zixiang, WANG Guangliang, LI Xiaodong. An improved SVM based under-sampling method for classifying imbalanced data[J]. Acta Scientiarum Naturalium Universitatis Sunyatseni, 2012, 51(6):10-16.
[17] JIA Cangzhi, ZUO Yun. S-SulfPred:a sensitive predictor to capture S-sulfenylation sites based on a resampling one-sided selection undersampling-synthetic minority oversampling technique[J]. Journal of theoretical biology, 2017, 422:84-89.
[18] HANSKUNATAI A. A new hybrid sampling approach for classification of imbalanced datasets[C]//Proceedings of 2018 International Conference on Computer and Communication Systems. Nagoya, Japan, 2018:67-71.
[19] SHI Hongbo, GAO Qigang, JI Suqin, et al. A hybrid sampling method based on safe screening for imbalanced datasets with sparse structure[C]//Proceedings of 2018 International Joint Conference on Neural Networks. Rio de Janeiro, Brazil, 2018:1-8.
[20] 吴艺凡, 梁吉业, 王俊红. 基于混合采样的非平衡数据分类算法[J]. 计算机科学与探索, 2019, 13(2):342-349. WU Yifan, LIANG Jiye, WANG Junhong. Classification algorithm based on hybrid sampling for unbalanced data[J]. Journal of frontiers of computer science and technology, 2019, 13(2):342-349.
[21] RAMENTOL E, CABALLERO Y, BELLO R, et al. SMOTE-RSB*:a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory[J]. Knowledge and information systems, 2012, 33(2):245-265.
[22] SÁEZ J A, LUENGO J, STEFANOWSKI J, et al. SMOTE-IPF:addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering[J]. Information sciences, 2015, 291:184-203.
[23] RADWAN A M. Enhancing prediction on imbalance data by thresholding technique with noise filtering[C]//Proceedings of 2017 International Conference on Information Technology. Amman, Jordan, 2017:399-404.
[24] ZHANG Jianjun, NG W. Stochastic sensitivity measure-based noise filtering and oversampling method for imbalanced classification problems[C]//Proceedings of 2018 IEEE International Conference on Systems, Man, and Cybernetics. Miyazaki, Japan, 2018:403-408.
[25] BISPO A, PRUDENCIO R, VÉRAS D. Instance selection and class balancing techniques for cross project defect prediction[C]//Proceedings of 2018 Brazilian Conference on Intelligent Systems. Sao Paulo, Brazil, 2018:552-557.
[26] BATISTA G E A P A, PRATI R C, MONARD M C. A study of the behavior of several methods for balancing machine learning training data[J]. ACM SIGKDD explorations newsletter, 2004, 6(1):20-29.
[27] BARUA S, ISLAM M M, YAO Xin, et al. MWMOTE-majority weighted minority oversampling technique for imbalanced data set learning[J]. IEEE transactions on knowledge and data engineering, 2014, 26(2):405-425.
[28] PRUENGKARN R, WONG K W, FUNG C C. Multiclass imbalanced classification using fuzzy C-mean and SMOTE with fuzzy support vector machine[C]//Proceedings of the 24th International Conference on Neural Information Processing. Guangzhou, China, 2017:67-75.
[29] DOUZAS G, BACAO F, LAST F. Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE[J]. Information sciences, 2018, 465:1-20.
[30] 楼晓俊, 孙雨轩, 刘海涛. 聚类边界过采样不平衡数据分类方法[J]. 浙江大学学报(工学版), 2013, 47(6):944-950. LOU Xiaojun, SUN Yuxuan, LIU Haitao. Clustering boundary over-sampling classification method for imbalanced data sets[J]. Journal of Zhejiang University (Engineering Science), 2013, 47(6):944-950.
[31] MA Li, FAN Suohai. CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests[J]. BMC bioinformatics, 2017, 18(1):169.
[32] IJAZ M F, ALFIAN G, SYAFRUDIN M, et al. Hybrid prediction model for type 2 diabetes and hypertension using DBSCAN-based outlier detection, synthetic minority over sampling technique (SMOTE), and random forest[J]. Applied sciences, 2018, 8(8):1325.
[33] 盛凯, 刘忠, 周德超, 等. 面向不平衡分类的IDP-SMOTE重采样算法[J]. 计算机应用研究, 2019, 36(1):115-118. SHENG Kai, LIU Zhong, ZHOU Dechao, et al. IDP-SMOTE resampling algorithm for imbalanced classification[J]. Application research of computers, 2019, 36(1):115-118.
[34] BLAGUS R, LUSA L. SMOTE for high-dimensional class-imbalanced data[J]. BMC bioinformatics, 2013, 14:106.
[35] ABDI L, HASHEMI S. To combat multi-class imbalanced problems by means of over-sampling techniques[J]. IEEE transactions on knowledge and data engineering, 2016, 28(1):238-251.
[36] WANG Jin, YUN Bo, HUANG Pingli, et al. Applying threshold SMOTE algorithm with attribute bagging to imbalanced datasets[C]//Proceedings of the 8th International Conference on Rough Sets and Knowledge Technology. Halifax, NS, Canada, 2013:221-228.
[37] MATHEW J, LUO Ming, PANG C K, et al. Kernel-based SMOTE for SVM classification of imbalanced datasets[C]//Proceedings of IECON 2015-41st Annual Conference of the IEEE Industrial Electronics Society. Yokohama, Japan, 2015:1127-1132.
[38] BELLINGER C, DRUMMOND C, JAPKOWICZ N. Beyond the boundaries of SMOTE-A framework for manifold-based synthetically oversampling[C]//Proceedings of Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Riva del Garda, Italy, 2016:248-263.
[39] BELLINGER C, JAPKOWICZ N, DRUMMOND C. Synthetic oversampling for advanced radioactive threat detection[C]//Proceedings of 2015 IEEE International Conference on Machine Learning and Applications. Miami, FL, USA, 2015:948-953.
[40] LI Xiao, ZOU Beiji, WANG Lei, et al. A novel LASSO-based feature weighting selection method for microarray data classification[C]//Proceedings of 2015 IET International Conference on Biomedical Image and Signal Processing. Beijing, China, 2015:1-5.
[41] ZHANG Chunkai, GUO Jianwei, LU Junru. Research on classification method of high-dimensional class-imbalanced data sets based on SVM[C]//Proceedings of the 2nd IEEE International Conference on Data Science in Cyberspace. Shenzhen, China, 2017:60-67.
[42] GUYON I, WESTON J, BARNHILL S, et al. Gene selection for cancer classification using support vector machines[J]. Machine learning, 2002, 46(1/2/3):389-422.
[43] 许召召, 李京华, 陈同林, 等. 融合SMOTE与Filter-Wrapper的朴素贝叶斯决策树算法及其应用[J]. 计算机科学, 2018, 45(9):65-69, 74. XU Zhaozhao, LI Jinghua, CHEN Tonglin, et al. Naive Bayesian decision tree algorithm combining SMOTE and Filter-Wrapper and it’s application[J]. Computer science, 2018, 45(9):65-69, 74.
[44] GUO Lei, WANG Shunfang F. Membrane protein type prediction for high-dimensional imbalanced datasets[C]//Proceedings of 2018 International Conference on Information Technology in Medicine and Education. Hangzhou, China, 2018:847-851.
[45] TORGO L, BRANCO P, RIBEIRO R P, et al. Resampling strategies for regression[J]. Expert systems, 2015, 32(3):465-476.
[46] MONIZ N, BRANCO P, TORGO L. Resampling strategies for imbalanced time series[C]//Proceedings of 2016 IEEE International Conference on Data Science and Advanced Analytics. Montreal, QC, Canada, 2016:282-291.
[47] BRANCO P, TORGO L, RIBEIRO R P. REBAGG:REsampled BAGGing for imbalanced regression[C]//Proceedings of International Workshop on Learning with Imbalanced Domains:Theory and Applications. Dublin, Ireland, 2018:67-81.
[48] PÉREZ-ORTIZ M, GUTIÉRREZ P A, HERVÁS-MARTÍNEZ C, et al. Graph-based approaches for over-sampling in the context of ordinal regression[J]. IEEE transactions on knowledge and data engineering, 2015, 27(5):1233-1245.
[49] ZHU Tuanfei, LIN Yaping, LIU Yonghe, et al. Minority oversampling for imbalanced ordinal regression[J]. Knowledge-based systems, 2019, 166:140-155.
[50] COST S, SALZBERG S. A weighted nearest neighbor algorithm for learning with symbolic features[J]. Machine learning, 1993, 10(1):57-78.
[51] KURNIAWATI Y E, PERMANASARI A E, FAUZIATI S. Adaptive synthetic-nominal (ADASYN-N) and adaptive synthetic-KNN (ADASYN-KNN) for multiclass imbalance learning on laboratory test data[C]//Proceedings of 2018 International Conference on Science and Technology. Yogyakarta, Indonesia, 2018:1-6.
[52] WILSON D R, MARTINEZ T R. Improved heterogeneous distance functions[J]. Journal of artificial intelligence research, 1997, 6:1-34.
[53] AHMAD A, DEY L. A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set[J]. Pattern recognition letters, 2007, 28(1):110-118.
[54] KULLBACK S, LEIBLER R A. On information and sufficiency[J]. The annals of mathematical statistics, 1951, 22(1):79-86.
[55] IENCO D, PENSA R G, MEO R. Context-based distance learning for categorical data clustering[C]//Proceedings of the 8th International Symposium on Intelligent Data Analysis. Lyon, France, 2009:83-94.
[56] DEL RÍO S, LÓPEZ V, BENÍTEZ J M, et al. On the use of MapReduce for imbalanced big data using Random Forest[J]. Information sciences, 2014, 285:112-137.
[57] GUO Haixiang, LI Yijing, SHANG J, et al. Learning from class-imbalanced data:review of methods and applications[J]. Expert systems with applications, 2017, 73:220-239.
[58] GHAZIKHANI A, MONSEFI R, YAZDI H S. Online neural network model for non-stationary and imbalanced data stream classification[J]. International journal of machine learning and cybernetics, 2014, 5(1):51-62.
[59] WANG Shuo, MINKU L L, YAO Xin. A multi-objective ensemble method for online class imbalance learning[C]//Proceedings of 2014 International Joint Conference on Neural Networks. Beijing, China, 2014:3311-3318.
[60] WANG Shuo, MINKU L L, YAO Xin. Resampling-based ensemble methods for online class imbalance learning[J]. IEEE transactions on knowledge and data engineering, 2015, 27(5):1356-1368.
[61] MIRZA B, LIN Zhiping, LIU Nan. Ensemble of subset online sequential extreme learning machine for class imbalance and concept drift[J]. Neurocomputing, 2015, 149:316-329.
[62] GHAZIKHANI A, MONSEFI R, YAZDI H S. Ensemble of online neural networks for non-stationary and imbalanced data streams[J]. Neurocomputing, 2013, 122:535-544.
[63] DITZLER G, POLIKAR R. Incremental learning of concept drift from streaming imbalanced data[J]. IEEE transactions on knowledge and data engineering, 2013, 25(10):2283-2301.
[64] ERTEKIN Ş. Adaptive oversampling for imbalanced data classification[C]//Proceedings of the 28th International Symposium on Computer and Information Sciences. Paris, France, 2013:261-269.
[65] MOUTAFIS P, KAKADIARIS I A. GS4:generating synthetic samples for semi-supervised nearest neighbor classification[C]//Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining. Tainan, China, 2014:393-403.
[66] TRIGUERO I, GARCIA S, HERRERA F. SEG-SSC:a framework based on synthetic examples generation for self-labeled semi-supervised classification[J]. IEEE transactions on cybernetics, 2015, 45(4):622-634.
[67] DONG Aimei, CHUNG F L, WANG Shitong. Semi-supervised classification method through oversampling and common hidden space[J]. Information sciences, 2016, 349-350:216-228.
相似文献/Similar Articles:
[1]胡小生,钟勇.基于加权聚类质心的SVM不平衡分类方法[J].智能系统学报,2013,8(3):261.
 HU Xiaosheng,ZHONG Yong.Support vector machine imbalanced data classification based on weighted clustering centroid[J].CAAI Transactions on Intelligent Systems,2013,8(3):261.
[2]黄庆康,宋恺涛,陆建峰.应用于不平衡多分类问题的损失平衡函数[J].智能系统学报,2019,14(5):953.[doi:10.11992/tis.201808004]
 HUANG Qingkang,SONG Kaitao,LU Jianfeng.Application of the loss balance function to the imbalanced multi-classification problems[J].CAAI Transactions on Intelligent Systems,2019,14(5):953.[doi:10.11992/tis.201808004]

备注/Memo

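Received: 2019-06-27.
Funding: National Natural Science Foundation of China (61801279); Natural Science Foundation of Shanxi Province (201801D121115, 2014011022-2).
About the authors: SHI Hongbo, female, born in 1965, professor and doctoral supervisor. Her main research interests are machine learning and artificial intelligence. She has led or participated in more than 20 projects, including projects of the National Natural Science Foundation of China and the Natural Science Foundation of Shanxi Province, and has published more than 50 academic papers. CHEN Yuwen, female, born in 1995, master's student. Her main research interests are data mining and business intelligence. CHEN Xin, male, born in 1995, master's student. His main research interests are machine learning, data mining, and business intelligence.
Corresponding author: SHI Hongbo. E-mail: shihb@sxufe.edu.cn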

更新日期/Last Update: 2019-12-25