[1]YE Zhi-fei, WEN Yi-min, LU Bao-liang. A survey of imbalanced pattern classification problems[J]. CAAI Transactions on Intelligent Systems, 2009, 4(2): 148-156.

A survey of imbalanced pattern classification problems

References:
[1]KUBAT M, HOLTE R C, MATWIN S. Machine learning for the detection of oil spills in satellite radar images[J]. Machine Learning, 1998, 30(2): 195-215.
[2]CHAN P K, STOLFO S J. Toward scalable learning with non-uniform class and cost distributions: a case study in credit card fraud detection[C]//Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining. New York: AAAI Press, 1998: 164-168.
[3]CHOE W, ERSOY O K, BINA M. Neural network schemes for detecting rare events in human genomic DNA[J]. Bioinformatics, 2000, 16(12): 1062-1072.
[4]PLANT C, BÖHM C, BERNHARD T, et al. Enhancing instance-based classification with local density: a new algorithm for classifying unbalanced biomedical data[J]. Bioinformatics, 2006, 22(8): 981-988.
[5]WEISS G M. Learning with rare cases and small disjuncts[C]//Proceedings of the 12th International Conference on Machine Learning. San Francisco: Morgan Kaufmann, 1995: 558-565.
[6]WEISS G M, HIRSH H. A quantitative study of small disjuncts[C]//Proceedings of the 17th National Conference on Artificial Intelligence. Texas: AAAI Press, 2000: 665-670.
[7]WEISS G M. Mining with rarity: a unifying framework[J]. SIGKDD Explorations, 2004, 6(1): 7-19.
[8]JAPKOWICZ N, STEPHEN S. The class imbalance problem: a systematic study[J]. Intelligent Data Analysis Journal, 2002, 6(5): 429-450.
[9]ARUNASALAM B, CHAWLA S. CCCS: a top down associative classifier for imbalanced class distribution[C]//International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2006: 517-522.
[10]DRUMMOND C, HOLTE R. Explicitly representing expected cost: an alternative to ROC representation[C]//Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2000: 187-207.
[11]PROVOST F, FAWCETT T. Robust classification for imprecise environments[J]. Machine Learning, 2001, 42(3): 203-231.
[12]DRUMMOND C, HOLTE R C. C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling[C]//International Conference on Machine Learning. Washington DC, 2003: 152-154.
[13]LING C, LI C. Data mining for direct marketing: problems and solutions[C]//Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining. New York: AAAI Press, 1998: 73-79.
[14]CHAWLA N V, BOWYER K W, HALL L O, et al. SMOTE: synthetic minority over-sampling technique[J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
[15]LEE S S. Noisy replication in skewed binary classification[J]. Computational Statistics and Data Analysis, 2000, 34(2): 165-191.
[16]KUBAT M, HOLTE R, MATWIN S. Learning when negative examples abound[C]//Proceedings of the 9th European Conference on Machine Learning. London: Springer-Verlag, 1997: 146-153.
[17]KUBAT M, MATWIN S. Addressing the curse of imbalanced training sets: one-sided selection[C]//Proceedings of the 14th International Conference on Machine Learning. San Francisco: Morgan Kaufmann, 1997: 179-186.
[18]CHEN X W, GERLACH B, CASASENT D. Pruning support vectors for imbalanced data classification[C]//Proceedings of 18th International Joint Conference on Neural Networks. Montreal, Quebec, Canada, 2005: 1883-1887.
[19]RASKUTTI B, KOWALCZYK A. Extreme rebalancing for SVM’s: a case study[C]//International Conference on Machine Learning. Washington DC, 2003: 65-71.
[20]ESTABROOKS A, JAPKOWICZ N. A mixture-of-experts framework for learning from unbalanced data sets[C]//Proceedings of the 4th Intelligent Data Analysis Conference. Lisbon, Portugal, 2001: 34-43.
[21]YAN R, LIU Y, JIN R, et al. On predicting rare classes with SVM ensembles in scene classification[C]//IEEE International Conference on Acoustics, Speech and Signal Processing. Hong Kong, 2003: 21-24.
[22]LU B L, ITO M. Task decomposition and module combination based on class relations: a modular neural network for pattern classification[J]. IEEE Transactions on Neural Networks, 1999, 10(5): 1244-1256.
[23]LU B L, WANG K A, UTIYAMA M, et al. A part-versus-part method for massively parallel training of support vector machines[C]//Proceedings of 17th International Joint Conference on Neural Networks. Budapest, Hungary, 2004: 735-740.
[24]YE Z F, LU B L. Learning imbalanced data sets with a min-max modular support vector machine[C]//Proceedings of the 20th International Joint Conference on Neural Networks. Orlando, USA, 2007: 1673-1678.
[25]KOTSIANTIS S B, PINTELAS P E. Mixture of expert agents for handling imbalanced data sets[J]. Annals of Mathematics, Computing & Teleinformatics, 2003, 1(1): 46-55.
[26]ESTABROOKS A, JO T, JAPKOWICZ N. A multiple resampling method for learning from imbalanced data sets[J]. Computational Intelligence, 2004, 20(1): 18-36.
[27]CHEN C, LIAW A, BREIMAN L. Using random forest to learn imbalanced data[R]. No.666, Statistics Department, University of California at Berkeley, 2004.
[28]CHAWLA N V, LAZAREVIC A, HALL L O, et al. SMOTEBoost: improving prediction of the minority class in boosting[C]//Proceedings of 7th European Conference on Principles and Practice of Knowledge Discovery in Databases. Cavtat-Dubrovnik, Croatia, 2003: 107-119.
[29]LIU X Y, WU J X, ZHOU Z H. A cascade-based classification method for class-imbalanced data[J]. Journal of Nanjing University: Natural Science, 2006, 42(2): 148-155.
[30]ZHOU Z H, LIU X Y. Training cost-sensitive neural networks with methods addressing the class imbalance problem[J]. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(1): 63-77.
[31]PAZZANI M, MERZ C, MURPHY P, et al. Reducing misclassification costs[C]//Proceedings of the 11th International Conference on Machine Learning. San Francisco, CA, USA, 1994: 217-225.
[32]DOMINGOS P. MetaCost: a general method for making classifiers cost-sensitive[C]//Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining. San Diego, CA: ACM Press, 1999: 155-164.
[33]CHEW H G, BOGNER R E, LIM C C. Dual ν-support vector machine with error rate and training size biasing[C]//Proceedings of the 25th IEEE International Conference on Acoustics, Speech and Signal Processing. Salt Lake City, USA: IEEE Press, 2001: 1269-1272.
[34]FAN W, STOLFO S J, ZHANG J X, et al. AdaCost: misclassification cost-sensitive boosting[C]//Proceedings of the 16th International Conference on Machine Learning. San Mateo, USA, 1999: 97-105.
[35]JOSHI M V, AGARWAL R C, KUMAR V. Predicting rare classes: can boosting make any weak learner strong?[C]//Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, Canada: ACM Press, 2002: 297-306.
[36]CHAWLA N V. C4.5 and imbalanced data sets: investigating the effect of sampling method, probabilistic estimate, and decision tree structure[C]//International Conference on Machine Learning. Washington DC, 2003: 125-130.
[37]ELKAN C. The foundations of cost-sensitive learning[C]//Proceedings of the 17th International Joint Conference on Artificial Intelligence. Seattle, Washington, 2001: 239-246.
[38]CARDIE C, HOWE N. Improving minority class prediction using case-specific feature weights[C]//Proceedings of the 14th International Conference on Machine Learning. San Francisco: Morgan Kaufmann, 1997: 57-65.
[39]ZHENG Z H, SRIHARI R. Optimally combining positive and negative features for text categorization[C]//International Conference on Machine Learning. Washington DC, 2003: 241-245.
[40]WU G, CHANG E Y. KBA: kernel boundary alignment considering imbalanced data distribution[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(6): 786-795.
[41]HONG X, CHEN S, HARRIS C J. A kernel-based two-class classifier for imbalanced data sets[J]. IEEE Transactions on Neural Networks, 2007, 18(1): 28-41.
[42]SCHÖLKOPF B, PLATT J C, SHAWE-TAYLOR J, et al. Estimating the support of a high-dimensional distribution[J]. Neural Computation, 2001, 13(7): 1443-1472.
[43]BRADLEY A. The use of the area under the ROC curve in the evaluation of machine learning algorithms[J]. Pattern Recognition, 1997, 30(7): 1145-1159.
[44]JOSHI M V. On evaluating performance of classifiers for rare classes[C]//Proceedings of the 2nd IEEE International Conference on Data Mining. Japan, 2002: 641-644.
[45]PARK K J, KANEHISA M. Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs[J]. Bioinformatics, 2003, 19(13): 1656-1663.
[46]MALOOF M A. Learning when data sets are imbalanced and when costs are unequal and unknown[C]//International Conference on Machine Learning. Washington DC, 2003: 154-160.

Last Update: 2009-05-04

Copyright © CAAI Transactions on Intelligent Systems