[1] LENG Qiangkui, SUN Xuezi, MENG Xiangfu. A borderline sample synthesis oversampling method based on KNN and random affine transformation[J]. CAAI Transactions on Intelligent Systems, 2025, 20(2): 329-343. [doi:10.11992/tis.202311038]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 20
Issue: 2025(2)
Pages: 329-343
Column:
Academic Papers: Machine Learning
Publication date:
2025-03-05
- Title:
A borderline sample synthesis oversampling method based on KNN and random affine transformation
- Author(s):
LENG Qiangkui; SUN Xuezi; MENG Xiangfu
School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China
- Keywords:
K-nearest neighbor; linear interpolation; borderline sample; natural distribution; oversampling; three nearest neighbor theory; random affine transformation; imbalanced classification
- CLC:
TP391
- DOI:
10.11992/tis.202311038
- Abstract:
Oversampling is a proven strategy for addressing imbalanced data classification challenges. This paper introduces a borderline sample synthesis oversampling method based on K-nearest neighbor (KNN) and random affine transformation, which improves both the seed sample selection and synthetic sample generation stages of existing oversampling methods. First, the three nearest neighbor theory is applied to establish an effective intrinsic neighborhood relationship between samples and to remove noise from the dataset, reducing the risk of overfitting in subsequent classifiers. Next, the minority-class borderline samples, which are difficult to learn but rich in information, are accurately identified and used as sampling seeds. Finally, the method replaces traditional linear interpolation with local random affine transformation, uniformly generating synthetic samples within the approximate manifold of the original data. Compared with traditional oversampling methods, the proposed method more effectively exploits important borderline information in datasets, thereby enhancing classifier performance. Extensive comparative experiments were conducted on 18 benchmark datasets, pitting the proposed method against 8 classic sampling methods, each combined with 4 different classifiers. The results show that the proposed method achieves higher F1 scores and geometric means (G-mean), addressing the imbalanced data classification problem more effectively. Furthermore, statistical analysis confirms that the method attains a better Friedman ranking.