[1] JI Changpeng, SHANG Jiaqi, DAI Wei. DC-SMOTE oversampling method for an imbalanced dataset[J]. CAAI Transactions on Intelligent Systems, 2024, 19(3): 525-533. [doi:10.11992/tis.202204013]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 19
Issue: 3 (2024)
Pages: 525-533
Column: Academic Papers: Machine Learning
Publication date: 2024-05-05
- Title: DC-SMOTE oversampling method for an imbalanced dataset
- Author(s): JI Changpeng¹; SHANG Jiaqi²; DAI Wei¹
  1. School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China;
  2. Graduate School, Liaoning Technical University, Huludao 125105, China
- Keywords: imbalanced dataset; oversampling; Gaussian kernel; local gravity; high-imbalanced data; SMOTE; imbalance ratio; classification
- CLC: TP181
- DOI: 10.11992/tis.202204013
- Abstract:
Motivated by the poor classification performance on imbalanced datasets, this paper proposes an oversampling algorithm based on local density and centrality. First, for every minority-class sample in the dataset, the local density is computed with a Gaussian kernel function and the centrality is computed with local gravity. Next, a first type of new sample is synthesized in the regions of low local density to address within-class imbalance. Minority-class boundary samples are then identified from differences in centrality, and a second type of sample is synthesized specifically to strengthen these boundaries. The number of new samples is determined adaptively, which addresses a common weakness of oversampling algorithms: they either leave the amount of oversampling undefined or blindly aim for equal sample counts in the two classes. Finally, experiments on 12 public imbalanced datasets show that the algorithm performs well on both low- and highly imbalanced datasets.
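The abstract does not give the density formula, the neighbourhood size, or the rule for choosing how many samples to generate, so the following Python sketch illustrates only the first stage (density-guided synthesis) under stated assumptions. It is not the authors' DC-SMOTE method. Assumed here: the local density of a minority point is the sum of Gaussian kernel values exp(-d²/(2σ²)) over its `k` nearest minority neighbours; seed points are drawn with probability proportional to inverse density; and each new sample is produced by the standard SMOTE interpolation x_new = x + λ(x_nn − x) with λ drawn uniformly from [0, 1). The function names `gaussian_local_density`, `smote_interpolate`, and `oversample_low_density` and the parameters `k`, `sigma`, and `n_new` are hypothetical.

```python
import math
import random


def gaussian_local_density(points, k=3, sigma=1.0):
    """Assumed density: sum of Gaussian kernel values to the k nearest other minority points."""
    densities = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        densities.append(sum(math.exp(-(d * d) / (2 * sigma * sigma)) for d in nearest))
    return densities


def smote_interpolate(x, neighbor, rng):
    """Standard SMOTE step: a random point on the segment from x towards a neighbour."""
    lam = rng.random()  # uniform in [0, 1)
    return tuple(a + lam * (b - a) for a, b in zip(x, neighbor))


def oversample_low_density(points, n_new, k=3, sigma=1.0, seed=0):
    """Synthesize n_new samples, favouring low-density minority points as seeds.

    Inverse-density seed weighting is an illustrative assumption, not the paper's rule.
    """
    rng = random.Random(seed)
    dens = gaussian_local_density(points, k, sigma)
    weights = [1.0 / (d + 1e-12) for d in dens]  # sparser points get larger weights
    synthetic = []
    for _ in range(n_new):
        i = rng.choices(range(len(points)), weights=weights, k=1)[0]
        # k nearest minority neighbours of the chosen seed
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        j = rng.choice(neighbours)
        synthetic.append(smote_interpolate(points[i], points[j], rng))
    return synthetic


if __name__ == "__main__":
    minority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0), (3.2, 2.9)]
    print(gaussian_local_density(minority, k=2))
    print(oversample_low_density(minority, n_new=4, k=2, seed=1))
```

With this toy minority set, the two isolated points near (3, 3) receive a lower density than the tight cluster at the origin, so they are more likely to be chosen as seeds. The local-gravity centrality, the boundary-strengthening second stage, and the adaptive sample count described in the abstract are not reproduced in this sketch.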