[1] ZHANG Yan, DU Hongle. Imbalanced heterogeneous data ensemble classification based on HVDM-KNN[J]. CAAI Transactions on Intelligent Systems, 2019, 14(4): 733-742. [doi:10.11992/tis.201807023]
CAAI Transactions on Intelligent Systems[ISSN 1673-4785/CN 23-1538/TP] Volume:
14
Issue:
2019, No. 4
Pages:
733-742
Column:
Academic Papers: Machine Learning
Publication date:
2019-07-02
- Title:
-
Imbalanced heterogeneous data ensemble classification based on HVDM-KNN
- Author(s):
-
ZHANG Yan; DU Hongle
-
School of Math and Computer Application, Shangluo University, Shangluo 726000, China
-
- Keywords:
-
heterogeneous data; imbalanced data; heterogeneous value difference metric; ensemble learning; oversampling; undersampling
- CLC:
-
TP391.4
- DOI:
-
10.11992/tis.201807023
- Abstract:
-
A novel classification method, heterogeneous value difference metric-Adaboost-KNN (HVDM-Adaboost-KNN), is proposed for the imbalanced classification of heterogeneous datasets. It addresses three aspects of the problem: data resampling, the ensemble learning algorithm, and the construction of the weak classifier. The algorithm first balances the dataset using a clustering algorithm, obtaining several balanced data subsets on which sub-classifiers are constructed. The heterogeneous distance is then used to calculate the distance between two samples in the heterogeneous dataset, which improves the classification accuracy of the KNN algorithm. Finally, the Adaboost algorithm is applied iteratively to obtain the final classifier. Eight UCI datasets are used to evaluate the classification performance of the algorithm on imbalanced data. The experimental results indicate that, on heterogeneous imbalanced datasets, the proposed algorithm outperforms the other algorithms compared on indices such as the F1 value, AUC, and G-mean.
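The abstract does not give the distance formula, so the following is only a minimal sketch of how a heterogeneous value difference metric can drive a KNN vote. It assumes the commonly cited HVDM definition: numeric attributes contribute |x - y| / (4σ), nominal attributes contribute a normalized value difference computed from class-conditional frequencies, and missing values contribute a distance of 1. The function names `fit_hvdm`, `hvdm`, and `knn_predict` are hypothetical, and the variant used in the paper, as well as its clustering-based resampling and Adaboost steps, may differ and are not shown.

```python
import math
from collections import Counter, defaultdict


def fit_hvdm(X, y, nominal):
    """Collect the per-attribute statistics HVDM needs.

    X is a list of rows, y the class labels, and nominal the set of
    column indices that hold categorical values (None marks missing).
    """
    classes = sorted(set(y))
    sigma, cond = {}, {}
    for a in range(len(X[0])):
        if a in nominal:
            # Count how often each category co-occurs with each class.
            counts = defaultdict(Counter)
            for row, label in zip(X, y):
                if row[a] is not None:
                    counts[row[a]][label] += 1
            # Convert counts to conditional probabilities P(class | value).
            cond[a] = {
                value: {c: cnt[c] / sum(cnt.values()) for c in classes}
                for value, cnt in counts.items()
            }
        else:
            col = [row[a] for row in X if row[a] is not None]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col)
            # Fall back to 1.0 for a constant column to avoid dividing by zero.
            sigma[a] = math.sqrt(var) or 1.0
    return {"nominal": set(nominal), "classes": classes,
            "sigma": sigma, "cond": cond}


def hvdm(model, u, v):
    """Heterogeneous distance between two rows u and v."""
    total = 0.0
    for a, (p, q) in enumerate(zip(u, v)):
        if p is None or q is None:
            d = 1.0  # missing value: maximal per-attribute distance
        elif a in model["nominal"]:
            table = model["cond"][a]
            if p not in table or q not in table:
                d = 1.0  # assumption: unseen categories are treated as missing
            else:
                d = math.sqrt(sum((table[p][c] - table[q][c]) ** 2
                                  for c in model["classes"]))
        else:
            d = abs(p - q) / (4 * model["sigma"][a])
        total += d * d
    return math.sqrt(total)


def knn_predict(model, X, y, query, k=3):
    """Majority vote among the k training rows closest to query under HVDM."""
    nearest = sorted(range(len(X)), key=lambda i: hvdm(model, X[i], query))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]
```

In the method described above, a classifier of this kind would serve as the weak learner trained on each balanced subset, with Adaboost reweighting samples between rounds.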