[1] LYU Jia, QIU Xiaolong. A noisy label deep learning algorithm based on K-means clustering and feature space augmentation[J]. CAAI Transactions on Intelligent Systems, 2024, 19(2): 267-277. [doi:10.11992/tis.202303014]
CAAI Transactions on Intelligent Systems[ISSN 1673-4785/CN 23-1538/TP] Volume:
19
Issue:
2024, No. 2
Page number:
267-277
Column:
Academic Papers: Machine Learning
Publication date:
2024-03-05
- Title:
-
A noisy label deep learning algorithm based on K-means clustering and feature space augmentation
- Author(s):
-
LYU Jia 1,2; QIU Xiaolong 1,2
-
1. College of Computer and Information Sciences, Chongqing Normal University, Chongqing 401331, China;
2. Chongqing Digital Agriculture Service Engineering Technology Research Center, Chongqing 401331, China
-
- Keywords:
-
noisy label learning; deep learning; semisupervised learning; machine learning; neural network; K-means clustering; feature space augmentation; mix-up algorithm
- CLC:
-
TP181
- DOI:
-
10.11992/tis.202303014
- Abstract:
-
The performance of neural networks in deep learning relies on high-quality samples, and noisy labels reduce the classification accuracy of the network. To reduce this impact, noisy label learning algorithms typically divide the training samples into clean and noisy subsets and assign pseudo-labels to the noisy samples using a semisupervised learning algorithm. However, the performance of such algorithms is still limited by inaccurate pseudo-labels and an insufficient number of training samples. To address these problems, we propose a noisy label deep learning algorithm based on K-means clustering and feature space augmentation. First, the algorithm applies K-means clustering to the clean samples according to their labels. It then selects the noisy samples that are difficult to classify according to the distance between each noisy sample and the cluster centers, which improves the quality of the training samples. Second, the mix-up algorithm is used to expand both the clean and noisy samples, increasing the number of training samples. Finally, a feature space augmentation algorithm is used to suppress the noisy samples generated by the mix-up algorithm, improving the classification accuracy of the network. The effectiveness of the proposed algorithm is validated on four datasets: CIFAR10, CIFAR100, MNIST, and ANIMAL-10.
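The first two steps of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. All function names, the fixed distance `threshold`, the seeding of K-means with one center per label, and the reading of "far from every center" as "difficult to classify" are assumptions made for illustration; the abstract does not specify these details. The feature space augmentation step is not sketched because the abstract gives too little detail about it.

```python
import math
import random

def class_centers(features, labels):
    """Mean feature vector per label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def kmeans_from_labels(features, labels, iters=5):
    """Lloyd iterations on clean features, seeded with one center per label
    (assumed reading of 'cluster the clean samples based on their labels')."""
    centers = class_centers(features, labels)
    for _ in range(iters):
        assigned = [min(centers, key=lambda k: math.dist(x, centers[k]))
                    for x in features]
        updated = class_centers(features, assigned)
        # Keep the previous center if a cluster received no points.
        centers = {k: updated.get(k, centers[k]) for k in centers}
    return centers

def split_by_distance(noisy_features, centers, threshold):
    """Split noisy samples into those near some center and those far from all.
    Treating the 'far' group as hard to classify is an assumption."""
    near, far = [], []
    for i, x in enumerate(noisy_features):
        nearest = min(math.dist(x, c) for c in centers.values())
        (near if nearest <= threshold else far).append(i)
    return near, far

def mixup(x_a, y_a, x_b, y_b, alpha=1.0, rng=random):
    """Standard mix-up: convex combination of two samples and their
    one-hot (or soft) labels, with lam drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam

# Toy usage with 2-D "features" (hypothetical data).
clean = [[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0]]
clean_labels = [0, 0, 1, 1]
noisy = [[0.1, 0.1], [2.0, 2.0]]

centers = kmeans_from_labels(clean, clean_labels)
near, far = split_by_distance(noisy, centers, threshold=1.0)
mixed_x, mixed_y, lam = mixup(clean[0], [1.0, 0.0], clean[2], [0.0, 1.0])
```

In this toy example, the first noisy sample lies close to the class-0 center and lands in `near`, while the second sits between the two centers and lands in `far`. In practice the features would come from the network's embedding layer rather than raw coordinates, and the threshold would need to be tuned per dataset.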