[1] LYU Jia, QIU Xiaolong. A noisy label deep learning algorithm based on K-means clustering and feature space augmentation[J]. CAAI Transactions on Intelligent Systems, 2024, 19(2): 267-277. [doi:10.11992/tis.202303014]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 19
Issue: 2024, No. 2
Pages: 267-277
Section: Academic Papers: Machine Learning
Publication date: 2024-03-05
- Title:
-
A noisy label deep learning algorithm based on K-means clustering and feature space augmentation
- Author(s):
-
LYU Jia (吕佳)1,2, QIU Xiaolong (邱小龙)1,2
-
1. College of Computer and Information Sciences, Chongqing Normal University, Chongqing 401331, China;
2. Chongqing Digital Agriculture Service Engineering Technology Research Center, Chongqing 401331, China
-
- Keywords:
-
noisy label learning; deep learning; semi-supervised learning; machine learning; neural network; K-means clustering; feature space augmentation; mixup algorithm
- CLC number:
-
TP181
- DOI:
-
10.11992/tis.202303014
- Document code:
-
2023-12-25
- Abstract:
-
The performance of neural networks in deep learning depends on high-quality samples, and noisy labels reduce a network's classification accuracy. Noisy label learning algorithms have been proposed to reduce this impact. Such an algorithm first divides the training set into a clean sample set and a noisy sample set, and then uses a semi-supervised learning algorithm to assign pseudo-labels to the noisy sample set. However, incorrect pseudo-labels and an insufficient number of training samples still limit the performance of noisy label learning algorithms. To address these problems, we propose a noisy label deep learning algorithm based on K-means clustering and feature space augmentation. First, the algorithm applies K-means clustering to the clean sample set, clustering by label, and uses the distance between each noisy sample and the cluster centers to screen out noisy samples that are difficult to classify, which improves the quality of the training samples. Second, the mixup algorithm is used to expand both the clean sample set and the noisy sample set, which increases the number of training samples. Finally, a feature space augmentation algorithm is used to suppress the noisy samples newly generated by mixup, which improves the classification accuracy of the network. Experiments on four data sets (CIFAR10, CIFAR100, MNIST, and ANIMAL-10) validate the effectiveness of the proposed algorithm.
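The first two steps of the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the initialisation of K-means from per-class means, the quantile threshold used to flag "hard" samples, and the Beta(alpha, alpha) mixing coefficient are all assumptions made for illustration, and the feature space augmentation step is omitted because the abstract does not describe it in enough detail.

```python
import numpy as np


def kmeans_from_labels(feats, labels, n_classes, n_iter=10):
    """Lloyd's K-means on clean features, initialised from per-class means.

    Assumption: every class has at least one clean sample, and `feats`
    is a float array of shape (n_samples, n_features).
    """
    centers = np.stack([feats[labels == c].mean(axis=0) for c in range(n_classes)])
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center.
        d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for c in range(n_classes):
            members = feats[assign == c]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return centers


def split_hard_samples(noisy_feats, centers, quantile=0.8):
    """Return (nearest-center index, is_hard mask) for the noisy samples.

    Assumption: a sample counts as "hard" when its distance to the nearest
    center exceeds the given quantile of those distances. The abstract only
    says the distance to the cluster centers is used, not how.
    """
    d = np.linalg.norm(noisy_feats[:, None, :] - centers[None, :, :], axis=2)
    nearest = d.min(axis=1)
    threshold = np.quantile(nearest, quantile)
    return d.argmin(axis=1), nearest > threshold


def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Standard mixup: convex combination of two inputs and their soft labels."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2, lam
```

In this sketch, `feats` would typically be embeddings taken from the network being trained, and the samples flagged by `split_hard_samples` would be handled separately from the rest before `mixup` is applied to pairs drawn from the clean and noisy sets.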
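Memo
Received: 2023-03-07.
Funding: Major Program of the National Natural Science Foundation of China (11991024); Chongqing Municipal Education Commission Science and Technology Innovation Project for the "Construction of the Chengdu-Chongqing Economic Circle" (KJCX2020024); Chongqing University Innovation Research Group Funding Project (CXQT20015); Key Scientific Research Project of the Chongqing Municipal Education Commission (KJZD-K202200511).
About the authors: LYU Jia (吕佳), professor, Ph.D.; main research interests: machine learning and data mining; member of the China Computer Federation. Has led or participated in 20 national and provincial/ministerial research projects and published more than 70 academic papers. E-mail: lvjia@cqnu.edu.cn. QIU Xiaolong (邱小龙), master's student; main research interests: machine learning, data mining, and noisy label learning algorithms. E-mail: 2021210516067@stu.cqnu.edu.cn
Corresponding author: LYU Jia. E-mail: lvjia@cqnu.edu.cn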
Last Update:
1900-01-01