
[1] LYU Jia, QIU Xiaolong. A noisy label deep learning algorithm based on K-means clustering and feature space augmentation[J]. CAAI Transactions on Intelligent Systems, 2024, 19(2): 267-277. [doi:10.11992/tis.202303014]


A noisy label deep learning algorithm based on K-means clustering and feature space augmentation


CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 19; Issue: 2024, No. 2; Pages: 267-277; Column: Academic Papers (Machine Learning); Publication date: 2024-03-05

Title: A noisy label deep learning algorithm based on K-means clustering and feature space augmentation

Author(s): LYU Jia¹,²; QIU Xiaolong¹,²
1. College of Computer and Information Sciences, Chongqing Normal University, Chongqing 401331, China
2. Chongqing Digital Agriculture Service Engineering Technology Research Center, Chongqing 401331, China

Keywords: noisy label learning; deep learning; semisupervised learning; machine learning; neural network; K-means clustering; feature space augmentation; mix-up algorithm

CLC: TP181

DOI: 10.11992/tis.202303014

Abstract: The performance of deep neural networks depends on high-quality training samples, and noisy labels reduce classification accuracy. To limit this effect, noisy label learning algorithms divide the training samples into clean and noisy subsets and assign pseudo-labels to the noisy samples with a semisupervised learning algorithm. However, such algorithms are still hindered by inaccurate pseudo-labels and an insufficient number of training samples. To address these problems, we propose a noisy label deep learning algorithm based on K-means clustering and feature space augmentation. First, the algorithm uses K-means clustering to cluster the clean samples according to their labels, then selects the noisy samples that are difficult to classify according to their distance from the cluster centers, which improves the quality of the training samples. Second, the mix-up algorithm is applied to both the clean and noisy samples to increase the number of training samples. Finally, a feature space augmentation algorithm suppresses the noisy samples generated by the mix-up algorithm, which improves the classification accuracy of the network. The proposed algorithm is validated on four data sets: CIFAR10, CIFAR100, MNIST, and ANIMAL-10.
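
The first two steps of the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the use of per-class feature means as a stand-in for label-based K-means centers, the distance-gap rule for marking a noisy sample as hard to classify, and the `margin` and `alpha` values are all assumptions made for illustration. The feature space augmentation step is not sketched because the abstract does not describe it in enough detail.

```python
import numpy as np

def class_centers(features, labels, num_classes):
    """Per-class mean of clean-sample features (assumed stand-in for
    K-means centers obtained by clustering clean samples by label)."""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def hard_sample_mask(noisy_features, centers, margin=0.1):
    """Flag noisy samples whose two nearest centers are almost equally far.

    The gap rule and `margin` are hypothetical; the abstract only says that
    selection uses the distance between noisy samples and cluster centers.
    """
    # Pairwise Euclidean distances, shape (n_noisy, num_classes).
    dists = np.linalg.norm(
        noisy_features[:, None, :] - centers[None, :, :], axis=2)
    two_nearest = np.sort(dists, axis=1)[:, :2]
    return (two_nearest[:, 1] - two_nearest[:, 0]) < margin

def mixup(x1, y1, x2, y2, alpha=4.0, rng=None):
    """Standard mix-up: a convex combination of two samples and their
    (one-hot or soft) labels, with the weight drawn from Beta(alpha, alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2, lam

if __name__ == "__main__":
    clean_x = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
    clean_y = np.array([0, 0, 1, 1])
    centers = class_centers(clean_x, clean_y, num_classes=2)
    noisy_x = np.array([[5.0, 1.0], [1.0, 1.0]])
    print(hard_sample_mask(noisy_x, centers, margin=1.0))
```

In this toy setup the first noisy sample lies midway between the two centers and is flagged, while the second sits close to one center and is not.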

