
LI Bing, WANG Yue, ZHANG Yimu, et al. Metal surface defect detection algorithm based on improved RT-DETR algorithm[J]. CAAI Transactions on Intelligent Systems, 2025, 20(6): 1404-1419. [doi:10.11992/tis.202502021]


Metal surface defect detection algorithm based on improved RT-DETR algorithm

CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP], Vol. 20, No. 6, 2025, pp. 1404-1419. Section: Academic Papers (Machine Perception and Pattern Recognition). Published: 2025-11-05

Author(s):: LI Bing¹,², WANG Yue¹, ZHANG Yimu¹, WEI Letao¹, XIE Zhuofan¹, YE Meng¹, ZHAI Yongjie¹,²
1. Department of Automation, North China Electric Power University, Baoding 071003, China;
2. Baoding Key Laboratory of Intelligent Robot Perception and Control in Electric Power System, Baoding 071003, China

Keywords:: deep learning; metal surface defects; small target; RT-DETR; feature fusion; attention mechanism; difference convolution; object detection

CLC number:: TP183

DOI:: 10.11992/tis.202502021

Abstract:: Metal surface defect detection is difficult because the targets are small, their scales vary widely, and the backgrounds are complex. To address these problems, an improved model based on RT-DETR (real-time detection transformer) is proposed: HAS-DETR (high accuracy for small object-DETR). HAS-DETR introduces a multiple differential convolution module (MDConv) into the backbone network to strengthen feature extraction for small targets. A double multiscale feature fusion module captures both global semantic information and fine detail features, which addresses the large variation in target scale. A global multiscale attention mechanism replaces the multihead attention mechanism in the AIFI (attention-based intra-scale feature interaction) module, improving the model's robustness and accuracy in complex backgrounds and multiscale target scenarios. On the metal surface defect dataset, HAS-DETR improves mAP50 by 6.5% and mAP50-95 by 4.5% compared with RT-DETR; on the public ADPPP dataset, it improves mAP50 by 2.0% and mAP50-95 by 1.3%. The experimental results show that HAS-DETR effectively improves the detection accuracy for small objects in complex backgrounds while maintaining high detection efficiency, indicating good prospects for practical industrial application.
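The abstract does not describe the internals of MDConv, so the following is only a hedged illustration of the general idea behind difference convolutions, not the authors' module. It sketches a single-channel central difference convolution (CDC, Yu et al., CVPR 2020), which combines a vanilla convolution with a term that subtracts the centre pixel: y(p0) = Σ w(pn)·x(p0+pn) − θ·x(p0)·Σ w(pn). With θ = 0 it reduces to an ordinary convolution; with θ = 1 it responds only to local intensity changes, the kind of fine edge cue that small defects produce. Whether MDConv uses this exact operator, how many difference branches it combines, and the value of θ are assumptions; the function names `conv2d_same` and `central_difference_conv` and the default θ = 0.7 are hypothetical choices for this sketch.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(x: Matrix, w: Matrix) -> Matrix:
    """Vanilla single-channel 2-D convolution (cross-correlation, as in CNN
    layers) with stride 1 and zero padding, so the output keeps the input size.
    Assumes a square kernel of odd size."""
    h, wd, k = len(x), len(x[0]), len(w)
    r = k // 2
    out = [[0.0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            s = 0.0
            for di in range(k):
                for dj in range(k):
                    ii, jj = i + di - r, j + dj - r
                    if 0 <= ii < h and 0 <= jj < wd:  # zero padding outside the image
                        s += w[di][dj] * x[ii][jj]
            out[i][j] = s
    return out


def central_difference_conv(x: Matrix, w: Matrix, theta: float = 0.7) -> Matrix:
    """Central difference convolution (hypothetical stand-in for one MDConv branch):
    y(p0) = sum_n w(pn) * x(p0 + pn) - theta * x(p0) * sum_n w(pn).
    theta = 0 gives a vanilla convolution; theta = 1 gives a pure difference
    operator that is zero on uniform regions away from the padded border."""
    vanilla = conv2d_same(x, w)
    w_sum = sum(sum(row) for row in w)  # sum of kernel weights, applied to the centre pixel
    return [
        [vanilla[i][j] - theta * x[i][j] * w_sum for j in range(len(x[0]))]
        for i in range(len(x))
    ]


if __name__ == "__main__":
    ones = [[1.0] * 3 for _ in range(3)]
    # 5x5 "step" image: dark on the left (0.0), bright on the right (1.0)
    step = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
    out = central_difference_conv(step, ones, theta=1.0)
    # Middle row: non-zero next to the step, zero inside the uniform bright
    # region; the last -3.0 is an artefact of zero padding at the right border.
    print(out[2])  # [0.0, 3.0, -3.0, 0.0, -3.0]
```

The pure-Python loops are chosen only for readability; a real backbone layer would implement the same formula with batched tensor convolutions over many channels.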



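Memo

Received: 2025-02-27.
Foundation items: National Natural Science Foundation of China (62373151); Key Support Project of the Joint Funds of the National Natural Science Foundation of China (U21A20486); Fundamental Research Funds for the Central Universities (2023JC006); General Program of the Natural Science Foundation of Hebei Province (F2023502010).
About the authors: LI Bing, associate professor, Ph.D. Main research interests: pattern recognition and computer vision. Has led 2 general projects funded by the Fundamental Research Funds for the Central Universities and 5 industry-commissioned research projects, published more than 30 academic papers, and been granted 4 invention patents. E-mail: li_bing@ncepu.edu.cn.
WANG Yue, master's student. Main research interests: power vision and object detection. E-mail: 2011616203@qq.com.
ZHAI Yongjie, professor, Ph.D. Main research interest: power vision. Has led 2 General Program projects of the National Natural Science Foundation of China and 2 projects of the Natural Science Foundation of Hebei Province, edited 1 textbook, authored 4 books, and published more than 30 academic papers. E-mail: zhaiyongjie@ncepu.edu.cn.
Corresponding author: ZHAI Yongjie. E-mail: zhaiyongjie@ncepu.edu.cn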

