
[1]吴珺,董佳明,刘欣,等.注意力优化的轻量目标检测网络及应用[J].智能系统学报,2023,18(3):506-516.[doi:10.11992/tis.202206014]
　WU Jun,DONG Jiaming,LIU Xin,et al.Lightweight object detection network and its application based on the attention optimization[J].CAAI Transactions on Intelligent Systems,2023,18(3):506-516.[doi:10.11992/tis.202206014]


注意力优化的轻量目标检测网络及应用


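CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN 1673-4785/CN 23-1538/TP] Volume: 18; Issue: 2023, No. 3; Pages: 506-516; Section: Academic Papers, Machine Perception and Pattern Recognition; Publication date: 2023-07-05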

Title:: Lightweight object detection network and its application based on the attention optimization

作者:: 吴珺¹,², 董佳明¹, 刘欣¹, 王春枝¹
1. 湖北工业大学计算机学院, 湖北武汉 430068;
2. 武汉理工大学材料科学与工程学院, 湖北武汉 430070

Author(s):: WU Jun¹,², DONG Jiaming¹, LIU Xin¹, WANG Chunzhi¹
1. School of Computer Science, Hubei University of Technology, Wuhan 430068, China;
2. School of Materials Science and Engineering, Wuhan University of Technology, Wuhan 430070, China

关键词:: 目标检测; 深度学习; 计算机视觉; 轻量化网络; 空间注意力; 通道注意力; 一阶目标检测网络; 损失函数

Keywords:: object detection; deep learning; computer vision; lightweight network; coordinate attention; squeeze-and-excitation; one-stage object detection network; loss function

分类号:: TP18

DOI:: 10.11992/tis.202206014

摘要:: 本文以轻量化改进YOLO网络为主要目标，选取具有代表性的(squeeze and excitation, SE)通道注意力模块和比较新颖的(coordinate attention, CA)空间注意力模块与YOLOv5s目标检测网络进行融合，提出新的轻量网络模型YOLOv5s-CCA (YOLOv5s-C3-coordinate attention)和YOLOv5s-CSE(YOLOv5s-C3-squeeze-and-excitation)。通过进一步探索，论证出SE和CA注意力模块在YOLOv5s目标检测网络中最优插入位置的策略，实验论证了在轻量化网络模型中CA优于SE注意力模块。本文所提出的YOLOv5s-CCA网络模型在PASCAL VOC 2012数据集和Global Wheat 2020数据集中实现了网络轻量化并且精度较原始网络有所提升；并证实了YOLOv5s-CCA具有一定的通用性和泛化性，为其在实际生产与生活中进行轻量化部署提供了可靠的数据支撑和一定参考价值。

Abstract:: This paper aims to make the YOLO network more lightweight. It fuses the representative squeeze-and-excitation (SE) channel attention module and the more recent coordinate attention (CA) spatial attention module with the YOLOv5s object detection network, and proposes two new lightweight network models: YOLOv5s-CCA (YOLOv5s-C3-coordinate attention) and YOLOv5s-CSE (YOLOv5s-C3-squeeze-and-excitation). Further experiments establish a strategy for choosing the optimal insertion positions of the SE and CA modules within YOLOv5s, and show that CA outperforms SE in lightweight network models. On both the PASCAL VOC 2012 and Global Wheat 2020 datasets, the proposed YOLOv5s-CCA model is lighter than the original network while also improving on its accuracy. The results further indicate that YOLOv5s-CCA generalizes across tasks, providing data support and a reference for its lightweight deployment in practical production settings.
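The abstract names the two attention modules without describing their mechanics. As a rough illustration of how they differ, the following is a minimal pure-Python sketch, not the authors' implementation and not the YOLOv5s-C3 integration itself. It assumes a single feature map stored as nested lists `[C][H][W]`, omits biases and batch normalization, and uses ReLU in place of the nonlinearity of the original CA design. The function names, the random weights, and the reduction ratio `R` are hypothetical and chosen only for the example.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(weights, vec):
    """Bias-free dense layer (equivalently a 1x1 convolution at one position)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def se_attention(x, w1, w2):
    """Squeeze-and-excitation: one gate per channel."""
    # Squeeze: global average pooling reduces each channel to a scalar.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck layer + ReLU, then expansion + sigmoid.
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    gates = [sigmoid(g) for g in matvec(w2, hidden)]
    # Every position of a channel is scaled by the same gate.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gates)]


def coordinate_attention(x, w_shared, w_h, w_w):
    """Coordinate attention: one gate per (channel, row) and per (channel, column)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Directional pooling keeps positional information along one axis.
    pool_h = [[sum(row) / W for row in ch] for ch in x]                    # [C][H]
    pool_w = [[sum(ch[i][j] for i in range(H)) / H for j in range(W)]
              for ch in x]                                                 # [C][W]
    # Concatenate the H row descriptors and W column descriptors, then
    # apply one shared channel-reducing transform followed by ReLU.
    positions = ([[pool_h[c][i] for c in range(C)] for i in range(H)]
                 + [[pool_w[c][j] for c in range(C)] for j in range(W)])
    mixed = [[max(0.0, v) for v in matvec(w_shared, p)] for p in positions]
    # Split back into the two directions; each gets its own transform + sigmoid.
    gate_h = [[sigmoid(v) for v in matvec(w_h, m)] for m in mixed[:H]]    # [H][C]
    gate_w = [[sigmoid(v) for v in matvec(w_w, m)] for m in mixed[H:]]    # [W][C]
    # Each element is scaled by its row gate and its column gate.
    return [[[x[c][i][j] * gate_h[i][c] * gate_w[j][c] for j in range(W)]
             for i in range(H)] for c in range(C)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
C, H, W, R = 4, 3, 5, 2  # channels, height, width, hypothetical reduction ratio
x = [[[rng.uniform(0.0, 1.0) for _ in range(W)] for _ in range(H)] for _ in range(C)]

se_out = se_attention(x, random_matrix(C // R, C, rng), random_matrix(C, C // R, rng))
ca_out = coordinate_attention(
    x,
    random_matrix(C // R, C, rng),
    random_matrix(C, C // R, rng),
    random_matrix(C, C // R, rng),
)
```

Both functions return a feature map of the same shape as their input, which is what lets such a module be dropped into an existing block. The difference in the sketch is that SE applies one scalar gate per channel, while CA applies gates that also vary with row and column position.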



备注/Memo

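Received: 2022-06-08.
Funding: National Natural Science Foundation of China (61602161, 61772180); Key Research and Development Program of Hubei Province (2020BAB01); Graduate Fund of Hubei University of Technology (2021046).
About the authors: WU Jun, associate professor, Ph.D.; research interests include deep learning and multimodal data analysis, big data analysis and applications, and optimization of intelligent methods. Principal investigator of projects funded by the National Natural Science Foundation of China and the Natural Science Foundation of Hubei Province; has participated in five provincial- and ministerial-level R&D projects and published 16 academic papers. DONG Jiaming, master's student; research interests include object detection and big data technology. LIU Xin, master's student; research interests include object detection and intelligent methods
Corresponding author: WU Jun. E-mail: wujun@whut.edu.cn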

