[1] WU Jun, DONG Jiaming, LIU Xin, et al. Lightweight object detection network and its application based on the attention optimization[J]. CAAI Transactions on Intelligent Systems, 2023, 18(3): 506-516. [doi:10.11992/tis.202206014]

Lightweight object detection network and its application based on the attention optimization

CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 18 Issue: 2023, No. 3 Pages: 506-516 Column: Academic Papers: Machine Perception and Pattern Recognition Publication date: 2023-07-05

Title: Lightweight object detection network and its application based on the attention optimization

Author(s): WU Jun¹,²; DONG Jiaming¹; LIU Xin¹; WANG Chunzhi¹
1. School of Computer Science, Hubei University of Technology, Wuhan 430068, China
2. School of Materials Science and Engineering, Wuhan University of Technology, Wuhan 430070, China

Keywords: object detection; deep learning; computer vision; lightweight network; coordinate attention; squeeze-and-excitation; one-stage object detection network; loss function

CLC: TP18

DOI: 10.11992/tis.202206014

Abstract: With the goal of making the YOLO network more lightweight, this paper proposes two new lightweight network models, YOLOv5s-CCA (YOLOv5s-C3-coordinate attention) and YOLOv5s-CSE (YOLOv5s-C3-squeeze-and-excitation). They are obtained by fusing the representative SE (squeeze-and-excitation) channel attention module and the relatively novel CA (coordinate attention) spatial attention module with the YOLOv5s object detection network. Further exploration yields a strategy for the optimal insertion position of the SE and CA attention modules in the YOLOv5s network. The experiments show that the CA module outperforms the SE module in the lightweight network model. On both the PASCAL VOC 2012 and Global Wheat 2020 datasets, the proposed YOLOv5s-CCA model achieves the lightweight goal while improving accuracy over the original network. These results indicate that YOLOv5s-CCA generalizes across datasets, and they provide data support and a reference for its lightweight deployment in practical applications.
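As an illustration of the SE channel attention named in the abstract, the following is a minimal sketch of the general squeeze-and-excitation idea: global average pooling per channel, a two-layer bottleneck, and sigmoid gating. It is not the paper's YOLOv5s-CSE implementation. The function name se_attention, the toy input, the toy weights, and the omission of bias terms are all assumptions made for illustration.

```python
import math


def se_attention(feature_map, w1, w2):
    """Apply squeeze-and-excitation gating to a feature map.

    feature_map: list of C channels, each an H x W list of lists.
    w1: (C/r) x C weights of the reduction layer (hypothetical values).
    w2: C x (C/r) weights of the expansion layer (hypothetical values).
    Bias terms are omitted for brevity (an assumption of this sketch).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Excitation, step 1: reduction layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w1
    ]
    # Excitation, step 2: expansion layer followed by sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[v * g for v in row] for row in ch]
        for ch, g in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input: 2 channels of size 2 x 2, reduced to 1 hidden unit.
x = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
w1 = [[0.5, 0.5]]
w2 = [[1.0], [-1.0]]

scaled, gates = se_attention(x, w1, w2)
print(gates)
```

In this toy case the first channel receives a gate above 0.5 and the second a gate below 0.5, so the first channel is emphasized relative to the second. Coordinate attention differs in that it pools along the height and width directions separately, which preserves positional information that a single global average discards.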

