
[1] ZHANG Jianyu, XIE Juanying. ObjectBoxG: object detection algorithm based on GC3 module[J]. CAAI Transactions on Intelligent Systems, 2024, 19(6): 1385-1394. [doi:10.11992/tis.202310025]


ObjectBoxG: object detection algorithm based on GC3 module


CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 19 Issue: No. 6, 2024 Pages: 1385-1394 Column: Academic Papers, Machine Learning Publication date: 2024-12-05

Title:: ObjectBoxG: object detection algorithm based on GC3 module

Author(s):: ZHANG Jianyu, XIE Juanying; School of Computer Science, Shaanxi Normal University, Xi’an 710119, China

Keywords:: graph convolutional neural network; feature extraction; feature fusion; object detection; deep learning; anchor-free methods; feature pyramid network; Object-Box detector; multi-scale features; global features

CLC number:: TP181

DOI:: 10.11992/tis.202310025

Abstract:: As research on object detection has deepened, anchor-free methods such as the ObjectBox detector have attracted researchers' attention. However, the ObjectBox detector neither fully exploits multiscale features nor adequately considers the correlation between object center points and global information. To address these limitations, a graph convolution layer module (GConv) is proposed to learn global image features; it is based on the spectral graph method and draws on the principle that nodes in a graph convolutional neural network influence one another. GConv is then fused with C3 (cross stage partial network with 3 convolutions) to obtain the GC3 (graph C3) module, which further extracts the original, detail, and global features of an image. GC3 is combined with the generalized feature pyramid network (GFPN) to form the graph generalized feature pyramid network (GGFPN), which is embedded into the ObjectBox algorithm to yield the ObjectBoxG algorithm. Experiments on classic benchmark datasets show that the proposed GC3 module has stronger feature extraction capability than the original C3 module, that the proposed GGFPN has stronger feature learning capability than GC3, and that the ObjectBoxG algorithm achieves excellent object detection performance.
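The abstract does not spell out how GConv is computed. As a rough illustration of the "nodes influence one another" idea behind spectral-style graph convolution, the sketch below applies the widely used renormalized propagation rule, ReLU(D^{-1/2}(A + I)D^{-1/2} H W), to a tiny feature map treated as a grid graph. The 4-neighbour grid construction, the function names (`grid_adjacency`, `normalize_adjacency`, `gcn_layer`), and the identity weight matrix are all assumptions made for illustration; this is not the authors' GConv or GC3 module.

```python
import math


def grid_adjacency(height, width):
    """Adjacency matrix of a 4-neighbour grid graph, one node per pixel.

    Assumption for illustration: the paper's graph construction may differ.
    """
    n = height * width
    adj = [[0.0] * n for _ in range(n)]
    for r in range(height):
        for c in range(width):
            i = r * width + c
            # Link each pixel to its right and lower neighbour (undirected).
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < height and cc < width:
                    j = rr * width + cc
                    adj[i][j] = adj[j][i] = 1.0
    return adj


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(a_norm, features, weight):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(a_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


# A 2x2 feature map with 2 channels, flattened to 4 nodes (row-major order).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity, so only the mixing is visible
a_norm = normalize_adjacency(grid_adjacency(2, 2))
out = gcn_layer(a_norm, features, weight)
```

With an identity weight, each output row is the average of a node's own feature and its neighbours' features, which shows how information spreads beyond a single location. Stacking such steps widens the region each node draws on.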

Memo

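Received: 2023-10-19.
Funding: National Natural Science Foundation of China (62076159, 61673251, 12031010); Fundamental Research Funds for the Central Universities (GK202105003).
About the authors: ZHANG Jianyu, master's student; main research interests: deep learning and computer vision. E-mail: zhangjiany06@qq.com. XIE Juanying, professor, doctoral supervisor, Ph.D.; main research interests: machine learning, data mining, and biomedical big data analysis; author of more than 100 academic papers. E-mail: xiejuany@snnu.edu.cn.
Corresponding author: XIE Juanying. E-mail: xiejuany@snnu.edu.cn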

Last Update: 2024-11-05
