[1] LU Bin, SUN Yang, YANG Zhenyu. 3D object detection algorithm with voxel graph attention[J]. CAAI Transactions on Intelligent Systems, 2024, 19(3): 598-609. [doi:10.11992/tis.202209008]

3D object detection algorithm with voxel graph attention

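CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 19; Issue: No. 3, 2024; Pages: 598-609; Column: Academic Papers, Machine Perception and Pattern Recognition; Publication date: 2024-05-05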

Title: 3D object detection algorithm with voxel graph attention

Author(s): LU Bin^1,2, SUN Yang^1,2, YANG Zhenyu^1,2
1. School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, China
2. Engineering Research Center of Intelligent Computing for Complex Energy Systems, Ministry of Education, Baoding 071003, China

Keywords: point cloud; 3D object detection; graph attention; feature interpolation; multiscale features; LiDAR; voxelization; car detection

Classification number (CLC): TP391.4

DOI: 10.11992/tis.202209008

Document code: 2023-09-14

Abstract: Current point cloud-based 3D object detection methods do not fully exploit the local geometric features of point clouds, which leads to poor detection of objects with sparse points. To address this, this paper proposes a two-stage 3D object detection algorithm based on voxel graph attention over raw point clouds, named voxel graph attention region-CNN (VGT-RCNN). First, grid center point features are computed by interpolating multiscale voxel features. Next, a local graph is constructed on the multiscale non-empty voxel features. Finally, a graph attention mechanism computes a weighted average of the voxel features, extracting and using the local geometric features of the object to complete detection. The algorithm targets a shortcoming of current two-stage methods, which give insufficient consideration to the relative importance of different voxel features during feature aggregation: it introduces a learnable weight matrix that dynamically learns the weights of voxel features and improves the expressiveness of local features. The algorithm was extensively tested on the widely used KITTI autonomous driving dataset and achieved competitive detection results, with a marked improvement in accuracy for cars with sparse point clouds. A visual analysis of the detection results is also presented.
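The abstract does not give the attention formula, so the following is only a minimal sketch of the general idea: gather the non-empty voxels near one grid center point into a local graph, then take an attention-weighted average of their features. It assumes the scoring form of the standard graph attention network (a shared projection `W`, a scoring vector `a`, LeakyReLU, and softmax), which may differ from what VGT-RCNN actually uses. All function names, the radius-based neighbor search, and the toy values are hypothetical, and `W` and `a` are fixed here although they would be learned during training.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(v, slope=0.2):
    return v if v > 0 else slope * v


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def radius_neighbors(center, voxel_centers, radius):
    """Local graph (assumed radius search): indices of non-empty voxels
    whose centers lie within `radius` of the grid center point."""
    return [i for i, c in enumerate(voxel_centers)
            if math.dist(center, c) <= radius]


def attention_aggregate(center_feat, neighbor_feats, W, a):
    """Attention-weighted average of neighbor voxel features.

    Assumed GAT-style score: e_j = LeakyReLU(a . [W q || W h_j]),
    alpha = softmax(e), output = sum_j alpha_j * (W h_j).
    Requires at least one neighbor.
    """
    q = matvec(W, center_feat)
    projected = [matvec(W, h) for h in neighbor_feats]
    # q + p concatenates the two projected feature lists.
    scores = [leaky_relu(sum(ai * v for ai, v in zip(a, q + p)))
              for p in projected]
    alphas = softmax(scores)
    out = [sum(al * p[d] for al, p in zip(alphas, projected))
           for d in range(len(q))]
    return out, alphas


# Toy data (hypothetical): three non-empty voxels with 2-D features.
voxel_centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
voxel_feats = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
grid_center, grid_feat = (0.5, 0.0, 0.0), [1.0, 0.0]
W = [[1.0, 0.0], [0.0, 1.0]]   # learned in practice; identity here
a = [0.1, 0.2, 0.3, 0.4]       # learned in practice; fixed here

idx = radius_neighbors(grid_center, voxel_centers, radius=1.0)
out, alphas = attention_aggregate(
    grid_feat, [voxel_feats[i] for i in idx], W, a)
print(idx, alphas, out)
```

Because the weights come from a softmax, they are positive and sum to one, so the output is a convex combination of the projected neighbor features. A plain mean would instead weight every neighboring voxel equally, which is the limitation the abstract attributes to existing two-stage methods.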

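Memo

Received: 2022-09-06.
Funding: National Natural Science Foundation of China (62371188); Hebei Province Innovation Capability Training Program for Current Graduate Students (CXZZBS2023153).
About the authors: LU Bin, professor, doctoral supervisor, Ph.D., CCF senior member. Main research interests: intelligent computing and computer vision, integrated energy systems and big data analysis. Has led or participated in 7 national and provincial/ministerial science and technology projects and led 18 projects commissioned by enterprises and institutions; received, as first contributor, 1 second prize of the National Commercial Science and Technology Progress Award and, as the university's first contributor, 3 Hebei Province Science and Technology Progress Awards and 4 municipal science and technology progress awards; holds 10 granted patents and has published 68 academic papers and 3 monographs. E-mail: lubin@ncepu.edu.cn; SUN Yang, Ph.D. candidate. Main research interests: machine learning and computer vision. E-mail: bless2016@163.com; YANG Zhenyu, Ph.D. candidate. Main research interests: machine learning and computer vision. E-mail: yangzhenyu536@163.com
Corresponding author: LU Bin. E-mail: lubin@ncepu.edu.cn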
