
[1] LU Bin, YANG Zhenyu, SUN Yang, et al. 3D object detection algorithm with multi-channel cross attention fusion[J]. CAAI Transactions on Intelligent Systems, 2024, 19(4): 885-897. [doi:10.11992/tis.202305029]


3D object detection algorithm with multi-channel cross attention fusion


CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 19 Issue: 4 (2024) Pages: 885-897 Column: Academic Papers: Machine Perception and Pattern Recognition Publication date: 2024-07-05

Title:: 3D object detection algorithm with multi-channel cross attention fusion

Author(s):: LU Bin¹,²; YANG Zhenyu¹,²; SUN Yang¹,²; LIU Yawei¹,²; WANG Minghan¹,²
1. School of Control and Computer Engineering, North China Electric Power University, Baoding 071000, China;
2. Hebei Key Laboratory of Knowledge Computing for Energy & Power, North China Electric Power University, Baoding 071000, China

Keywords:: 3D point cloud; autonomous driving; LiDAR; deep learning; 3D object detection; pillar; cross attention; single-stage algorithm

CLC:: TP391

DOI:: 10.11992/tis.202305029

Abstract:: Existing single-stage 3D object detection algorithms use downsampled point cloud features in only one way, and they aggregate too little long-range contextual information to improve performance further. To address these problems, we propose a single-stage 3D object detection algorithm based on multi-channel cross attention fusion. First, a channel-wise cross attention module fuses the downsampled features; using the cross attention mechanism, it strengthens the ability of multi-scale features to express long-range spatial information under different receptive fields. Then, a cascade feature excitation module combines the original downsampled features with the channel-wise cross attention weighted features in a cascade, which strengthens the algorithm's ability to learn key spatial features. Extensive experiments were conducted on the public autonomous driving dataset KITTI, and the results were compared with mainstream algorithms. As a single-stage algorithm, the method achieved detection accuracies of 91.34%, 79.85%, and 75.98% on the three difficulty levels of the car category, which are 4.83%, 3.26%, and 3.32% higher than the baseline algorithm. The experimental results demonstrate the effectiveness and advanced performance of the algorithm and of the proposed modules for the 3D object detection task.
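
The abstract does not give the layer-level design of the channel-wise cross attention module, so the following is only a minimal sketch of generic cross attention between two downsampled feature maps in which each channel is treated as a token. The function name `channel_cross_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and all shapes are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def channel_cross_attention(query_feat, context_feat, w_q, w_k, w_v):
    """Generic cross attention with channels as tokens (illustrative only).

    query_feat:   (C1, N) feature map at one scale, flattened over N cells
    context_feat: (C2, N) feature map at another scale, resampled to N cells
    w_q, w_k:     (N, d) hypothetical projection matrices
    w_v:          (N, N) hypothetical value projection
    Returns the attended features (C1, N) and the attention weights (C1, C2).
    """
    q = query_feat @ w_q                      # (C1, d) queries from scale A
    k = context_feat @ w_k                    # (C2, d) keys from scale B
    v = context_feat @ w_v                    # (C2, N) values from scale B
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (C1, C2) scaled dot products
    attn = softmax(scores, axis=-1)           # each query channel attends over scale-B channels
    return attn @ v, attn


# Toy usage with random data; the sizes are arbitrary.
rng = np.random.default_rng(0)
n_cells, d_model = 16, 8
feat_a = rng.normal(size=(4, n_cells))   # 4 channels at scale A
feat_b = rng.normal(size=(6, n_cells))   # 6 channels at scale B
w_q = rng.normal(size=(n_cells, d_model))
w_k = rng.normal(size=(n_cells, d_model))
w_v = rng.normal(size=(n_cells, n_cells))
attended, weights = channel_cross_attention(feat_a, feat_b, w_q, w_k, w_v)
```

In this sketch every channel of one scale is re-expressed as a weighted mix of the channels of the other scale, which is one plausible reading of fusing features across receptive fields. How the paper's cascade feature excitation module then combines the result with the original features is not specified in the abstract and is not modeled here.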

