[1] LU Jun, LU Linchao, ZHAI Xiaoyang, et al. High-efficiency 3D object detection for road traffic scenes[J]. CAAI Transactions on Intelligent Systems, 2025, 20(1): 91-100. [doi: 10.11992/tis.202311013]

High-efficiency 3D object detection for road traffic scenes

References:
[1] HUANG Keli, SHI Botian, LI Xiang, et al. Multi-modal sensor fusion for auto driving perception: a survey[EB/OL]. (2022-02-06)[2023-11-13]. https://arxiv.org/abs/2202.02703.
[2] LIU Tong, GAO Sijie, NIE Weizhi. Multitarget detection algorithm based on multimodal information fusion[J]. Laser & optoelectronics progress, 2022, 59(8): 339-348.
[3] SONG Ziying, LIU Lin, JIA Feiyang, et al. Robustness-aware 3D object detection in autonomous driving: a review and outlook[EB/OL]. (2024-01-12)[2024-08-02]. http://arxiv.org/abs/2401.06542v3.
[4] VORA S, LANG A H, HELOU B, et al. PointPainting: sequential fusion for 3D object detection[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 4603-4611.
[5] WANG Chunwei, MA Chao, ZHU Ming, et al. PointAugmenting: cross-modal augmentation for 3D object detection[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 11789-11798.
[6] XU Shaoqing, ZHOU Dingfu, FANG Jin, et al. FusionPainting: multimodal fusion with adaptive attention for 3D object detection[C]//2021 IEEE International Intelligent Transportation Systems Conference. Indianapolis: IEEE, 2021: 3047-3054.
[7] BAI Xuyang, HU Zeyu, ZHU Xinge, et al. TransFusion: robust LiDAR-camera fusion for 3D object detection with transformers[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 1080-1089.
[8] LIANG Tingting, XIE Hongwei, YU Kaicheng, et al. BEVFusion: a simple and robust LiDAR-camera fusion framework[J]. Advances in neural information processing systems, 2022, 35: 10421-10434.
[9] LI Yingwei, YU A W, MENG Tianjian, et al. DeepFusion: LiDAR-camera deep fusion for multi-modal 3D object detection[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 17161-17170.
[10] HU Haotian, WANG Fanyi, SU Jingwen, et al. EA-BEV: edge-aware bird’s-eye-view projector for 3D object detection[EB/OL]. (2023-03-31)[2023-11-13]. https://arxiv.org/abs/2303.17895.
[11] YAN Junjie, LIU Yingfei, SUN Jianjian, et al. Cross modal transformer via coordinates encoding for 3D object detection[EB/OL]. (2023-01-03)[2023-11-13]. https://arxiv.org/abs/2301.01283.
[12] WANG Haiyang, TANG Hao, SHI Shaoshuai, et al. UniTR: a unified and efficient multi-modal transformer for bird’s-eye-view representation[C]//2023 IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 6792-6802.
[13] ZHANG Xinyu, ZOU Zhenhong, LI Zhiwei, et al. Deep multi-modal fusion in object detection for autonomous driving[J]. CAAI transactions on intelligent systems, 2020, 15(4): 758-771.
[14] LU Bin, YANG Zhenyu, SUN Yang, et al. 3D object detection algorithm with multi-channel cross attention fusion[J]. CAAI transactions on intelligent systems, 2024, 19(4): 885-897.
[15] YAN Yan, MAO Yuxing, LI Bo. SECOND: sparsely embedded convolutional detection[J]. Sensors, 2018, 18(10): 3337.
[16] LANG A H, VORA S, CAESAR H, et al. PointPillars: fast encoders for object detection from point clouds[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 12689-12697.
[17] YANG Zetong, ZHOU Yin, CHEN Zhifeng, et al. 3D-MAN: 3D multi-frame attention network for object detection[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 1863-1872.
[18] ZHOU Yin, TUZEL O. VoxelNet: end-to-end learning for point cloud based 3D object detection[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4490-4499.
[19] YANG Zetong, SUN Yanan, LIU Shu, et al. STD: sparse-to-dense 3D object detector for point cloud[C]//2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 1951-1960.
[20] YANG Zetong, SUN Yanan, LIU Shu, et al. 3DSSD: point-based 3D single stage object detector[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11037-11045.
[21] SHI Shaoshuai, GUO Chaoxu, JIANG Li, et al. PV-RCNN: point-voxel feature set abstraction for 3D object detection[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 10526-10535.
[22] SHI Shaoshuai, WANG Xiaogang, LI Hongsheng. PointRCNN: 3D object proposal generation and detection from point cloud[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 770-779.
[23] LI Bo, YAN Junjie, WU Wei, et al. High performance visual tracking with Siamese region proposal network[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 8971-8980.
[24] QI C R, YI Li, SU Hao, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[EB/OL]. (2017-06-07)[2023-11-13]. http://arxiv.org/abs/1706.02413v1.
[25] JIANG Yingying, ZHU Xiangyu, WANG Xiaobing, et al. R2CNN: rotational region CNN for orientation robust scene text detection[EB/OL]. (2017-06-29)[2023-11-13]. https://arxiv.org/abs/1706.09579.
[26] HOU Yi, ZHANG Hong, ZHOU Shilin, et al. Efficient ConvNet feature extraction with multiple RoI pooling for landmark-based visual localization of autonomous vehicles[J]. Mobile information systems, 2017: 8104386.
[27] LI Yangyan, BU Rui, SUN Mingchao, et al. PointCNN: convolution on X-transformed points[J]. Advances in neural information processing systems, 2018, 31: 820-830.
[28] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2999-3007.
[29] WANG Hua, NIE Feiping, HUANG Heng. Robust distance metric learning via simultaneous l1-norm minimization and maximization[C]//Proceedings of the 31st International Conference on Machine Learning. Beijing: PMLR, 2014: 1836-1844.
[30] GROH F, WIESCHOLLEK P, LENSCH H P A. Flex-convolution[C]//Asian Conference on Computer Vision. Cham: Springer, 2018: 105-122.
[31] DOVRAT O, LANG I, AVIDAN S. Learning to sample[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 2755-2764.
[32] YANG Jiancheng, ZHANG Qiang, NI Bingbing, et al. Modeling point clouds with self-attention and Gumbel subset sampling[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 3318-3327.
[33] XU K, BA J L, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[C]//Proceedings of the 32nd International Conference on Machine Learning. Lille: PMLR, 2015: 2048-2057.
[34] BROWN R A. Building a balanced k-d tree in O(kn log n) time[EB/OL]. (2014-10-20)[2023-11-13]. http://arxiv.org/abs/1410.5420v46.
[35] CHEN Xiaozhi, MA Huimin, WAN Ji, et al. Multi-view 3D object detection network for autonomous driving[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 1907-1915.