
WU Yiquan, CAI Jiaqi. Deep learning-based 3D object detection for autonomous driving: a comprehensive review[J]. CAAI Transactions on Intelligent Systems, 2026, 21(2): 297-320. [doi:10.11992/tis.202504021]


Deep learning-based 3D object detection for autonomous driving: a comprehensive review

CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 21; Issue: No. 2, 2026; Pages: 297-320; Column: Review; Publication date: 2026-03-05

Author(s): WU Yiquan, CAI Jiaqi; School of Electronic Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China

Keywords: autonomous driving; 3D object detection; deep learning; point cloud; multi-sensor fusion; convolutional neural network; dataset; performance evaluation metrics

Classification number: TP391.41

DOI: 10.11992/tis.202504021

Abstract: The rapid development of autonomous driving places increasingly strict demands on the accuracy and real-time performance of vehicle perception systems. 3D object detection, a core component of these systems, is vital to driving safety and to the driving experience. This review first divides 3D object detection algorithms into three categories according to the type of sensor data they use: vision-based algorithms, subdivided into methods built on 2D features and methods built on 3D features; LiDAR point cloud algorithms, covering grid-based, raw point cloud, and hybrid point cloud methods; and multi-sensor algorithms, classified by whether the network fuses modalities serially or in parallel. Within this taxonomy, the characteristics, contributions, and limitations of specific algorithms are summarized. The review then introduces typical 3D object detection datasets and their evaluation metrics and compares the performance of representative algorithms across datasets. Finally, it analyzes the challenges facing current techniques and outlines directions for future work.

参考文献/References:: [1] 百度地图, 北京交通发展研究院, 清华大学数据科学研究院交通大数据研究中心, 等. 2024年度中国城市交通报告[R]. 北京: 百度地图, 2024: 7-8. BAIDU Maps, Beijing Traffic Development Research Institute, Tsinghua University Data Science Research Institute Traffic Big Data Center, et al. 2024 Annual China Urban Traffic Report[R]. Beijing: Baidu Maps, 2024: 7-8.
[2] 段伟. 汽车自动驾驶技术简述[J]. 中国自动识别技术, 2024(2): 66-68 DUAN Wei. A brief introduction to automobile autonomous driving technology[J]. China automatic identification technology, 2024(2): 66-68
[3] 郭毅锋, 吴帝浩, 魏青民. 基于深度学习的点云三维目标检测方法综述[J]. 计算机应用研究, 2023, 40(1): 20-27 GUO Yifeng, WU Dihao, WEI Qingmin. Overview of single-sensor and multi-sensor point cloud 3D target detection methods[J]. Application research of computers, 2023, 40(1): 20-27
[4] 李佳男, 王泽, 许廷发. 基于点云数据的三维目标检测技术研究进展[J]. 光学学报, 2023, 43(15): 296-312 LI Jianan, WANG Ze, XU Tingfa. Three-Dimensional Object Detection Technology Based on Point Cloud Data[J]. Acta optica sinica, 2023, 43(15): 296-312
[5] 曹家乐, 李亚利, 孙汉卿, 等. 基于深度学习的视觉目标检测技术综述[J]. 中国图象图形学报, 2022, 27(6): 1697-1722. CAO Jiale, LI Yali, SUN Hanqin, et al. A survey on deep learning based visual object detection[J]. Journal of image and graphics, 2022, 27(6): 1697-1722.
[6] 贾明达, 杨金明, 孟维亮, 等. 融合点云与图像的环境目标检测研究进展[J]. 中国图象图形学报, 2024, 29(6): 1765-1784. JIA Minda, YANG Jinming, MENG Weiliang, et al. 2024. Survey on the fusion of point clouds and images for environmental object detection[J]. Journal of image and graphics, 2024, 29(6): 1765-1784.
[7] CUI Yaodong, CHEN Ren, CHU Wenbo, et al. Deep learning for image and point cloud fusion in autonomous driving: a review[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 23(2): 722-739.
[8] 陈慧娴, 吴一全, 张耀. 基于深度学习的三维点云分析方法研究进展[J]. 仪器仪表学报, 2023, 44(11): 130-158. CHEN Huixian, WU Yiquan, ZHANG Yao. Research progress on 3D point cloud analysis methods based on deep learning[J]. Chinese journal of scientific instrument, 2023, 44(11): 130-158.
[9] 周燕, 许业文, 蒲磊, 等. 自动驾驶场景下的图像三维目标检测研究进展[J]. 计算机科学, 2024, 1-18. ZHOU Yan, XU Yewen, PU Lei, et al. Research progress on image 3D target detection in autonomous driving scenarios[J]. Computer science, 2024, 1-18.
[10] DROBNITZKY M, FRIEDERICH J, EGGER B, et al. Survey and systematization of 3D object detection models and methods[J]. The visual computer, 2024, 40(3): 1867-1913.
[11] 任柯燕, 谷美颖, 袁正谦, 等. 自动驾驶 3D 目标检测研究综述[J]. 控制与决策, 2023, 38(4): 865-889. REN Keyan, GU Meiyin, YUAN Zhengqian, et al. Review of research on 3D target detection in autonomous driving[J]. Control and decision, 2023, 38(4): 865-889.
[12] 张新宇, 徐子贤, 闫冬梅, 等. 基于深度学习的3D目标检测算法综述[J]. 控制工程, 2024, 31(3): 526-534. ZHANG Xinyu, XU Zixian, YAN Dongmei, et al. Review of 3D object detection algorithms based on deep learning[J]. Control engineering, 2024, 31(3): 526-534.
[13] MOUSAVIAN A, ANGUELOV D, FLYNN J, et al. 3D bounding box estimation using deep learning and geometry[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 7074-7082.
[14] LI Buyu, OUYANG Wanli, Lu Sheng, et al. GS3D: an efficient 3D object detection framework for autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 1019-1028.
[15] LUO Shujie, DAI Hang, SHAO Ling, et al. M3DSSD: monocular 3D single stage object detector[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 6145-6154.
[16] QIN Zengyi, WANG Jinglu, LU Yan. Triangulation learning network: from monocular to stereo 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 7615-7623.
[17] GUO Xiaoyang, SHI Shaoshuai, WANG Xiaogang, et al. LIGA-Stereo: learning LiDAR geometry aware representations for stereo-based 3D detector[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 3153-3163.
[18] 迟旭然, 裴伟, 朱永英, 等. Fast Stereo-RCNN三维目标检测算法[J]. 小型微型计算机系统, 2022, 43(10): 2157-2161. CHI Xuran, PEI Wei, ZHU Yongying, et al. Fast Stereo-RCNN 3D target detection algorithm[J]. Mini-micro computer systems, 2022, 43(10): 2157-2161.
[19] HEN Xiaozhi, KUNDU K, ZHU Yukun, et al. 3D object proposals using stereo imagery for accurate object class detection[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 40(5): 1259-1272.
[20] GIRSHICK R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 1440-1448.
[21] CHABOT F, CHAOUCH M, RABARISOA J, et al. Deep MANTA: a coarse-to-fine many-task network for joint 2D and 3D vehicle analysis from monocular image[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 2040-2049.
[22] KUNDU A, LI Yin, REHG J M. 3D-RCNN: instance-level 3D object reconstruction via render-and-compare[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 3559-3568.
[23] KREISS S, BERTONI L, ALAHI A. PifPaf: composite fields for human pose estimation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 11977-11986.
[24] BERTONI L, KREISS S, ALAHI A. MonoLoco: monocular 3D pedestrian localization and uncertainty estimation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Long Beach: IEEE, 2019: 6861-6871.
[25] LI Peixuan, ZHAO Huaici, LIU Pengfei, et al. RTM3D: real-time monocular 3D detection from object keypoints for autonomous driving[C]//European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2020: 644-660.
[26] CAI Yingjie, LI Buyu, JIAO Zeyu, et al. Monocular 3D object detection with decoupled structured polygon estimation and height-guided depth estimation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 10478-10485.
[27] LIU Zongdai, ZHOU Dingfu, LU Feixiang, et al. AutoShape: real-time shape-aware monocular 3D object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 15641-15650.
[28] SHUAI Qingyao, ZHANG Chi, YANG Kaizhi, et al. DPF-Net: combining explicit shape priors in deformable primitive field for unsupervised structural reconstruction of 3D objects[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2023: 14321-14329.
[29] DUAN Fan, YU Jiahao, CHEN Li. T-CorresNet: template guided 3D point cloud completion with correspondence pooling query generation strategy[C]//European Conference on Computer Vision. Milan: Springer Nature Switzerland, 2024: 90-106.
[30] CHEN Yongjian, TAI Lei, SUN Kai, et al. MonoPair: monocular 3D object detection using pairwise spatial relationships[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual Conference: IEEE, 2020: 12093-12102.
[31] MA Xinzhu, ZHANG Yinmin, XU Dan, et al. Delving into localization errors for monocular 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 4721-4730.
[32] ZHANG Yunpeng, LU Jiwen, ZHOU Jie, et al. Objects are different: flexible monocular 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 3289-3298.
[33] 汪萌, 诸兵. 不确定性建模在2D和3D目标检测中的应用[J]. 系统工程与电子技术, 2023, 45(8): 2370-2376. WANG Meng, ZHU Bing. Application of uncertainty modeling in 2D and 3D target detection[J]. Systems engineering and electronics, 2023, 45(8): 2370-2376.
[34] HUANG K C, WU T H, SU H T, et al. MonoDTR: monocular 3D object detection with depth-aware Transformer[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 4012-4021.
[35] LI Zhiqi, WANG Wenhai, LI Hongyang, et al. BEVFormer: learning bird’s-eye-view representation from multi-camera images via spatiotemporal Transformers[EB/OL]. (2022-03-31)[2025-04-24]. https://arxiv.org/abs/2203.17270.
[36] WANG Zeyu, LI Dingwen, LUO Chenxu, et al. DistillBEV: boosting multi-camera 3D object detection with cross-modal knowledge distillation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2023: 8637-8646.
[37] XU Bin, CHEN Zhenzhong. Multi-level fusion based 3D object detection from monocular images[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 2345-2353.
[38] GODARD C, MAC AODHA O, BROSTOW G J. Unsupervised monocular depth estimation with left-right consistency[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 270-279.
[39] DING Mingyu, HUO Yuqi, YI Hongwei, et al. Learning depth-guided convolutions for monocular 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Virtual Conference: IEEE, 2020: 1000-1001.
[40] PENG Liang, WU Xiaopei, YANG Zheng, et al. DID-M3D: decoupling instance depth for monocular 3D object detection[C]//European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 71-88.
[41] RODDICK T, KENDALL A, CIPOLLA R. Orthographic feature transform for monocular 3D object detection[EB/OL]. (2018-11-20)[2025-04-24]. https://arxiv.org/abs/1811.08188
[42] LIU Yingfei, WANG Tiancai, ZHANG Xiangyu, et al. PETR: position embedding transformation for multi-view 3D object detection[C]//European conference on computer vision. Cham: Springer Nature Switzerland, 2022: 531-548.
[43] ?BONTAR J, LECUN Y. Stereo matching by training a convolutional neural network to compare image patches[J]. Journal of machine learning research, 2016, 17(65): 1-32.
[44] KENDALL A, MARTIROSYAN H, DASGUPTA S, et al. End-to-end learning of geometry and context for deep stereo regression[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 66-75.
[45] WANG Yan, CHAO Weilun, GARG D, et al. Pseudo-LiDAR from visual depth estimation: bridging the gap in 3D object detection for autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 8445-8453.
[46] FU Huan, GONG Mingming, WANG Chaohui, et al. Deep ordinal regression network for monocular depth estimation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 2002-2011.
[47] CHANG Jiaren, CHEN Yongshen. Pyramid stereo matching network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 5410-5418.
[48] WANG Xinlong, YIN Wei, KONG Tao, et al. Task-aware monocular depth estimation for 3D object detection[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 12257-12264.
[49] LI Chengyao, KU J, WASLANDER S L. Confidence guided stereo 3D object detection with split depth estimation[C]//2020 IEEE/RSJ International Conference on Intelligent Robots and Systems(IROS). Las Vegas: IEEE, 2020: 5776-5783.
[50] HOSSAIN S, LIN Xianke. Efficient stereo depth estimation for pseudo-LiDAR: a self-supervised approach based on multi-input ResNet encoder[J]. Sensors, 2023, 23(3): 1650.
[51] OH C, JANG Y, SHIM D, et al. Automatic pseudo-LiDAR annotation: generation of training data for 3D object detection networks[J]. IEEE access, 2024.
[52] LI Bo, ZHANG Tianlei, XIA Tian. Vehicle detection from 3D LiDAR using fully convolutional network[EB/OL]. (2016-08-29)[2025-04-24]. https://arxiv.org/abs/1608.07916
[53] YANG Bin, LUO Wenjie, URTASUN R. PIXOR: real-time 3D object detection from point clouds[C]//Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7652-7660.
[54] BELTR?N J, GUINDEL C, MORENO F M, et al. BirdNet: a 3D object detection framework from LiDAR information[C]//2018 21st International Conference on Intelligent Transportation Systems(ITSC). Miami: IEEE, 2018: 3517-3523.
[55] MEYER G P, LADDHA A, KEE E, et al. LaserNet: an efficient probabilistic 3D object detector for autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 12677-12686.
[56] DENG Jiajun, ZHOU Wengang, ZHANG Yanyong, et al. From multi-view to hollow-3D: hallucinated hollow-3D R-CNN for 3D object detection[J]. IEEE transactions on circuits and systems for video technology, 2021, 31(12): 4722-4734.
[57] SUN Pei, WANG Weiyue, CHAI Yuning, et al. RSN: range sparse net for efficient, accurate LiDAR 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 5725-5734.
[58] ZHOU Yin, TUZEL O. VoxelNet: end-to-end learning for point cloud based 3D object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4490-4499.
[59] YAN Yan, MAO Yuxing, LI Bo. Second: sparsely embedded convolutional detection[J]. Sensors, 2018, 18(10): 3337.
[60] LANG A H, VORA S, CAESAR H, et al. Pointpillars: fast encoders for object detection from point clouds[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 12697-12705.
[61] LI Haisheng, LU Yanling. 3D object detection based on point cloud in automatic driving scene[J]. Multimedia tools and applications, 2024, 83(5): 13029-13044.
[62] Wang Bei, An Jianping, Cao Jiayan, et al. Voxel-FPN: multi-scale voxel feature aggregation for 3D object detection from LIDAR point clouds[J]. Sensors, 2020, 20(3): 704.
[63] LIU Zhe, ZHAO Xin, HUANG Tengteng, et al. TANet: robust 3D object detection from point clouds with triple attention[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 11677-11684.
[64] CHEN Yukang, LIU Jianhui, ZHANG Xiangyu, et al. VoxelNeXt: fully sparse voxelnet for 3D object detection and tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 21674-21683.
[65] ZHENG Wu, TANG Weiliang, CHEN Sijin, et al. CIA-SSD: confident IOU-aware single-stage object detector from point cloud[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 3555-3562.
[66] FAN Lue, PANG Ziqi, ZHANG Tianyuan, et al. Embracing single stride 3D object detector with sparse Transformer[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 8458-8468.
[67] HE Chenhang, LI Ruihuang, LI Shuai, et al. Voxel set Transformer: a set-to-set approach to 3D object detection from point clouds[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 8417-8427.
[68] HE Chenhang, ZENG Hui, HUANG Jianqiang, et al. Structure aware single-stage 3D object detection from point cloud[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual Conference: IEEE, 2020: 11873-11882.
[69] ZHAO Tianchen, NING Xuefei, HONG Ke, et al. Ada3D: exploiting the spatial redundancy with adaptive inference for efficient 3D object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2023: 17728-17738.
[70] QI C R, SU Hao, MO Kaichun, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Montreal: IEEE, 2017: 652-660.
[71] QI C R, YI Li, SU Hao, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[J]. Advances in neural information processing systems, 2017, 30.
[72] YANG Zetong, SUN Yanan, LIU Shu, et al. STD: sparse-to-dense 3D object detector for point cloud[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 1951-1960.
[73] 陈熙源, 戈明明, 姚志婷, 等. 雨雪天气下的激光雷达滤波算法研究[J]. 仪器仪表学报, 2023, 44(7): 172-181. CHEN Xiyuan, GE Mingming, YAO Zhiting, et al. Filtering algorithm of LiDAR in rainy and snowy weather[J]. Chinese journal of scientific instrument, 2023, 44(7): 172-181.
[74] TAO Manli, ZHAO Chaoyang, TANG Ming, et al. Objformer: boosting 3D object detection via instance-wise interaction[J]. Pattern Recognition, 2024, 146: 110061.
[75] CHEN Chen, CHEN Zhe, ZHANG Jing, et al. SASA: semantics-augmented set abstraction for point-based 3D object detection[J]. Proceedings of the AAAI conference on artificial intelligence, 2022, 36(1): 221-229.
[76] ZHANG Yifan, HU Qingyong, XU Guoquan, et al. Not all points are equal: learning highly efficient point-based detectors for 3D LiDAR point clouds[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 18953-18962.
[77] 王理嘉, 于欢, 刘守印. 动态环境中多帧点云融合算法及三维目标检测算法研究[J]. 计算机应用研究, 2023, 40(3): 909-913. WANG Lijia, YU Huan, LIU Shouyin. Research on multi-frame point cloud fusion and 3D target detection algorithms in dynamic environments[J]. Application research of computers, 2023, 40(3): 909-913.
[78] ZHANG Gang, CHEN Junnan, GAO Guohuan, et al. HEDNet: a hierarchical encoder-decoder network for 3D object detection in point clouds[J]. Advances in neural information processing systems, 2024, 36.
[79] LI Yangyan, BU Rui, SUN Mingchao, et al. PointCNN: convolution on X-transformed points[J]. Advances in neural information processing systems, 2018, 31.
[80] YIN Tianwei, ZHOU Xingyi, KRAHENBUHL P. Center-based 3D object detection and tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 11784-11793.
[81] 涂新奎, 郑少武, 于善虎, 等. 基于对称形状生成的三维目标检测网络[J]. 仪器仪表学报, 2023, 44(6): 252-263. TU Xinkui, ZHENG Shaowu, YU Shanhu, et al. 3D target detection network based on symmetric shape generation[J]. Chinese journal of scientific instrument, 2023, 44(6): 252-263.
[82] 陶乐, 王海, 蔡英凤, 等. 面向自动驾驶场景的多目标点云检测算法[J]. 汽车工程, 2024, 46(7): 1208-1218, 1238. TAO Le, WANG Hai, CAI Yingfeng, et al. Multi-object point cloud detection algorithm for autonomous driving scenarios[J]. Automotive engineering, 2024, 46(7): 1208-1218, 1238.
[83] 周昊, 齐洪钢, 邓永强, 等. 融合点云深度信息的3D目标检测与分类[J]. 中国图象图形学报, 2024, 29(8): 2399-2412. ZHOU Hao, QI Honggang, DENG Yongqiang, et al. 3D target detection and classification using fused point cloud depth information[J]. Journal of image and graphics, 2024, 29(8): 2399-2412.
[84] SHI Weijing, RAJKUMAR R. Point-GNN: graph neural network for 3D object detection in a point cloud[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual Conference: IEEE, 2020: 1711-1719.
[85] NAJIBI M, LAI Guangda, KUNDU A, et al. DOPS: learning to detect 3D objects and predict their 3D shapes[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual Conference: IEEE, 2020: 11913-11922.
[86] ZHANG Yanan, HUANG Di, WANG Yunhong. PC-RGNN: point cloud completion and graph neural network for 3D object detection[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 3430-3437.
[87] LIU Zhijian, TANG Haotian, LIN Yujun, et al. Point-voxel CNN for efficient 3D deep learning[J]. Advances in neural information processing systems, 2019, 32.
[88] NOH J, LEE S, HAM B. HVPR: hybrid voxel-point representation for single-stage 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 14605-14614.
[89] SHI Shaoshuai, GUO Chaoxu, JIANG Li, et al. PV-RCNN: point-voxel feature set abstraction for 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual Conference: IEEE, 2020: 10529-10538.
[90] WU Peng, GU Lipeng, YAN Xuefeng, et al. PV-RCNN++: semantical point-voxel feature interaction for 3D object detection[J]. The visual computer, 2023, 39(6): 2425-2440.
[91] ZHOU Wei, ZHANG Xiaodan, HAO Xin, et al. Multi point-voxel convolution(MPVConv) for deep learning on point clouds[J]. Computers & graphics, 2023, 112: 72-80.
[92] 李虎辰, 管海燕, 雷相达, 等. 基于点–体素一致性约束的城市激光雷达点云分类[J]. 中国激光, 2024, 51(13): 251-264. LI Huchen, GUAN Haiyan, LEI Xiangda, et al. Urban LiDAR point cloud classification based on point-voxel consistency constraints[J]. Chinese journal of lasers, 2024, 51(13): 251-264.
[93] DENG Pengzhen, ZHOU Li, CHEN Jie. PVC-SSD: point-voxel dual-channel fusion with cascade point estimation for anchor-free single-stage 3D object detection[J]. IEEE sensors journal, 2024.
[94] CHEN Xiaozhi, MA Huimin, WAN Ji, et al. Multi-view 3D object detection network for autonomous driving[C]//Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 1907-1915.
[95] KU Jason, MOZIFIAN M, LEE J, et al. Joint 3D proposal generation and object detection from view aggregation[C]//2018 IEEE/RSJ International Conference on Intelligent Robots and Systems(IROS). Madrid: IEEE, 2018: 1-8.
[96] QI C R, LIU Wei, WU Chenxia, et al. Frustum pointnets for 3D object detection from RGB-D data[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 918-927.
[97] WANG Zhixin, JIA Kui. Frustum ConvNet: sliding frustums to aggregate local point-wise features for amodal 3D object detection[C]//2019 IEEE/RSJ International Conference on Intelligent Robots and Systems. Osaka: IEEE, 2019: 1742-1749.
[98] VORA S, LANG A H, HELOU B, et al. Pointpainting: sequential fusion for 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual Conference: IEEE, 2020: 4604-4612.
[99] LIANG Ming, YANG Bin, CHEN Yun, et al. Multi-task multi-sensor fusion for 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 7345-7353.
[100] WU Xiaopei, PENG Liang, YANG Honghui, et al. Sparse fuse dense: towards high quality 3D detection with depth completion[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 5418-5427.
[101] 黄漫, 黄勃, 高永彬. 引入深度补全与实例分割的三维目标检测[J]. 传感器与微系统, 2021, 40(1): 129-132. HUANG Man, HUANG Bo, GAO Yongbin. 3D target detection with depth completion and instance segmentation[J]. Sensors and microsystems, 2021, 40(1): 129-132.
[102] XIE Yichen, XU Chenfeng, RAKOTOSAONA M J, et al. Sparsefusion: Fusing multi-modal sparse representations for multi-sensor 3D object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2023: 17591-17602.
[103] ZHANG Yanan, CHEN Jiaxin, HUANG Di. CAT-Det: contrastively augmented Transformer for multi-modal 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 908-917.
[104] GUNN J, LENYK Z, SHARMA A, et al. Lift-Attend-Splat: bird’s-eye-view camera-LiDAR fusion using Transformers[EB/OL]. (2023-12-22)[2025-04-24]. https://arxiv.org/abs/2312.14919
[105] LIU Zhijian, TANG Haotian, AMINI A, et al. BEVFusion: multi-task multi-sensor fusion with unified bird’s-eye view representation[C]//2023 IEEE International Conference on Robotics and Automation(ICRA). London: IEEE, 2023: 2774-2781.
[106] WANG Ke, ZHOU Tianqiang, ZHANG Zhichuang, et al. PVF-DectNet: multi-modal 3D detection network based on perspective-voxel fusion[J]. Engineering applications of artificial intelligence, 2023, 120: 105951.
[107] LI Yingwei, YU A W, MENG Tianjian, et al. DeepFusion: Lidar-camera deep fusion for multi-modal 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 17182-17191.
[108] 周治国, 马文浩. 一种多层多模态融合3D目标检测方法[J]. 电子学报, 2024, 52(3): 696-708. ZHOU Zhigui, MA Wenhao. A multi-layer multi-modal fusion 3D target detection method[J]. Acta electronica sinica, 2024, 52(3): 696-708.
[109] LIU Huaijin, DU Jixiang, ZHANG Yong, et al. PVConvNet: pixel-voxel sparse convolution for multimodal 3D object detection[J]. Pattern recognition, 2024, 149: 110284.
[110] 金宇锋, 陶重犇. 基于Transformer的融合信息增强3D目标检测算法[J]. 仪器仪表学报, 2023, 44(12): 297-306. JIN Yufeng, TAO Zhongben. Fusion-enhanced 3D target detection algorithm based on Transformer[J]. Chinese journal of scientific instrument, 2023, 44(12): 297-306.
[111] XIA Chenxing, LI Xubing, GAO Xiuju, et al. PCDR-DFF: multi-modal 3D object detection based on point cloud diversity representation and dual feature fusion[J]. Neural computing and applications, 2024: 1-18.
[112] 王五岳, 徐召飞, 曲春燕, 等. 基于红外与激光雷达融合的鸟瞰图空间三维目标检测算法[J]. 光子学报, 2024, 53(1): 73-84. WANG Wuyue, XU Zhaofei, QU Chunyan, et al. 3D target detection algorithm in bird’s-eye view space based on infrared and LiDAR fusion[J]. Acta photonica sinica, 2024, 53(1): 73-84.
[113] 董钰婷, 官磊. 基于自适应加权融合激光雷达和相机的三维目标检测方法[J]. 计算机应用, 2024, 44(S1): 250-255. DONG Yyuting, GUAN Lei. 3D target detection method based on adaptive weighted fusion of LiDAR and camera[J]. Computer applications, 2024, 44(S1): 250-255.
[114] 李文礼, 喻飞, 石晓辉, 等. BEV特征下激光雷达和单目相机融合的目标检测算法研究[J]. 计算机工程与应用, 2024, 60(11): 182-193. LI Wenli, YU Fei, SHI Xiaohui, et al. Target detection algorithm based on BEV features for LiDAR and monocular camera fusion[J]. Computer engineering and applications, 2024, 60(11): 182-193.
[115] NABATI R, QI Hairong. CenterFusion: center-based radar and camera fusion for 3D object detection[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2021: 1527-1536.
[116] BANSAL K, RUNGTA K, BHARADIA D. RadSegNet: a reliable approach to radar camera fusion[EB/OL]. (2022-08-08)[2025-04-24]. https://arxiv.org/abs/2208.03849
[117] KIM Y, KIM S, CHOI J W, et al. CRAFT: camera-radar 3D object detection with spatio-contextual fusion Transformer[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2023: 1160-1168.
[118] KIM Y, SHIN J, KIM S, et al. CRN: camera radar net for accurate, robust, efficient 3D perception[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2023: 17615-17626.
[119] 车俐, 吕连辉, 蒋留兵. AF-CenterNet: 基于交叉注意力机制的毫米波雷达和相机融合的目标检测[J]. 计算机应用研究, 2024, 41(4): 1258-1263. CHE Li, LYU Lianhui, JIANG Liubing. AF-CenterNet: Cross-attention mechanism-based millimeter-wave radar and camera fusion for target detection[J]. Application research of computers, 2024, 41(4): 1258-1263.
[120] LIU Xiang, LI Zhenglin, ZHOU Yang, et al. Camera–radar fusion with modality interaction and radar Gaussian expansion for 3D object detection[J]. Cyborg and bionic systems, 2024, 5: 0079.
[121] GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? the kitti vision benchmark suite[C]//2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence: IEEE, 2012: 3354-3361.
[122] CAESAR H, BANKITI V, LANG A H, et al. nuScenes: a multimodal dataset for autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11621-11631.
[123] SUN Pei, KRETZSCHMAR H, DOTIWALLA X, et al. Scalability in perception for autonomous driving: Waymo open dataset[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual Conference: IEEE, 2020: 2446-2454.
[124] HUANG Xinyu, WANG Peng, CHENG Xinjing, et al. The apolloscape open dataset for autonomous driving and its application[J]. IEEE transactions on pattern analysis and machine intelligence, 2019, 42(10): 2702-2719.
[125] CHOI Y, KIM N, HWANG S, et al. KAIST multi-spectral day/night data set for autonomous and assisted driving[J]. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(03): 934-948.
[126] HOUSTON J, ZUIDHOF G, BERGAMINI L, et al. One thousand and one hours: Self-driving motion prediction dataset[C]//Conference on Robot Learning. Palo Alto: PMLR, 2021: 409-418.
[127] YU Haibao, LUO Yizhen, SHU Mao, et al. DAIR-V2X: a large-scale dataset for vehicle-infrastructure cooperative 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 21361-21370.
[128] G?HLERT N, JOURDAN N, CORDTS M, et al. Cityscapes 3D: Dataset and benchmark for 9 dof vehicle detection[EB/OL]. (2020-06-14)[2025-04-24]. https://arxiv.org/abs/2006.07864
[129] WILSON B, QI W, AGARWAL T, et al. Argoverse 2: Next generation datasets for self-driving perception and forecasting[EB/OL]. (2023-01-02)[2025-04-24]. https://arxiv.org/abs/2301.00493
[130] XIAO Pengchuan, SHAO Zhenlei, HAO S, et al. PandaSet: advanced sensor suite dataset for autonomous driving[C]//2021 IEEE International Intelligent Transportation Systems Conference(ITSC). Indianapolis: IEEE, 2021: 3095-3101.
[131] PATIL A, MALLA S, GANG H, et al. The H3D dataset for full-surround 3D multi-object detection and tracking in crowded urban scenes[C]//2019 International Conference on Robotics and Automation(ICRA). Montreal: IEEE, 2019: 9552-9557.
[132] CONG Peishan, ZHU Xinge, QIAO Feng, et al. STCrowd: a multimodal dataset for pedestrian perception in crowded scenes[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 19608-19617.
[133] XIAO Aoran, HUANG Jiaxing, GUAN Dayan, et al. Transfer learning from synthetic to real LiDAR point cloud for semantic segmentation[EB/OL]. (2021-07-12)[2025-04-24]. https://arxiv.org/abs/2107.05399
[134] PANG Su, MORRIS D, RADHA H. CLOCs: Camera-LiDAR object candidates fusion for 3D object detection[C]//2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Las Vegas: IEEE, 2020: 10386-10393.
[135] WU Hai, WEN Chenglu, SHI Shaoshuai, et al. Virtual sparse convolution for multimodal 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 21653-21662.

Similar Articles/References:: [1]GE Yuanyuan,XU Youjiang,ZHAO Shuai,et al.Detection of small and dense traffic signs in self-driving scenarios[J].CAAI Transactions on Intelligent Systems,2018,13(3):366.[doi:10.11992/tis.201706040]
[2]WANG Xing,ZHAO Hailiang,WANG Zhigang.Optimal trajectory planning method of intelligent vehicles based on neighborhood system[J].CAAI Transactions on Intelligent Systems,2019,14(5):1040.[doi:10.11992/tis.201805004]
[3]ZHANG Xinyu,ZOU Zhenhong,LI Zhiwei,et al.Deep multi-modal fusion in object detection for autonomous driving[J].CAAI Transactions on Intelligent Systems,2020,15(4):758.[doi:10.11992/tis.202002010]
[4]LU Jun,LI Yang,LU Linchao.Long-distance and occluded 3D target detection algorithm[J].CAAI Transactions on Intelligent Systems,2024,19(2):259.[doi:10.11992/tis.202301001]
[5]LU Bin,SUN Yang,YANG Zhenyu.3D object detection algorithm with voxel graph attention[J].CAAI Transactions on Intelligent Systems,2024,19(3):598.[doi:10.11992/tis.202209008]
[6]TANG Youming,SUN Guanyu,SUN Guibin,et al.Autonomous vehicle trajectory planning based on urban overtaking conditions[J].CAAI Transactions on Intelligent Systems,2024,19(3):619.[doi:10.11992/tis.202209060]
[7]HU Dandan,ZHANG Zhongting.Road target detection algorithm for autonomous driving scenarios based on improved YOLOv5s[J].CAAI Transactions on Intelligent Systems,2024,19(3):653.[doi:10.11992/tis.202206034]
[8]LU Jun,LU Linchao,ZHAI Xiaoyang,et al.High-efficiency 3D object detection for road traffic scenes[J].CAAI Transactions on Intelligent Systems,2025,20(1):91.[doi:10.11992/tis.202311013]
[9]LU Jun,WANG Xudong,JI Guangyu,et al.Point cloud multitarget tracking algorithm based on the constant turn rate and acceleration model[J].CAAI Transactions on Intelligent Systems,2025,20(6):1328.[doi:10.11992/tis.202503034]
[10]GONG Yan,WANG Naibang,ZHANG Xinyu,et al.BEV perception technologies and development trends for intelligent connected vehicles[J].CAAI Transactions on Intelligent Systems,2026,21(1):41.[doi:10.11992/tis.202505027]
[11]LU Bin,YANG Zhenyu,SUN Yang,et al.3D object detection algorithm with multi-channel cross attention fusion[J].CAAI Transactions on Intelligent Systems,2024,19(4):885.[doi:10.11992/tis.202305029]
[12]LU Jun,ZHAO Haoran,LU Linchao.Research on 3D object detection based on multi-modal fusion[J].CAAI Transactions on Intelligent Systems,2025,20(5):1167.[doi:10.11992/tis.202502015]

Memo

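Received: 2025-04-24.
Funding: National Natural Science Foundation of China (61573183).
About the authors: WU Yiquan, professor. Main research interests include visual inspection and image measurement, and video processing and intelligent analysis. Has presided over 48 projects, including projects funded by the National Natural Science Foundation of China, and has published more than 350 academic papers. E-mail: nuaaimage@163.com. CAI Jiaqi, master's student. Main research interests include computer vision and image processing. E-mail: Caij-q@nuaa.edu.cn.
Corresponding author: WU Yiquan. E-mail: nuaaimage@163.com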

