[1]吴一全,蔡佳琦.自动驾驶中深度学习的三维目标检测方法综述[J].智能系统学报,2026,21(2):297-320.[doi:10.11992/tis.202504021]
 WU Yiquan,CAI Jiaqi.Deep learning-based 3D object detection for autonomous driving:a comprehensive review[J].CAAI Transactions on Intelligent Systems,2026,21(2):297-320.[doi:10.11992/tis.202504021]

Deep learning-based 3D object detection for autonomous driving: a comprehensive review

参考文献/References:
[1] 百度地图, 北京交通发展研究院, 清华大学数据科学研究院交通大数据研究中心, 等. 2024年度中国城市交通报告[R]. 北京: 百度地图, 2024: 7-8. BAIDU Maps, Beijing Traffic Development Research Institute, Tsinghua University Data Science Research Institute Traffic Big Data Center, et al. 2024 Annual China Urban Traffic Report[R]. Beijing: Baidu Maps, 2024: 7-8.
[2] 段伟. 汽车自动驾驶技术简述[J]. 中国自动识别技术, 2024(2): 66-68. DUAN Wei. A brief introduction to automobile autonomous driving technology[J]. China automatic identification technology, 2024(2): 66-68.
[3] 郭毅锋, 吴帝浩, 魏青民. 基于深度学习的点云三维目标检测方法综述[J]. 计算机应用研究, 2023, 40(1): 20-27. GUO Yifeng, WU Dihao, WEI Qingmin. Overview of single-sensor and multi-sensor point cloud 3D target detection methods[J]. Application research of computers, 2023, 40(1): 20-27.
[4] 李佳男, 王泽, 许廷发. 基于点云数据的三维目标检测技术研究进展[J]. 光学学报, 2023, 43(15): 296-312. LI Jianan, WANG Ze, XU Tingfa. Three-Dimensional Object Detection Technology Based on Point Cloud Data[J]. Acta optica sinica, 2023, 43(15): 296-312.
[5] 曹家乐, 李亚利, 孙汉卿, 等. 基于深度学习的视觉目标检测技术综述[J]. 中国图象图形学报, 2022, 27(6): 1697-1722. CAO Jiale, LI Yali, SUN Hanqing, et al. A survey on deep learning based visual object detection[J]. Journal of image and graphics, 2022, 27(6): 1697-1722.
[6] 贾明达, 杨金明, 孟维亮, 等. 融合点云与图像的环境目标检测研究进展[J]. 中国图象图形学报, 2024, 29(6): 1765-1784. JIA Mingda, YANG Jinming, MENG Weiliang, et al. Survey on the fusion of point clouds and images for environmental object detection[J]. Journal of image and graphics, 2024, 29(6): 1765-1784.
[7] CUI Yaodong, CHEN Ren, CHU Wenbo, et al. Deep learning for image and point cloud fusion in autonomous driving: a review[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 23(2): 722-739.
[8] 陈慧娴, 吴一全, 张耀. 基于深度学习的三维点云分析方法研究进展[J]. 仪器仪表学报, 2023, 44(11): 130-158. CHEN Huixian, WU Yiquan, ZHANG Yao. Research progress on 3D point cloud analysis methods based on deep learning[J]. Chinese journal of scientific instrument, 2023, 44(11): 130-158.
[9] 周燕, 许业文, 蒲磊, 等. 自动驾驶场景下的图像三维目标检测研究进展[J]. 计算机科学, 2024, 1-18. ZHOU Yan, XU Yewen, PU Lei, et al. Research progress on image 3D target detection in autonomous driving scenarios[J]. Computer science, 2024, 1-18.
[10] DROBNITZKY M, FRIEDERICH J, EGGER B, et al. Survey and systematization of 3D object detection models and methods[J]. The visual computer, 2024, 40(3): 1867-1913.
[11] 任柯燕, 谷美颖, 袁正谦, 等. 自动驾驶 3D 目标检测研究综述[J]. 控制与决策, 2023, 38(4): 865-889. REN Keyan, GU Meiying, YUAN Zhengqian, et al. Review of research on 3D target detection in autonomous driving[J]. Control and decision, 2023, 38(4): 865-889.
[12] 张新宇, 徐子贤, 闫冬梅, 等. 基于深度学习的3D目标检测算法综述[J]. 控制工程, 2024, 31(3): 526-534. ZHANG Xinyu, XU Zixian, YAN Dongmei, et al. Review of 3D object detection algorithms based on deep learning[J]. Control engineering, 2024, 31(3): 526-534.
[13] MOUSAVIAN A, ANGUELOV D, FLYNN J, et al. 3D bounding box estimation using deep learning and geometry[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 7074-7082.
[14] LI Buyu, OUYANG Wanli, SHENG Lu, et al. GS3D: an efficient 3D object detection framework for autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 1019-1028.
[15] LUO Shujie, DAI Hang, SHAO Ling, et al. M3DSSD: monocular 3D single stage object detector[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 6145-6154.
[16] QIN Zengyi, WANG Jinglu, LU Yan. Triangulation learning network: from monocular to stereo 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 7615-7623.
[17] GUO Xiaoyang, SHI Shaoshuai, WANG Xiaogang, et al. LIGA-Stereo: learning LiDAR geometry aware representations for stereo-based 3D detector[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 3153-3163.
[18] 迟旭然, 裴伟, 朱永英, 等. Fast Stereo-RCNN三维目标检测算法[J]. 小型微型计算机系统, 2022, 43(10): 2157-2161. CHI Xuran, PEI Wei, ZHU Yongying, et al. Fast Stereo-RCNN 3D target detection algorithm[J]. Mini-micro computer systems, 2022, 43(10): 2157-2161.
[19] CHEN Xiaozhi, KUNDU K, ZHU Yukun, et al. 3D object proposals using stereo imagery for accurate object class detection[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 40(5): 1259-1272.
[20] GIRSHICK R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 1440-1448.
[21] CHABOT F, CHAOUCH M, RABARISOA J, et al. Deep MANTA: a coarse-to-fine many-task network for joint 2D and 3D vehicle analysis from monocular image[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 2040-2049.
[22] KUNDU A, LI Yin, REHG J M. 3D-RCNN: instance-level 3D object reconstruction via render-and-compare[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 3559-3568.
[23] KREISS S, BERTONI L, ALAHI A. PifPaf: composite fields for human pose estimation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 11977-11986.
[24] BERTONI L, KREISS S, ALAHI A. MonoLoco: monocular 3D pedestrian localization and uncertainty estimation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 6861-6871.
[25] LI Peixuan, ZHAO Huaici, LIU Pengfei, et al. RTM3D: real-time monocular 3D detection from object keypoints for autonomous driving[C]//European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2020: 644-660.
[26] CAI Yingjie, LI Buyu, JIAO Zeyu, et al. Monocular 3D object detection with decoupled structured polygon estimation and height-guided depth estimation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 10478-10485.
[27] LIU Zongdai, ZHOU Dingfu, LU Feixiang, et al. AutoShape: real-time shape-aware monocular 3D object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 15641-15650.
[28] SHUAI Qingyao, ZHANG Chi, YANG Kaizhi, et al. DPF-Net: combining explicit shape priors in deformable primitive field for unsupervised structural reconstruction of 3D objects[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 14321-14329.
[29] DUAN Fan, YU Jiahao, CHEN Li. T-CorresNet: template guided 3D point cloud completion with correspondence pooling query generation strategy[C]//European Conference on Computer Vision. Milan: Springer Nature Switzerland, 2024: 90-106.
[30] CHEN Yongjian, TAI Lei, SUN Kai, et al. MonoPair: monocular 3D object detection using pairwise spatial relationships[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual Conference: IEEE, 2020: 12093-12102.
[31] MA Xinzhu, ZHANG Yinmin, XU Dan, et al. Delving into localization errors for monocular 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 4721-4730.
[32] ZHANG Yunpeng, LU Jiwen, ZHOU Jie, et al. Objects are different: flexible monocular 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 3289-3298.
[33] 汪萌, 诸兵. 不确定性建模在2D和3D目标检测中的应用[J]. 系统工程与电子技术, 2023, 45(8): 2370-2376. WANG Meng, ZHU Bing. Application of uncertainty modeling in 2D and 3D target detection[J]. Systems engineering and electronics, 2023, 45(8): 2370-2376.
[34] HUANG K C, WU T H, SU H T, et al. MonoDTR: monocular 3D object detection with depth-aware Transformer[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 4012-4021.
[35] LI Zhiqi, WANG Wenhai, LI Hongyang, et al. BEVFormer: learning bird’s-eye-view representation from multi-camera images via spatiotemporal Transformers[EB/OL]. (2022-03-31)[2025-04-24]. https://arxiv.org/abs/2203.17270.
[36] WANG Zeyu, LI Dingwen, LUO Chenxu, et al. DistillBEV: boosting multi-camera 3D object detection with cross-modal knowledge distillation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 8637-8646.
[37] XU Bin, CHEN Zhenzhong. Multi-level fusion based 3D object detection from monocular images[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 2345-2353.
[38] GODARD C, MAC AODHA O, BROSTOW G J. Unsupervised monocular depth estimation with left-right consistency[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 270-279.
[39] DING Mingyu, HUO Yuqi, YI Hongwei, et al. Learning depth-guided convolutions for monocular 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Virtual Conference: IEEE, 2020: 1000-1001.
[40] PENG Liang, WU Xiaopei, YANG Zheng, et al. DID-M3D: decoupling instance depth for monocular 3D object detection[C]//European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 71-88.
[41] RODDICK T, KENDALL A, CIPOLLA R. Orthographic feature transform for monocular 3D object detection[EB/OL]. (2018-11-20)[2025-04-24]. https://arxiv.org/abs/1811.08188
[42] LIU Yingfei, WANG Tiancai, ZHANG Xiangyu, et al. PETR: position embedding transformation for multi-view 3D object detection[C]//European conference on computer vision. Cham: Springer Nature Switzerland, 2022: 531-548.
[43] ŽBONTAR J, LECUN Y. Stereo matching by training a convolutional neural network to compare image patches[J]. Journal of machine learning research, 2016, 17(65): 1-32.
[44] KENDALL A, MARTIROSYAN H, DASGUPTA S, et al. End-to-end learning of geometry and context for deep stereo regression[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 66-75.
[45] WANG Yan, CHAO Weilun, GARG D, et al. Pseudo-LiDAR from visual depth estimation: bridging the gap in 3D object detection for autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 8445-8453.
[46] FU Huan, GONG Mingming, WANG Chaohui, et al. Deep ordinal regression network for monocular depth estimation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 2002-2011.
[47] CHANG Jiaren, CHEN Yongsheng. Pyramid stereo matching network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 5410-5418.
[48] WANG Xinlong, YIN Wei, KONG Tao, et al. Task-aware monocular depth estimation for 3D object detection[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 12257-12264.
[49] LI Chengyao, KU J, WASLANDER S L. Confidence guided stereo 3D object detection with split depth estimation[C]//2020 IEEE/RSJ International Conference on Intelligent Robots and Systems(IROS). Las Vegas: IEEE, 2020: 5776-5783.
[50] HOSSAIN S, LIN Xianke. Efficient stereo depth estimation for pseudo-LiDAR: a self-supervised approach based on multi-input ResNet encoder[J]. Sensors, 2023, 23(3): 1650.
[51] OH C, JANG Y, SHIM D, et al. Automatic pseudo-LiDAR annotation: generation of training data for 3D object detection networks[J]. IEEE access, 2024.
[52] LI Bo, ZHANG Tianlei, XIA Tian. Vehicle detection from 3D LiDAR using fully convolutional network[EB/OL]. (2016-08-29)[2025-04-24]. https://arxiv.org/abs/1608.07916
[53] YANG Bin, LUO Wenjie, URTASUN R. PIXOR: real-time 3D object detection from point clouds[C]//Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7652-7660.
[54] BELTRÁN J, GUINDEL C, MORENO F M, et al. BirdNet: a 3D object detection framework from LiDAR information[C]//2018 21st International Conference on Intelligent Transportation Systems(ITSC). Maui: IEEE, 2018: 3517-3523.
[55] MEYER G P, LADDHA A, KEE E, et al. LaserNet: an efficient probabilistic 3D object detector for autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 12677-12686.
[56] DENG Jiajun, ZHOU Wengang, ZHANG Yanyong, et al. From multi-view to hollow-3D: hallucinated hollow-3D R-CNN for 3D object detection[J]. IEEE transactions on circuits and systems for video technology, 2021, 31(12): 4722-4734.
[57] SUN Pei, WANG Weiyue, CHAI Yuning, et al. RSN: range sparse net for efficient, accurate LiDAR 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 5725-5734.
[58] ZHOU Yin, TUZEL O. VoxelNet: end-to-end learning for point cloud based 3D object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4490-4499.
[59] YAN Yan, MAO Yuxing, LI Bo. SECOND: sparsely embedded convolutional detection[J]. Sensors, 2018, 18(10): 3337.
[60] LANG A H, VORA S, CAESAR H, et al. PointPillars: fast encoders for object detection from point clouds[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 12697-12705.
[61] LI Haisheng, LU Yanling. 3D object detection based on point cloud in automatic driving scene[J]. Multimedia tools and applications, 2024, 83(5): 13029-13044.
[62] WANG Bei, AN Jianping, CAO Jiayan, et al. Voxel-FPN: multi-scale voxel feature aggregation for 3D object detection from LIDAR point clouds[J]. Sensors, 2020, 20(3): 704.
[63] LIU Zhe, ZHAO Xin, HUANG Tengteng, et al. TANet: robust 3D object detection from point clouds with triple attention[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 11677-11684.
[64] CHEN Yukang, LIU Jianhui, ZHANG Xiangyu, et al. VoxelNeXt: fully sparse voxelnet for 3D object detection and tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 21674-21683.
[65] ZHENG Wu, TANG Weiliang, CHEN Sijin, et al. CIA-SSD: confident IOU-aware single-stage object detector from point cloud[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 3555-3562.
[66] FAN Lue, PANG Ziqi, ZHANG Tianyuan, et al. Embracing single stride 3D object detector with sparse Transformer[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 8458-8468.
[67] HE Chenhang, LI Ruihuang, LI Shuai, et al. Voxel set Transformer: a set-to-set approach to 3D object detection from point clouds[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 8417-8427.
[68] HE Chenhang, ZENG Hui, HUANG Jianqiang, et al. Structure aware single-stage 3D object detection from point cloud[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual Conference: IEEE, 2020: 11873-11882.
[69] ZHAO Tianchen, NING Xuefei, HONG Ke, et al. Ada3D: exploiting the spatial redundancy with adaptive inference for efficient 3D object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 17728-17738.
[70] QI C R, SU Hao, MO Kaichun, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 652-660.
[71] QI C R, YI Li, SU Hao, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[J]. Advances in neural information processing systems, 2017, 30.
[72] YANG Zetong, SUN Yanan, LIU Shu, et al. STD: sparse-to-dense 3D object detector for point cloud[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 1951-1960.
[73] 陈熙源, 戈明明, 姚志婷, 等. 雨雪天气下的激光雷达滤波算法研究[J]. 仪器仪表学报, 2023, 44(7): 172-181. CHEN Xiyuan, GE Mingming, YAO Zhiting, et al. Filtering algorithm of LiDAR in rainy and snowy weather[J]. Chinese journal of scientific instrument, 2023, 44(7): 172-181.
[74] TAO Manli, ZHAO Chaoyang, TANG Ming, et al. Objformer: boosting 3D object detection via instance-wise interaction[J]. Pattern Recognition, 2024, 146: 110061.
[75] CHEN Chen, CHEN Zhe, ZHANG Jing, et al. SASA: semantics-augmented set abstraction for point-based 3D object detection[J]. Proceedings of the AAAI conference on artificial intelligence, 2022, 36(1): 221-229.
[76] ZHANG Yifan, HU Qingyong, XU Guoquan, et al. Not all points are equal: learning highly efficient point-based detectors for 3D LiDAR point clouds[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 18953-18962.
[77] 王理嘉, 于欢, 刘守印. 动态环境中多帧点云融合算法及三维目标检测算法研究[J]. 计算机应用研究, 2023, 40(3): 909-913. WANG Lijia, YU Huan, LIU Shouyin. Research on multi-frame point cloud fusion and 3D target detection algorithms in dynamic environments[J]. Application research of computers, 2023, 40(3): 909-913.
[78] ZHANG Gang, CHEN Junnan, GAO Guohuan, et al. HEDNet: a hierarchical encoder-decoder network for 3D object detection in point clouds[J]. Advances in neural information processing systems, 2024, 36.
[79] LI Yangyan, BU Rui, SUN Mingchao, et al. PointCNN: convolution on X-transformed points[J]. Advances in neural information processing systems, 2018, 31.
[80] YIN Tianwei, ZHOU Xingyi, KRÄHENBÜHL P. Center-based 3D object detection and tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 11784-11793.
[81] 涂新奎, 郑少武, 于善虎, 等. 基于对称形状生成的三维目标检测网络[J]. 仪器仪表学报, 2023, 44(6): 252-263. TU Xinkui, ZHENG Shaowu, YU Shanhu, et al. 3D target detection network based on symmetric shape generation[J]. Chinese journal of scientific instrument, 2023, 44(6): 252-263.
[82] 陶乐, 王海, 蔡英凤, 等. 面向自动驾驶场景的多目标点云检测算法[J]. 汽车工程, 2024, 46(7): 1208-1218, 1238. TAO Le, WANG Hai, CAI Yingfeng, et al. Multi-object point cloud detection algorithm for autonomous driving scenarios[J]. Automotive engineering, 2024, 46(7): 1208-1218, 1238.
[83] 周昊, 齐洪钢, 邓永强, 等. 融合点云深度信息的3D目标检测与分类[J]. 中国图象图形学报, 2024, 29(8): 2399-2412. ZHOU Hao, QI Honggang, DENG Yongqiang, et al. 3D target detection and classification using fused point cloud depth information[J]. Journal of image and graphics, 2024, 29(8): 2399-2412.
[84] SHI Weijing, RAJKUMAR R. Point-GNN: graph neural network for 3D object detection in a point cloud[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual Conference: IEEE, 2020: 1711-1719.
[85] NAJIBI M, LAI Guangda, KUNDU A, et al. DOPS: learning to detect 3D objects and predict their 3D shapes[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual Conference: IEEE, 2020: 11913-11922.
[86] ZHANG Yanan, HUANG Di, WANG Yunhong. PC-RGNN: point cloud completion and graph neural network for 3D object detection[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 3430-3437.
[87] LIU Zhijian, TANG Haotian, LIN Yujun, et al. Point-voxel CNN for efficient 3D deep learning[J]. Advances in neural information processing systems, 2019, 32.
[88] NOH J, LEE S, HAM B. HVPR: hybrid voxel-point representation for single-stage 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 14605-14614.
[89] SHI Shaoshuai, GUO Chaoxu, JIANG Li, et al. PV-RCNN: point-voxel feature set abstraction for 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual Conference: IEEE, 2020: 10529-10538.
[90] WU Peng, GU Lipeng, YAN Xuefeng, et al. PV-RCNN++: semantical point-voxel feature interaction for 3D object detection[J]. The visual computer, 2023, 39(6): 2425-2440.
[91] ZHOU Wei, ZHANG Xiaodan, HAO Xin, et al. Multi point-voxel convolution(MPVConv) for deep learning on point clouds[J]. Computers & graphics, 2023, 112: 72-80.
[92] 李虎辰, 管海燕, 雷相达, 等. 基于点–体素一致性约束的城市激光雷达点云分类[J]. 中国激光, 2024, 51(13): 251-264. LI Huchen, GUAN Haiyan, LEI Xiangda, et al. Urban LiDAR point cloud classification based on point-voxel consistency constraints[J]. Chinese journal of lasers, 2024, 51(13): 251-264.
[93] DENG Pengzhen, ZHOU Li, CHEN Jie. PVC-SSD: point-voxel dual-channel fusion with cascade point estimation for anchor-free single-stage 3D object detection[J]. IEEE sensors journal, 2024.
[94] CHEN Xiaozhi, MA Huimin, WAN Ji, et al. Multi-view 3D object detection network for autonomous driving[C]//Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 1907-1915.
[95] KU Jason, MOZIFIAN M, LEE J, et al. Joint 3D proposal generation and object detection from view aggregation[C]//2018 IEEE/RSJ International Conference on Intelligent Robots and Systems(IROS). Madrid: IEEE, 2018: 1-8.
[96] QI C R, LIU Wei, WU Chenxia, et al. Frustum PointNets for 3D object detection from RGB-D data[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 918-927.
[97] WANG Zhixin, JIA Kui. Frustum ConvNet: sliding frustums to aggregate local point-wise features for amodal 3D object detection[C]//2019 IEEE/RSJ International Conference on Intelligent Robots and Systems. Macau: IEEE, 2019: 1742-1749.
[98] VORA S, LANG A H, HELOU B, et al. PointPainting: sequential fusion for 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual Conference: IEEE, 2020: 4604-4612.
[99] LIANG Ming, YANG Bin, CHEN Yun, et al. Multi-task multi-sensor fusion for 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 7345-7353.
[100] WU Xiaopei, PENG Liang, YANG Honghui, et al. Sparse fuse dense: towards high quality 3D detection with depth completion[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 5418-5427.
[101] 黄漫, 黄勃, 高永彬. 引入深度补全与实例分割的三维目标检测[J]. 传感器与微系统, 2021, 40(1): 129-132. HUANG Man, HUANG Bo, GAO Yongbin. 3D target detection with depth completion and instance segmentation[J]. Sensors and microsystems, 2021, 40(1): 129-132.
[102] XIE Yichen, XU Chenfeng, RAKOTOSAONA M J, et al. SparseFusion: fusing multi-modal sparse representations for multi-sensor 3D object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 17591-17602.
[103] ZHANG Yanan, CHEN Jiaxin, HUANG Di. CAT-Det: contrastively augmented Transformer for multi-modal 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 908-917.
[104] GUNN J, LENYK Z, SHARMA A, et al. Lift-Attend-Splat: bird’s-eye-view camera-LiDAR fusion using Transformers[EB/OL]. (2023-12-22)[2025-04-24]. https://arxiv.org/abs/2312.14919
[105] LIU Zhijian, TANG Haotian, AMINI A, et al. BEVFusion: multi-task multi-sensor fusion with unified bird’s-eye view representation[C]//2023 IEEE International Conference on Robotics and Automation(ICRA). London: IEEE, 2023: 2774-2781.
[106] WANG Ke, ZHOU Tianqiang, ZHANG Zhichuang, et al. PVF-DectNet: multi-modal 3D detection network based on perspective-voxel fusion[J]. Engineering applications of artificial intelligence, 2023, 120: 105951.
[107] LI Yingwei, YU A W, MENG Tianjian, et al. DeepFusion: Lidar-camera deep fusion for multi-modal 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 17182-17191.
[108] 周治国, 马文浩. 一种多层多模态融合3D目标检测方法[J]. 电子学报, 2024, 52(3): 696-708. ZHOU Zhiguo, MA Wenhao. A multi-layer multi-modal fusion 3D target detection method[J]. Acta electronica sinica, 2024, 52(3): 696-708.
[109] LIU Huaijin, DU Jixiang, ZHANG Yong, et al. PVConvNet: pixel-voxel sparse convolution for multimodal 3D object detection[J]. Pattern recognition, 2024, 149: 110284.
[110] 金宇锋, 陶重犇. 基于Transformer的融合信息增强3D目标检测算法[J]. 仪器仪表学报, 2023, 44(12): 297-306. JIN Yufeng, TAO Chongben. Fusion-enhanced 3D target detection algorithm based on Transformer[J]. Chinese journal of scientific instrument, 2023, 44(12): 297-306.
[111] XIA Chenxing, LI Xubing, GAO Xiuju, et al. PCDR-DFF: multi-modal 3D object detection based on point cloud diversity representation and dual feature fusion[J]. Neural computing and applications, 2024: 1-18.
[112] 王五岳, 徐召飞, 曲春燕, 等. 基于红外与激光雷达融合的鸟瞰图空间三维目标检测算法[J]. 光子学报, 2024, 53(1): 73-84. WANG Wuyue, XU Zhaofei, QU Chunyan, et al. 3D target detection algorithm in bird’s-eye view space based on infrared and LiDAR fusion[J]. Acta photonica sinica, 2024, 53(1): 73-84.
[113] 董钰婷, 官磊. 基于自适应加权融合激光雷达和相机的三维目标检测方法[J]. 计算机应用, 2024, 44(S1): 250-255. DONG Yuting, GUAN Lei. 3D target detection method based on adaptive weighted fusion of LiDAR and camera[J]. Computer applications, 2024, 44(S1): 250-255.
[114] 李文礼, 喻飞, 石晓辉, 等. BEV特征下激光雷达和单目相机融合的目标检测算法研究[J]. 计算机工程与应用, 2024, 60(11): 182-193. LI Wenli, YU Fei, SHI Xiaohui, et al. Target detection algorithm based on BEV features for LiDAR and monocular camera fusion[J]. Computer engineering and applications, 2024, 60(11): 182-193.
[115] NABATI R, QI Hairong. CenterFusion: center-based radar and camera fusion for 3D object detection[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2021: 1527-1536.
[116] BANSAL K, RUNGTA K, BHARADIA D. RadSegNet: a reliable approach to radar camera fusion[EB/OL]. (2022-08-08)[2025-04-24]. https://arxiv.org/abs/2208.03849
[117] KIM Y, KIM S, CHOI J W, et al. CRAFT: camera-radar 3D object detection with spatio-contextual fusion Transformer[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2023: 1160-1168.
[118] KIM Y, SHIN J, KIM S, et al. CRN: camera radar net for accurate, robust, efficient 3D perception[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 17615-17626.
[119] 车俐, 吕连辉, 蒋留兵. AF-CenterNet: 基于交叉注意力机制的毫米波雷达和相机融合的目标检测[J]. 计算机应用研究, 2024, 41(4): 1258-1263. CHE Li, LYU Lianhui, JIANG Liubing. AF-CenterNet: Cross-attention mechanism-based millimeter-wave radar and camera fusion for target detection[J]. Application research of computers, 2024, 41(4): 1258-1263.
[120] LIU Xiang, LI Zhenglin, ZHOU Yang, et al. Camera–radar fusion with modality interaction and radar Gaussian expansion for 3D object detection[J]. Cyborg and bionic systems, 2024, 5: 0079.
[121] GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]//2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence: IEEE, 2012: 3354-3361.
[122] CAESAR H, BANKITI V, LANG A H, et al. nuScenes: a multimodal dataset for autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11621-11631.
[123] SUN Pei, KRETZSCHMAR H, DOTIWALLA X, et al. Scalability in perception for autonomous driving: Waymo open dataset[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual Conference: IEEE, 2020: 2446-2454.
[124] HUANG Xinyu, WANG Peng, CHENG Xinjing, et al. The ApolloScape open dataset for autonomous driving and its application[J]. IEEE transactions on pattern analysis and machine intelligence, 2019, 42(10): 2702-2719.
[125] CHOI Y, KIM N, HWANG S, et al. KAIST multi-spectral day/night data set for autonomous and assisted driving[J]. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(3): 934-948.
[126] HOUSTON J, ZUIDHOF G, BERGAMINI L, et al. One thousand and one hours: Self-driving motion prediction dataset[C]//Conference on Robot Learning. Palo Alto: PMLR, 2021: 409-418.
[127] YU Haibao, LUO Yizhen, SHU Mao, et al. DAIR-V2X: a large-scale dataset for vehicle-infrastructure cooperative 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 21361-21370.
[128] GÄHLERT N, JOURDAN N, CORDTS M, et al. Cityscapes 3D: Dataset and benchmark for 9 dof vehicle detection[EB/OL]. (2020-06-14)[2025-04-24]. https://arxiv.org/abs/2006.07864
[129] WILSON B, QI W, AGARWAL T, et al. Argoverse 2: Next generation datasets for self-driving perception and forecasting[EB/OL]. (2023-01-02)[2025-04-24]. https://arxiv.org/abs/2301.00493
[130] XIAO Pengchuan, SHAO Zhenlei, HAO S, et al. PandaSet: advanced sensor suite dataset for autonomous driving[C]//2021 IEEE International Intelligent Transportation Systems Conference(ITSC). Indianapolis: IEEE, 2021: 3095-3101.
[131] PATIL A, MALLA S, GANG H, et al. The H3D dataset for full-surround 3D multi-object detection and tracking in crowded urban scenes[C]//2019 International Conference on Robotics and Automation(ICRA). Montreal: IEEE, 2019: 9552-9557.
[132] CONG Peishan, ZHU Xinge, QIAO Feng, et al. STCrowd: a multimodal dataset for pedestrian perception in crowded scenes[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 19608-19617.
[133] XIAO Aoran, HUANG Jiaxing, GUAN Dayan, et al. Transfer learning from synthetic to real LiDAR point cloud for semantic segmentation[EB/OL]. (2021-07-12)[2025-04-24]. https://arxiv.org/abs/2107.05399
[134] PANG Su, MORRIS D, RADHA H. CLOCs: Camera-LiDAR object candidates fusion for 3D object detection[C]//2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Las Vegas: IEEE, 2020: 10386-10393.
[135] WU Hai, WEN Chenglu, SHI Shaoshuai, et al. Virtual sparse convolution for multimodal 3D object detection[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. Vancouver: IEEE, 2023: 21653-21662.
相似文献/Similar Articles:
[1]葛园园,许有疆,赵帅,等.自动驾驶场景下小且密集的交通标志检测[J].智能系统学报,2018,13(3):366.[doi:10.11992/tis.201706040]
 GE Yuanyuan,XU Youjiang,ZHAO Shuai,et al.Detection of small and dense traffic signs in self-driving scenarios[J].CAAI Transactions on Intelligent Systems,2018,13(3):366.[doi:10.11992/tis.201706040]
[2]王星,赵海良,王志刚.基于邻域系统的智能车辆最优轨迹规划方法[J].智能系统学报,2019,14(5):1040.[doi:10.11992/tis.201805004]
 WANG Xing,ZHAO Hailiang,WANG Zhigang.Optimal trajectory planning method of intelligent vehicles based on neighborhood system[J].CAAI Transactions on Intelligent Systems,2019,14(5):1040.[doi:10.11992/tis.201805004]
[3]张新钰,邹镇洪,李志伟,等.面向自动驾驶目标检测的深度多模态融合技术[J].智能系统学报,2020,15(4):758.[doi:10.11992/tis.202002010]
 ZHANG Xinyu,ZOU Zhenhong,LI Zhiwei,et al.Deep multi-modal fusion in object detection for autonomous driving[J].CAAI Transactions on Intelligent Systems,2020,15(4):758.[doi:10.11992/tis.202002010]
[4]陆军,李杨,鲁林超.远距离和遮挡下三维目标检测算法研究[J].智能系统学报,2024,19(2):259.[doi:10.11992/tis.202301001]
 LU Jun,LI Yang,LU Linchao.Long-distance and occluded 3D target detection algorithm[J].CAAI Transactions on Intelligent Systems,2024,19(2):259.[doi:10.11992/tis.202301001]
[5]鲁斌,孙洋,杨振宇.融合体素图注意力的三维目标检测算法[J].智能系统学报,2024,19(3):598.[doi:10.11992/tis.202209008]
 LU Bin,SUN Yang,YANG Zhenyu.3D object detection algorithm with voxel graph attention[J].CAAI Transactions on Intelligent Systems,2024,19(3):598.[doi:10.11992/tis.202209008]
[6]唐友名,孙冠豫,孙贵斌,等.基于城市超车工况的智能车辆避障规划方法研究[J].智能系统学报,2024,19(3):619.[doi:10.11992/tis.202209060]
 TANG Youming,SUN Guanyu,SUN Guibin,et al.Autonomous vehicle trajectory planning based on urban overtaking conditions[J].CAAI Transactions on Intelligent Systems,2024,19(3):619.[doi:10.11992/tis.202209060]
[7]胡丹丹,张忠婷.基于改进YOLOv5s的面向自动驾驶场景的道路目标检测算法[J].智能系统学报,2024,19(3):653.[doi:10.11992/tis.202206034]
 HU Dandan,ZHANG Zhongting.Road target detection algorithm for autonomous driving scenarios based on improved YOLOv5s[J].CAAI Transactions on Intelligent Systems,2024,19(3):653.[doi:10.11992/tis.202206034]
[8]陆军,鲁林超,翟晓阳,等.面向道路交通场景的高效3D目标检测[J].智能系统学报,2025,20(1):91.[doi:10.11992/tis.202311013]
 LU Jun,LU Linchao,ZHAI Xiaoyang,et al.High-efficiency 3D object detection for road traffic scenes[J].CAAI Transactions on Intelligent Systems,2025,20(1):91.[doi:10.11992/tis.202311013]
[9]陆军,王旭东,汲广宇,等.基于恒定转弯率和加速度模型的点云多目标跟踪算法[J].智能系统学报,2025,20(6):1328.[doi:10.11992/tis.202503034]
 LU Jun,WANG Xudong,JI Guangyu,et al.Point cloud multitarget tracking algorithm based on the constant turn rate and acceleration model[J].CAAI Transactions on Intelligent Systems,2025,20(6):1328.[doi:10.11992/tis.202503034]
[10]宫彦,王乃棒,张新钰,等.面向智能网联汽车的 BEV 感知技术与发展趋势[J].智能系统学报,2026,21(1):41.[doi:10.11992/tis.202505027]
 GONG Yan,WANG Naibang,ZHANG Xinyu,et al.BEV perception technologies and development trends for intelligent connected vehicles[J].CAAI Transactions on Intelligent Systems,2026,21(1):41.[doi:10.11992/tis.202505027]
[11]鲁斌,杨振宇,孙洋,等.基于多通道交叉注意力融合的三维目标检测算法[J].智能系统学报,2024,19(4):885.[doi:10.11992/tis.202305029]
 LU Bin,YANG Zhenyu,SUN Yang,et al.3D object detection algorithm with multi-channel cross attention fusion[J].CAAI Transactions on Intelligent Systems,2024,19(4):885.[doi:10.11992/tis.202305029]
[12]陆军,赵颢然,鲁林超.基于多模态融合的三维目标检测方法研究[J].智能系统学报,2025,20(5):1167.[doi:10.11992/tis.202502015]
 LU Jun,ZHAO Haoran,LU Linchao.Research on 3D object detection based on multi-modal fusion[J].CAAI Transactions on Intelligent Systems,2025,20(5):1167.[doi:10.11992/tis.202502015]

备注/Memo

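Received: 2025-04-24.
Funding: National Natural Science Foundation of China (61573183).
About the authors: WU Yiquan, professor; main research interests are visual inspection and image measurement, and video processing and intelligent analysis. Has led 48 projects, including projects funded by the National Natural Science Foundation of China, and has published more than 350 academic papers. E-mail: nuaaimage@163.com. CAI Jiaqi, master's student; main research interests are computer vision and image processing. E-mail: Caij-q@nuaa.edu.cn.
Corresponding author: WU Yiquan. E-mail: nuaaimage@163.com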
