[1] LU Jun, ZHAO Haoran, LU Linchao. Research on 3D object detection based on multi-modal fusion[J]. CAAI Transactions on Intelligent Systems, 2025, 20(5): 1167-1177. [doi:10.11992/tis.202502015]

Research on 3D object detection based on multi-modal fusion

References:
[1] ZHANG Yaodan. The current situation and tendency of driverless cars[J]. Automobile applied technology, 2018, 43(6): 10, 15.
[2] WANG Shifeng, DAI Xiang, XU Ning, et al. Overview on environment perception technology for unmanned ground vehicle[J]. Journal of Changchun University of Science and Technology (natural science edition), 2017, 40(1): 1-6.
[3] JANA P, MOHANTA P P. Recent trends in 2D object detection and applications in video event recognition[EB/OL]. (2022-02-07)[2025-02-26]. https://arxiv.org/abs/2202.03206.
[4] PRAVALLIKA A, HASHMI M F, GUPTA A. Deep learning frontiers in 3D object detection: a comprehensive review for autonomous driving[J]. IEEE access, 2024, 12: 173936-173980.
[5] ZHU Minling, GONG Yadong, TIAN Chunwei, et al. A systematic survey of transformer-based 3D object detection for autonomous driving: methods, challenges and trends[J]. Drones, 2024, 8(8): 412.
[6] TANG Yingjuan, HE Hongwen, WANG Yong, et al. Multi-modality 3D object detection in autonomous driving: a review[J]. Neurocomputing, 2023, 553: 126587.
[7] VORA S, LANG A H, HELOU B, et al. PointPainting: sequential fusion for 3D object detection[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 4604-4612.
[8] WANG Chunwei, MA Chao, ZHU Ming, et al. PointAugmenting: cross-modal augmentation for 3D object detection[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 11794-11803.
[9] LECUN Y, BOSER B, DENKER J S, et al. Backpropagation applied to handwritten zip code recognition[J]. Neural computation, 1989, 1(4): 541-551.
[10] XU Shaoqing, ZHOU Dingfu, FANG Jin, et al. FusionPainting: multimodal fusion with adaptive attention for 3D object detection[C]//2021 IEEE International Intelligent Transportation Systems Conference. Indianapolis: IEEE, 2021: 3047-3054.
[11] BAI Xuyang, HU Zeyu, ZHU Xinge, et al. TransFusion: robust LiDAR-camera fusion for 3D object detection with transformers[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 1080-1089.
[12] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[J]. Advances in neural information processing systems, 2017, 30: 5998-6008.
[13] LI Yingwei, YU A W, MENG Tianjian, et al. DeepFusion: LiDAR-camera deep fusion for multi-modal 3D object detection[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 17161-17170.
[14] LIANG Tingting, XIE Hongwei, YU Kaicheng, et al. BEVFusion: a simple and robust LiDAR-camera fusion framework[C]//Proceedings of the 36th International Conference on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2022: 10421-10434.
[15] HU Haotian, WANG Fanyi, SU Jingwen, et al. EA-BEV: edge-aware bird’s-eye-view projector for 3D object detection[EB/OL]. (2023-03-31)[2025-02-26]. https://arxiv.org/abs/2303.17895.
[16] YAN Junjie, LIU Yingfei, SUN Jianjian, et al. Cross modal transformer via coordinates encoding for 3D object detection[EB/OL]. (2023-01-03)[2025-02-26]. https://arxiv.org/abs/2301.01283.
[17] WANG Haiyang, TANG Hao, SHI Shaoshuai, et al. UniTR: a unified and efficient multi-modal transformer for bird’s-eye-view representation[C]//2023 IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 6792-6802.
[18] CAESAR H, BANKITI V, LANG A H, et al. nuScenes: a multimodal dataset for autonomous driving[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11621-11631.
[19] LEE W, KIM H, AHN J. Defect-free atomic array formation using the Hungarian matching algorithm[J]. Physical review A, 2017, 95(5): 053424.
[20] TOLSTIKHIN I O, HOULSBY N, KOLESNIKOV A, et al. MLP-Mixer: an all-MLP architecture for vision[C]//Proceedings of the 35th International Conference on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2021: 24261-24272.
[21] EGGERT S, KLIEMANN L, SRIVASTAV A. Bipartite graph matchings in the semi-streaming model[C]//Algorithms-ESA 2009. Berlin: Springer Berlin Heidelberg, 2009: 492-503.
[22] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2999-3007.
[23] CONTRIBUTORS M. MMDetection3D: OpenMMLab next-generation platform for general 3D object detection[EB/OL]. (2019-06-17)[2025-02-26]. https://arxiv.org/abs/1906.07155.
[24] LOSHCHILOV I, HUTTER F. Decoupled weight decay regularization[C]//International Conference on Learning Representations. Singapore: OpenReview.net, 2025: 1-18.
[25] SMITH L N, TOPIN N. Super-convergence: very fast training of neural networks using large learning rates[C]//Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications. Baltimore: SPIE, 2019: 369-386.
[26] LANG A H, VORA S, CAESAR H, et al. PointPillars: fast encoders for object detection from point clouds[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 12689-12697.
[27] LI Yanwei, CHEN Yilun, QI Xiaojuan, et al. Unifying voxel-based representation with transformer for 3D object detection[C]//Proceedings of the 36th International Conference on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2022: 18442-18455.
[28] YIN Tianwei, ZHOU Xingyi, KRAHENBUHL P. Center-based 3D object detection and tracking[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 11779-11788.
[29] CHEN Yukang, LIU Jianhui, ZHANG Xiangyu, et al. VoxelNeXt: fully sparse VoxelNet for 3D object detection and tracking[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 21674-21683.
[30] YOO J H, KIM Y, KIM J, et al. 3D-CVF: generating joint camera and LiDAR features using cross-view spatial feature fusion for 3D object detection[C]//Computer Vision–ECCV 2020. Cham: Springer International Publishing, 2020: 720-736.
[31] YIN Tianwei, ZHOU Xingyi, KRÄHENBÜHL P. Multimodal virtual point 3D detection[C]//Proceedings of the 35th International Conference on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2021: 16494-16507.
[32] CHEN Zehui, LI Zhenyu, ZHANG Shiquan, et al. Deformable feature aggregation for dynamic multi-modal 3D object detection[C]//Computer Vision–ECCV 2022. Cham: Springer Nature Switzerland, 2022: 628-644.
[33] HUANG Tengteng, LIU Zhe, CHEN Xiwu, et al. EPNet: enhancing point features with image semantics for 3D object detection[C]//Computer Vision–ECCV 2020. Cham: Springer International Publishing, 2020: 35-52.

Last Update: 2025-09-05

Copyright © CAAI Transactions on Intelligent Systems