[1] LU Jun, ZHAO Haoran, LU Linchao. Research on 3D object detection based on multi-modal fusion[J]. CAAI Transactions on Intelligent Systems, 2025, 20(5): 1167-1177. [doi:10.11992/tis.202502015]

Research on 3D object detection based on multi-modal fusion


Memo

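Received: 2025-02-26.
Funding: Natural Science Foundation of Heilongjiang Province (F201123).
About the authors: LU Jun, professor, doctoral supervisor, Ph.D.; main research interests are computer vision, machine perception, and robotic arm control. Review expert for the Ministry of Science and Technology's Innovation Fund for Technology-based Small and Medium-sized Enterprises and peer reviewer for the National Natural Science Foundation of China. Has published more than 80 academic papers and 5 books. E-mail: lujun0260@sina.com. ZHAO Haoran, master's student; main research interests are 3D object detection and computer vision. E-mail: 1793961894@qq.com. LU Linchao, master's degree; main research interests are 3D object detection and computer vision. E-mail: llczsr@163.com.
Corresponding author: LU Jun. E-mail: lujun0260@sina.com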

Last Update: 2025-09-05