[1]张含笑,邢向磊.融合深度学习与神经隐式表征的视觉SLAM系统[J].智能系统学报,2026,21(1):120-131.[doi:10.11992/tis.202505029]
ZHANG Hanxiao,XING Xianglei.Deep-learning-enhanced visual SLAM with neural implicit scene representation[J].CAAI Transactions on Intelligent Systems,2026,21(1):120-131.[doi:10.11992/tis.202505029]
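CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
21
Issue:
2026, No. 1
Pages:
120-131
Section:
Academic Papers: Machine Perception and Pattern Recognition
Publication date:
2026-03-05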
- Title:
-
Deep-learning-enhanced visual SLAM with neural implicit scene representation
- Author(s):
-
ZHANG Hanxiao (张含笑), XING Xianglei (邢向磊)
-
College of Intelligent Science and Engineering, Harbin Engineering University (哈尔滨工程大学 智能科学与工程学院), Harbin 150001, Heilongjiang, China
-
- Keywords:
-
neural radiance field; visual SLAM; loop detection; pose estimation; deep learning; 3D reconstruction; semantic embedding; trajectory prediction
- Classification number:
-
TP391.41
- DOI:
-
10.11992/tis.202505029
- Abstract:
-
In recent years, neural radiance fields have demonstrated excellent performance in three-dimensional reconstruction. However, when they are applied to visual simultaneous localization and mapping (SLAM), the lack of a global optimization mechanism tends to cause insufficient localization accuracy and reconstruction failure. To address this problem, this work proposes a visual SLAM system that integrates deep-learning-based pose estimation with neural implicit scene representation. Dense bundle adjustment layers and an efficient global optimization mechanism iteratively refine camera poses and depth at the pixel level, and a globally consistent implicit reconstruction surface is updated using a neural radiance field method, so that the system reconstructs high-fidelity scenes while localizing accurately. On this basis, a language query mechanism is introduced to enhance the system's interactive capability. Extensive experiments on the EuRoC and Replica datasets compare the system with three categories of baseline methods under different input conditions. The results show that the proposed system outperforms existing methods in tracking robustness and reconstruction accuracy. The method can serve as a reference for subsequent visual SLAM methods based on neural radiance fields.
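Memo
Received: 2025-05-28.
Funding: National Natural Science Foundation of China (62076078, 61703119); Fundamental Research Funds for the Central Universities (3072024LJ0403).
About the authors: ZHANG Hanxiao, M.S.; main research interest: computer vision. E-mail: 2682706067@qq.com. XING Xianglei, professor and doctoral supervisor; main research interests: pattern recognition and computer vision. Recipient of the First Prize of the Heilongjiang Provincial University Science and Technology Award (Natural Science category) and of the Outstanding Paper Award of CAAI Transactions on Intelligent Systems. Author of more than 60 academic papers. E-mail: xingxl@hrbeu.edu.cn.
Corresponding author: XING Xianglei. E-mail: xingxl@hrbeu.edu.cn
Last Update: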
2026-01-05