[1] ZHANG Hanxiao, XING Xianglei. Deep-learning-enhanced visual SLAM with neural implicit scene representation[J]. CAAI Transactions on Intelligent Systems, 2026, 21(1): 120-131. [doi:10.11992/tis.202505029]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
21
Issue:
2026, No. 1
Page number:
120-131
Column:
Academic Papers: Machine Perception and Pattern Recognition
Publication date:
2026-03-05
- Title:
-
Deep-learning-enhanced visual SLAM with neural implicit scene representation
- Author(s):
-
ZHANG Hanxiao; XING Xianglei
-
College of Intelligent Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Keywords:
-
neural radiance field; visual SLAM; loop detection; pose estimation; deep learning; 3D reconstruction; semantic embedding; trajectory prediction
- CLC:
-
TP391.41
- DOI:
-
10.11992/tis.202505029
- Abstract:
-
In recent years, neural radiance fields have demonstrated strong capability in high-fidelity three-dimensional scene reconstruction. However, visual simultaneous localization and mapping (SLAM) systems that employ neural radiance fields still face challenges in localization accuracy and in the flexibility of explicit scene representation. To address these limitations, this work proposes a visual SLAM system that integrates deep-learning-based pose estimation with neural implicit scene representation. Dense bundle adjustment layers and efficient global optimization mechanisms iteratively optimize the camera pose and depth at the pixel level, while a globally consistent implicit surface reconstruction based on neural radiance fields is updated incrementally, so that the system reconstructs high-fidelity scenes while localizing accurately. A language query mechanism is also introduced to enhance the system’s interactive capability. Extensive experiments on the EuRoC and Replica datasets compare the proposed system with three benchmark methods under different input conditions. The results show that the proposed system outperforms existing methods in tracking robustness and reconstruction accuracy, providing a reference for subsequent visual SLAM methods based on neural radiance fields.