ZHU Shaokai, MENG Qinghao, JIN Sheng, et al. Indoor visual local path planning based on deep reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2022, 17(5): 908-918. DOI: 10.11992/tis.202107059.
CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 17
Issue: 2022, No. 5
Pages: 908-918
Section: Academic Papers - Machine Learning
Publication date: 2022-09-05
- Title: Indoor visual local path planning based on deep reinforcement learning
- Author(s): ZHU Shaokai, MENG Qinghao, JIN Sheng, DAI Xuyang
Institute of Robotics and Autonomous Systems, School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
- Keywords: visual navigation; deep learning; reinforcement learning; local path planning; obstacle avoidance; visual SLAM; proximal policy optimization (PPO); mobile robot
- CLC number: TP391
- DOI: 10.11992/tis.202107059
- Online publication date: 2022-06-24
- Abstract:
Traditional robot local path planning methods are mostly designed for settings in which a prior map is available, so they perform poorly when combined with visual simultaneous localization and mapping (SLAM) for navigation. This paper therefore proposes a visual local path planning strategy based on deep reinforcement learning. First, a grid map of the surrounding environment is built with visual SLAM, and a global path is planned on it with the A* algorithm. Second, taking obstacle avoidance, travel efficiency, and pose tracking into account, a local path planning strategy is constructed with deep reinforcement learning: a discrete action space whose basic elements are moving forward, turning left, and turning right is designed, together with a state space built from visual observations such as color images, depth images, and feature-point maps, and the proximal policy optimization (PPO) algorithm is used to learn the best state-action mapping network. Results on the Habitat simulation platform show that the proposed strategy can plan an optimal or suboptimal path on a map created in real time. Compared with traditional local path planning algorithms, the average success rate increases by 53.9%, while the pose tracking loss rate and the collision rate decrease by 66.5% and 30.1%, respectively.
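The abstract describes a two-stage pipeline: A* global planning on a SLAM-built occupancy grid, followed by a local policy with a three-action discrete space driven by visual observations and trained with PPO. The sketch below is only an illustration of that combination under stated assumptions: NumPy and PyTorch are available, and the astar function, LocalPlannerPolicy network, layer sizes, and 84x84 three-channel observations are hypothetical choices rather than the authors' implementation.

# Minimal sketch, not the authors' code: A* global planning on an occupancy
# grid plus a PPO-style actor-critic with a discrete action space
# (forward / turn left / turn right) over stacked visual observations.
import heapq
from itertools import count

import numpy as np
import torch
import torch.nn as nn


def astar(grid, start, goal):
    """4-connected A* on a binary occupancy grid (0 = free, 1 = occupied)."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    tie = count()                                 # tie-breaker for the heap
    open_set = [(h(start), next(tie), 0, start, None)]
    came_from, g_cost = {}, {start: 0}
    while open_set:
        _, _, g, node, parent = heapq.heappop(open_set)
        if node in came_from:                     # already finalized
            continue
        came_from[node] = parent
        if node == goal:                          # walk parents back to the start
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dr, node[1] + dc)
            if not (0 <= nxt[0] < grid.shape[0] and 0 <= nxt[1] < grid.shape[1]):
                continue
            if grid[nxt] == 1 or g + 1 >= g_cost.get(nxt, float("inf")):
                continue
            g_cost[nxt] = g + 1
            heapq.heappush(open_set, (g + 1 + h(nxt), next(tie), g + 1, nxt, node))
    return None                                   # goal unreachable on the current map


class LocalPlannerPolicy(nn.Module):
    """Actor-critic head over a small CNN encoder; 3 discrete actions."""

    def __init__(self, in_channels=3, n_actions=3):
        super().__init__()
        self.encoder = nn.Sequential(             # layer sizes are assumptions
            nn.Conv2d(in_channels, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(512), nn.ReLU(),
        )
        self.actor = nn.Linear(512, n_actions)    # logits: forward, left, right
        self.critic = nn.Linear(512, 1)           # state value used by PPO

    def forward(self, obs):
        feat = self.encoder(obs)
        return torch.distributions.Categorical(logits=self.actor(feat)), self.critic(feat)


if __name__ == "__main__":
    # Global stage: coarse path on a toy occupancy grid with one wall.
    grid = np.zeros((20, 20), dtype=np.int8)
    grid[5:15, 10] = 1
    print("global path length:", len(astar(grid, (0, 0), (19, 19))))

    # Local stage: sample one discrete action from the visual policy.
    policy = LocalPlannerPolicy()
    obs = torch.rand(1, 3, 84, 84)                # stand-in for colour/depth/feature maps
    dist, value = policy(obs)
    print("sampled action:", ["forward", "left", "right"][dist.sample().item()],
          "value estimate:", value.item())

In a full PPO training loop, the Categorical distribution and the value head shown here would feed the clipped surrogate objective and the value loss; the snippet above only demonstrates a single planning call and a single policy forward pass.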
Memo
Received: 2021-07-27.
Foundation items: China Postdoctoral Science Foundation (2021M692390); Natural Science Foundation of Tianjin (20JCZDJC00150, 20JCYBJC00320).
About the authors: ZHU Shaokai, master's student; main research interests: visual simultaneous localization and mapping and vision-based robot navigation. MENG Qinghao, professor and doctoral supervisor; main research interests: robot perception, navigation, and control; has completed more than 10 research projects and published more than 100 academic papers. JIN Sheng, Ph.D. candidate; main research interests: vision-based robot navigation and deep reinforcement learning.
Corresponding author: JIN Sheng. E-mail: shengjin@tju.edu.cn