
OUYANG Yongping, WEI Changyun, CAI Boliang. Collision avoidance approach for distributed heterogeneous multirobot systems in dynamic environments[J]. CAAI Transactions on Intelligent Systems, 2022, 17(4): 752-763. [doi:10.11992/tis.202106044]


Collision avoidance approach for distributed heterogeneous multirobot systems in dynamic environments


CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP], Volume: 17, Issue: 4 (2022), Pages: 752-763, Section: Academic Papers: Intelligent Systems, Published: 2022-07-05

Title:: Collision avoidance approach for distributed heterogeneous multirobot systems in dynamic environments

Author(s):: OUYANG Yongping¹, WEI Changyun¹, CAI Boliang¹,²; 1. College of Mechanical and Electrical Engineering, Hohai University, Changzhou 213022, China;
2. School of Engineering, Cardiff University, Cardiff CF10 3AT, UK

Keywords:: heterogeneous multirobot systems; deep reinforcement learning; unstructured environment; multifeature policy gradients; dynamic collision avoidance; self-learning; distributed control; control policy

CLC number:: TP273+.2

DOI:: 10.11992/tis.202106044

Abstract:: Multirobot systems are increasingly used in cooperative search and rescue, smart workshops and warehouses, intelligent transportation, and other fields. At present, path planning and collision avoidance, both among robots and between robots and a dynamic environment, still depend on accurate environment maps, which makes the coordination and cooperation of multirobot systems in unstructured environments challenging. To address this problem, this paper presents a distributed navigation and collision avoidance approach for heterogeneous multirobot systems that does not require accurate maps. Built on the deep reinforcement learning framework, it proposes a multifeature policy gradients algorithm and incorporates social norms for human-robot shared environments, so that each distributed robot learns the optimal navigation and collision avoidance policy through trial-and-error interactions with the environment. The policy is trained in the Gazebo simulation environment and then transferred to several heterogeneous physical robots by decoding their control signals, after which it is tested in real environments. The experimental results show that the proposed multifeature policy gradients algorithm obtains the optimal navigation and collision avoidance policy through self-learning, providing a technical reference for applying distributed heterogeneous multirobot systems in dynamic environments.
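The abstract says only that the trained policy is moved to heterogeneous robots "by decoding the control signals"; it does not give the decoding rule. The sketch below shows one plausible way such decoding could work, assuming a single shared policy that emits a normalized two-dimensional action in [-1, 1] and assuming each robot is a differential-drive base described only by its velocity limits. `RobotLimits`, `decode_action`, the platform names, and all numeric limits are hypothetical illustrations, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RobotLimits:
    """Velocity limits of one platform (hypothetical example values)."""
    name: str
    v_min: float  # minimum linear velocity, m/s
    v_max: float  # maximum linear velocity, m/s
    w_max: float  # maximum magnitude of angular velocity, rad/s


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def decode_action(action: tuple[float, float], robot: RobotLimits) -> tuple[float, float]:
    """Map a normalized policy output to (v, w) for one robot.

    Assumption: action[0] is mapped linearly from [-1, 1] onto [v_min, v_max],
    and action[1] is scaled onto [-w_max, w_max]. Out-of-range inputs are clipped.
    """
    a_v = clip(action[0], -1.0, 1.0)
    a_w = clip(action[1], -1.0, 1.0)
    v = robot.v_min + (a_v + 1.0) * 0.5 * (robot.v_max - robot.v_min)
    w = a_w * robot.w_max
    return v, w


if __name__ == "__main__":
    # Two hypothetical platforms with different speed limits.
    fleet = [
        RobotLimits("small_diff_drive", 0.0, 0.22, 2.84),
        RobotLimits("large_diff_drive", 0.0, 1.0, 1.0),
    ]
    shared_action = (0.5, -0.25)  # the same normalized output from one shared policy
    for robot in fleet:
        v, w = decode_action(shared_action, robot)
        print(f"{robot.name}: v={v:.3f} m/s, w={w:.3f} rad/s")
```

Running the script prints `v=0.165 m/s, w=-0.710 rad/s` for the smaller platform and `v=0.750 m/s, w=-0.250 rad/s` for the larger one. Keeping the policy output normalized is one way a single trained network could drive robots with different speed limits, since only the per-robot decoding step changes.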


Memo

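Received: 2021-06-25.
Foundation items: National Natural Science Foundation of China (61703138); Fundamental Research Funds for the Central Universities (B200202224).
About the authors: OUYANG Yongping, master's student; main research interest: intelligent autonomous unmanned systems. WEI Changyun, associate professor, PhD in artificial intelligence from Delft University of Technology, the Netherlands, and visiting scholar at the Robotics and Autonomous Systems Laboratory, Cardiff University, UK; main research interest: intelligent autonomous unmanned systems; has published more than 30 academic papers. CAI Boliang, PhD, Cardiff University, UK; main research interests: multi-robot collaboration and intelligent unmanned systems.
Corresponding author: WEI Changyun. E-mail: c.wei@hhu.edu.cn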
