[1] OUYANG Yongping, WEI Changyun, CAI Boliang. Collision avoidance approach for distributed heterogeneous multirobot systems in dynamic environments[J]. CAAI Transactions on Intelligent Systems, 2022, 17(4): 752-763. [doi:10.11992/tis.202106044]

Collision avoidance approach for distributed heterogeneous multirobot systems in dynamic environments

References:
[1] SHI Huiyuan, SU Chengli, CAO Jiangtao, et al. Nonlinear adaptive predictive functional control based on the Takagi-Sugeno model for average cracking outlet temperature of the ethylene cracking furnace[J]. Industrial & engineering chemistry research, 2015, 54(6): 1849–1860.
[2] MELLINGER D, KUSHLEYEV A, KUMAR V. Mixed-integer quadratic program trajectory generation for heterogeneous quadrotor teams[C]//2012 IEEE International Conference on Robotics and Automation. Saint Paul: IEEE, 2012: 477–483.
[3] KHATIB O. Real-time obstacle avoidance for manipulators and mobile robots[M]//Autonomous robot vehicles. New York: Springer New York, 1986: 396–404.
[4] ZHANG Pengpeng, WEI Changyun, CAI Boliang, et al. Mapless navigation for autonomous robots: a deep reinforcement learning approach[C]//2019 Chinese Automation Congress. Hangzhou: IEEE, 2019: 3141–3146.
[5] CHEN Yufan, LIU Miao, EVERETT M, et al. Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning[C]//2017 IEEE International Conference on Robotics and Automation. Singapore: IEEE, 2017: 285–292.
[6] TAI Lei, PAOLO G, LIU Ming. Virtual-to-real deep reinforcement learning: continuous control of mobile robots for mapless navigation[C]//2017 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vancouver: IEEE, 2017: 31–36.
[7] MINSKY M. Theory of neural-analog reinforcement systems and its application to the brain-model problem[M]. New Jersey: Princeton University, 1954.
[8] BELLMAN R. Dynamic programming[J]. Science, 1966, 153(3731): 34–37.
[9] FAN Tingxiang, LONG Pinxin, LIU Wenxi, et al. Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios[J]. The international journal of robotics research, 2020, 39(7): 856–892.
[10] BARTH-MARON G, HOFFMAN M W, BUDDEN D, et al. Distributed distributional deterministic policy gradients[EB/OL]. New York: arXiv, 2018. (2018-04-23)[2021-06-25]. https://arxiv.org/abs/1804.08617.
[11] NA S, NIU Hanlin, LENNOX B, et al. Universal artificial pheromone framework with deep reinforcement learning for robotic systems[C]//2021 6th International Conference on Control and Robotics Engineering. Beijing: IEEE, 2021: 28–32.
[12] HUANG Liang, BI Suzhi, ZHANG Y J A. Deep reinforcement learning for online computation offloading in wireless powered mobile-edge computing networks[J]. IEEE transactions on mobile computing, 2020, 19(11): 2581–2593.
[13] SCHULMAN J, LEVINE S, ABBEEL P, et al. Trust region policy optimization[C]//Proceedings of the 32nd International Conference on Machine Learning. New York: PMLR, 2015: 1889–1897.
[14] WANG Yuhui, HE Hao, TAN Xiaoyang. Truly proximal policy optimization[C]//Proceedings of the 35th Uncertainty in Artificial Intelligence Conference. New York: PMLR, 2020: 113–122.
[15] ZHAO Dongbin, SHAO Kun, ZHU Yuanheng, et al. Review of deep reinforcement learning and discussions on the development of computer Go[J]. Control theory & applications, 2016, 33(6): 701–717.
[16] AGOSTINELLI F, HOCQUET G, SINGH S, et al. From reinforcement learning to deep reinforcement learning: an overview[M]//Braverman readings in machine learning. Key ideas from inception to current state. Cham: Springer, 2018: 298–328.
[17] NIELSEN M A. Neural networks and deep learning[M]. San Francisco: Determination press, 2015.
[18] HU Junyan, NIU Hanlin, CARRASCO J, et al. Voronoi-based multi-robot autonomous exploration in unknown environments via deep reinforcement learning[J]. IEEE transactions on vehicular technology, 2020, 69(12): 14413–14423.
[19] CHRISTIANOS F, SCHÄFER L, ALBRECHT S V. Shared experience actor-critic for multi-agent reinforcement learning[J]. Advances in neural information processing systems, 2020, 33: 10707–10717.
[20] GAO Junli, YE Weijie, GUO Jing, et al. Deep reinforcement learning for indoor mobile robot path planning[J]. Sensors, 2020, 20(19): 5493.
[21] FOERSTER J, FARQUHAR G, AFOURAS T, et al. Counterfactual multi-agent policy gradients[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New Orleans: AAAI, 2018, 32(1).
[22] LOWE R, WU Yi, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[EB/OL]. New York: arXiv, 2017. (2017-06-07)[2021-06-25]. https://arxiv.org/abs/1706.02275.

Copyright © CAAI Transactions on Intelligent Systems