[1] ZHANG Wenxu, MA Lei, HE Huilin, et al. Air-ground heterogeneous coordination for multi-agent coverage based on reinforced learning[J]. CAAI Transactions on Intelligent Systems, 2018, 13(2): 202-207. [doi:10.11992/tis.201609017]

Air-ground heterogeneous coordination for multi-agent coverage based on reinforced learning

CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 13 Issue: No. 2, 2018 Pages: 202-207 Section: Academic Papers - Machine Learning Publication date: 2018-04-15

Title:: Air-ground heterogeneous coordination for multi-agent coverage based on reinforced learning

Author(s):: ZHANG Wenxu, MA Lei, HE Huilin, WANG Xiaodong; School of Electrical Engineering, Southwest Jiaotong University, Chengdu 610031, China

Keywords:: heterogeneous multi-agent system; coverage; air-ground; UAV/UGV; DEC-POMDPs; reinforcement learning

CLC number:: TP181

DOI:: 10.11992/tis.201609017

Abstract:: Against the background of heterogeneous cooperative tasks involving unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs), this paper proposes an air-ground heterogeneous multi-agent cooperative coverage model that exploits the complementary characteristics of UAVs and UGVs to extend and improve dynamic coverage by heterogeneous multi-agent systems. During coverage, the UAV uses its advantages in speed and observation range to guide the actions of the UGV. To account for the agents' partial observability and uncertainty, the coverage scenario is modeled as a decentralized partially observable Markov decision process (DEC-POMDP), and a multi-agent reinforcement learning algorithm is used to complete the coverage of the environment. Simulation results show that cooperation between the UAV and the UGV speeds up the team's coverage of the environment, and that the reinforcement learning algorithm improves the effectiveness of the coverage model.
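
As a rough illustration of the learning component mentioned in the abstract, the sketch below shows a single tabular Q-learning update driven by a coverage-style reward. It is not the authors' algorithm: the grid-cell state representation, the reward values, the learning parameters, and the names `coverage_reward` and `q_update` are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical action set for an agent moving on a grid.
ACTIONS = ("up", "down", "left", "right")


def coverage_reward(visited, cell):
    """Assumed reward: 1.0 for a newly covered cell, 0.0 for a revisit."""
    return 0.0 if cell in visited else 1.0


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One-step Q-learning update on a dict-based Q-table."""
    # Greedy value of the successor state over all actions.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)  # Q-table, implicitly zero-initialised
visited = {(0, 0)}      # cells covered so far

# The agent moves right from (0, 0) into the uncovered cell (0, 1).
state, action, next_state = (0, 0), "right", (0, 1)
reward = coverage_reward(visited, next_state)
visited.add(next_state)
new_value = q_update(q, state, action, reward, next_state)
```

In a multi-agent setting, each UAV or UGV could keep its own table of this kind and update it from its local observations; how the agents share information or guide one another is specific to the paper's model and is not represented here.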

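Memo

Received: 2016-09-21.
Funding: National Natural Science Foundation of China, Young Scientists Fund (61304166).
Author biographies: ZHANG Wenxu, male, born in 1985, doctoral candidate. His main research interests are multi-agent systems and machine learning. He has published 4 academic papers, all 4 of which are indexed by EI. MA Lei, male, born in 1972, professor, Ph.D. His main research interests include control theory and its applications in robotics, new energy, and rail transit systems. He has led 14 domestic and international projects and published more than 40 academic papers, 37 of which are indexed by EI. HE Huilin, female, born in 1993, master's student. Her main research interest is machine learning. WANG Xiaodong, male, born in 1992, master's student. His main research interest is machine learning. He holds 3 national invention patents and has published 4 academic papers.
Corresponding author: ZHANG Wenxu. E-mail: wenxu_zhang@163.com.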
