[1]徐鹏,谢广明,文家燕,等.事件驱动的强化学习多智能体编队控制[J].智能系统学报,2019,14(01):93-98.[doi:10.11992/tis.201807010]
 XU Peng,XIE Guangming,WEN Jiayan,et al.Event-triggered reinforcement learning formation control for multi-agent[J].CAAI Transactions on Intelligent Systems,2019,14(01):93-98.[doi:10.11992/tis.201807010]

CAAI Transactions on Intelligent Systems [ISSN:1673-4785/CN:23-1538/TP]

Volume:
Vol. 14
Issue:
No. 1, 2019
Pages:
93-98
Publication date:
2019-01-05

Article Info

Title:
Event-triggered reinforcement learning formation control for multi-agent
Author(s):
XU Peng 1, XIE Guangming 1,2,3, WEN Jiayan 1,2, GAO Yuan 1
1. School of Electric and Information Engineering, Guangxi University of Science and Technology, Liuzhou 545006, China;
2. College of Engineering, Peking University, Beijing 100871, China;
3. Institute of Ocean Research, Peking University, Beijing 100871, China
Keywords:
reinforcement learning; multi-agent; event-triggered; formation control; Markov decision processes; swarm intelligence; action decisions; particle swarm optimization
CLC number:
TP391.8
DOI:
10.11992/tis.201807010
Abstract:
Classical reinforcement learning for multi-agent formation control consumes considerable communication and computing resources. This paper introduces an event-triggered control mechanism so that agents need not make action decisions at a fixed period; instead, an agent's action is updated only when an event-triggered condition is satisfied. The condition considers not only each agent's cumulative reward but also the deviation between an agent's reward and those of its neighbors, and the agents interact to find the optimal joint policy that achieves the formation. Numerical simulation results show that the event-triggered reinforcement learning formation control algorithm effectively reduces the frequency of agents' action decisions and the consumption of resources while maintaining system performance.
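The decision rule described in the abstract can be illustrated with a minimal single-state Python sketch: each agent keeps learning from its reward every step, but only re-selects (and hence communicates) an action when the trigger fires. The class name, the threshold form of the trigger, and the Q-update constants are illustrative assumptions, not the paper's exact formulation.

```python
import random

class EventTriggeredAgent:
    """Sketch of an agent that re-decides its action only when an
    event-triggered condition fires, instead of at every step.
    The trigger compares this agent's cumulative reward against the
    average of its neighbors' cumulative rewards (an assumption based
    on the abstract); the exact condition in the paper may differ."""

    def __init__(self, n_actions, threshold=1.0, alpha=0.1, gamma=0.9):
        self.q = [0.0] * n_actions   # single-state Q-table for brevity
        self.cum_reward = 0.0
        self.threshold = threshold
        self.alpha, self.gamma = alpha, gamma
        self.action = 0              # last committed action

    def triggered(self, neighbor_cum_rewards):
        """Fire when this agent's cumulative reward deviates too much
        from its neighbors' average."""
        if not neighbor_cum_rewards:
            return True
        avg = sum(neighbor_cum_rewards) / len(neighbor_cum_rewards)
        return abs(self.cum_reward - avg) > self.threshold

    def step(self, reward, neighbor_cum_rewards, eps=0.1):
        """Always update the Q-value for the committed action; only pick
        a new action (an epsilon-greedy decision) when the trigger fires."""
        self.cum_reward += reward
        best = max(self.q)
        self.q[self.action] += self.alpha * (
            reward + self.gamma * best - self.q[self.action])
        if self.triggered(neighbor_cum_rewards):
            if random.random() < eps:
                self.action = random.randrange(len(self.q))
            else:
                self.action = self.q.index(max(self.q))
        return self.action
```

When the deviation stays within the threshold, the agent keeps replaying its previous action, which is what reduces the decision frequency relative to a fixed-period scheme.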

References:

[1] POLYDOROS A S, NALPANTIDIS L. Survey of model-based reinforcement learning:applications on robotics[J]. Journal of intelligent & robotic systems, 2017, 86(2):153-173.
[2] TESAURO G, TOURETZKY D S, LEEN T K, et al. Advances in neural information processing systems[J]. Biochemical and biophysical research communications, 1997, 159(6).
[3] 梁爽, 曹其新, 王雯珊, 等. 基于强化学习的多定位组件自动选择方法[J]. 智能系统学报, 2016, 11(2): 149-154. LIANG Shuang, CAO Qixin, WANG Wenshan, et al. An automatic switching method for multiple location components based on reinforcement learning[J]. CAAI transactions on intelligent systems, 2016, 11(2): 149-154.
[4] KIM H E, AHN H S. Convergence of multiagent Q-learning:multi action replay process approach[C]//Proceedings of 2010 IEEE International Symposium on Intelligent Control. Yokohama, Japan, 2010:789-794.
[5] IIMA H, KUROE Y. Swarm reinforcement learning methods improving certainty of learning for a multi-robot formation problem[C]//Proceedings of 2015 IEEE Congress on Evolutionary Computation. Sendai, Japan, 2015:3026-3033.
[6] MENG Xiangyu, CHEN Tongwen. Optimal sampling and performance comparison of periodic and event based impulse control[J]. IEEE transactions on automatic control, 2012, 57(12):3252-3259.
[7] DIMAROGONAS D V, FRAZZOLI E, JOHANSSON K H. Distributed event-triggered control for multi-agent systems[J]. IEEE transactions on automatic control, 2012, 57(5):1291-1297.
[8] XIE Duosi, XU Shengyuan, CHU Yuming, et al. Event-triggered average consensus for multi-agent systems with nonlinear dynamics and switching topology[J]. Journal of the franklin institute, 2015, 352(3):1080-1098.
[9] WU Yuanqing, MENG Xiangyu, XIE Lihua, et al. An input-based triggering approach to leader-following problems[J]. Automatica, 2017, 75:221-228.
[10] TABUADA P. Event-triggered real-time scheduling of stabilizing control tasks[J]. IEEE transactions on automatic control, 2007, 52(9):1680-1685.
[11] WEN Jiayan, WANG Chen, XIE Guangming. Asynchronous distributed event-triggered circle formation of multi-agent systems[J]. Neurocomputing, 2018, 295:118-126.
[12] MENG Xiangyu, CHEN Tongwen. Event based agreement protocols for multi-agent networks[J]. Automatica, 2013, 49(7):2125-2132.
[13] ZHONG Xiangnan, NI Zhen, HE Haibo, et al. Event-triggered reinforcement learning approach for unknown nonlinear continuous-time system[C]//Proceedings of 2014 International Joint Conference on Neural Networks. Beijing, China, 2014:3677-3684.
[14] 张文旭, 马磊, 王晓东. 基于事件驱动的多智能体强化学习研究[J]. 智能系统学报, 2017, 12(1): 82-87. ZHANG Wenxu, MA Lei, WANG Xiaodong. Reinforcement learning for event-triggered multi-agent systems[J]. CAAI transactions on intelligent systems, 2017, 12(1): 82-87.
[15] KRÖSE B J A. Learning from delayed rewards[J]. Robotics and autonomous systems, 1995, 15(4):233-235.


Memo:
Received: 2018-07-11.
Foundation items: National Key R&D Program of China (2017YFB1400800); National Natural Science Foundation of China (91648120, 61633002, 51575005, 61563006, 61563005); Guangxi Universities Key Laboratory of Industrial Process Intelligent Control Technology project (IPICT-2016-04).
About the authors: XU Peng, born in 1991, is a master's student. His research interests include multi-agent systems, reinforcement learning, and deep learning. XIE Guangming, born in 1972, is a professor and doctoral supervisor. His research interests include dynamics and control of complex systems, intelligent biomimetic robots, and multi-robot systems and control. He currently leads three key projects of the National Natural Science Foundation of China and holds more than 10 granted invention patents. He has received the First Prize of the Natural Science Award of the Ministry of Education and the Second Prize of the State Natural Science Award, and has published more than 300 academic papers, over 120 of which are indexed by SCI and over 120 by EI. WEN Jiayan, born in 1981, is an associate professor with a PhD. His research interests include event-triggered control and multi-agent formation control. He has published more than 10 academic papers.
Corresponding author: WEN Jiayan. E-mail: wenjiayan2012@126.com