[1]张文旭,马磊,王晓东.基于事件驱动的多智能体强化学习研究[J].智能系统学报,2017,12(01):82-87.[doi:10.11992/tis.201604008]
 ZHANG Wenxu,MA Lei,WANG Xiaodong.Reinforcement learning for event-triggered multi-agent systems[J].CAAI Transactions on Intelligent Systems,2017,12(01):82-87.[doi:10.11992/tis.201604008]
点击复制

基于事件驱动的多智能体强化学习研究(/HTML)
分享到:

《智能系统学报》[ISSN:1673-4785/CN:23-1538/TP]

卷:
第12卷
期数:
2017年01期
页码:
82-87
栏目:
出版日期:
2017-02-25

文章信息/Info

Title:
Reinforcement learning for event-triggered multi-agent systems
作者:
张文旭 马磊 王晓东
西南交通大学 电气工程学院, 四川 成都 610031
Author(s):
ZHANG Wenxu MA Lei WANG Xiaodong
School of Electrical Engineering, Southwest Jiaotong University, Chengdu 610031, China
关键词:
事件驱动多智能体强化学习分布式马尔科夫决策过程收敛性
Keywords:
event-triggeredmulti-agentreinforcement learningdecentralized Markov decision processesconvergence
分类号:
TP181
DOI:
10.11992/tis.201604008
摘要:
本文针对多智能体强化学习中存在的通信和计算资源消耗大等问题,提出了一种基于事件驱动的多智能体强化学习算法,侧重于事件驱动在多智能体学习策略层方面的研究。在智能体与环境的交互过程中,算法基于事件驱动的思想,根据智能体观测信息的变化率设计触发函数,使学习过程中的通信和学习时机无需实时或按周期地进行,故在相同时间内可以降低数据传输和计算次数。另外,分析了该算法的计算资源消耗,以及对算法收敛性进行了论证。最后,仿真实验说明了该算法可以在学习过程中减少一定的通信次数和策略遍历次数,进而缓解了通信和计算资源消耗。
Abstract:
Focusing on the existing multi-agent reinforcement learning problems such as huge consumption of communication and calculation, a novel event-triggered multi-agent reinforcement learning algorithm was presented. The algorithm focused on an event-triggered idea at the strategic level of multi-agent learning. In particular, during the interactive process between agents and the learning environment, the communication and learning were triggered through the change rate of observation.Using an appropriate event-triggered design, the discontinuous threshold was employed, and thus real-time or periodical communication and learning can be avoided, and the number of communications and calculations were reduced within the same time. Moreover, the consumption of computing resource and the convergence of the proposed algorithm were analyzed and proven. Finally, the simulation results show that the number of communications and traversals were reduced in learning, thus saving the computing and communication resources.

参考文献/References:

[1] ZHU Wei, JIANG ZhongPing, FENG Gang. Event-based consensus of multi-agent systems with general linear models[J]. Automatica, 2014, 50(2): 552-558.
[2] FAN Yuan, FENG Gang, WANG Yong, et al. Distributed event-triggered control of multi-agent systems with combinational measurements[J]. Automatica, 2013, 49(2): 671-675.
[3] WANG Xiaofeng, LEMMON M D. Event-triggering in distributed networked control systems[J]. IEEE transactions on automatic control, 2011, 56(3): 586-601.
[4] TABUADA P. Event-triggered real-time scheduling of stabilizing control tasks[J]. IEEE transactions on automatic control, 2007, 52(9): 1680-1685.
[5] ZOU Lei, WANG Zidong, GAO Huijun, et al. Event-triggered state estimation for complex networks with mixed time delays via sampled data information: the continuous-time case[J]. IEEE transactions on cybernetics, 2015, 45(12): 2804-2815.
[6] SAHOO A, XU Hao, JAGANNATHAN S. Adaptive neural network-based event-triggered control of single-input single-output nonlinear discrete-time systems[J]. IEEE transactions on neural networks and learning systems, 2016, 27(1): 151-164.
[7] HU Wenfeng, LIU Lu, FENG Gang. Consensus of linear multi-agent systems by distributed event-triggered strategy[J]. IEEE transactions on cybernetics, 2016, 46(1): 148-157.
[8] ZHONG Xiangnan, NI Zhen, HE Haibo, et al. Event-triggered reinforcement learning approach for unknown nonlinear continuous-time system[C]//Proceedings of 2014 International Joint Conference on Neural Networks. Beijing, China, 2014: 3677-3684.
[9] XU Hao, JAGANNATHAN S. Near optimal event-triggered control of nonlinear continuous-time systems using input and output data[C]//Proceedings of the 11th World Congress on Intelligent Control and Automation. Shenyang, China, 2014: 1799-1804.
[10] BERNSTEIN D S, GIVAN R, IMMERMAN N, et al. The complexity of decentralized control of Markov decision processes[J]. Mathematics of operations research, 2002, 27(4): 819-840.
[11] WATKINS C J C H, DAYAN P. Q-learning[J]. Machine learning, 1992, 8(3/4): 279-292.
[12] SZEPESVRI C, LITTMAN M L. A unified analysis of value-function-based reinforcement-learning algorithms[J]. Neural computation, 1999, 11(8): 2017-2060.

相似文献/References:

[1]谭树彬,刘建昌.Multi-Agent的连续轧制过程控制系统研究[J].智能系统学报,2008,3(02):150.
 TAN Shu-bin,LIU Jian-chang.Research On multiAgent based control system for continuous rolling process[J].CAAI Transactions on Intelligent Systems,2008,3(01):150.
[2]雷明,周超,周绍磊,等.考虑时变时滞的多移动智能体分布式编队控制[J].智能系统学报,2012,7(06):536.
 LEI Ming,ZHOU Chao,ZHOU Shaolei,et al.Decentralized formation control of multiple mobile agents considering timevarying delay[J].CAAI Transactions on Intelligent Systems,2012,7(01):536.
[3]郭文强,高晓光,侯勇严,等.采用MSBN多智能体协同推理的智能农业车辆环境识别[J].智能系统学报,2013,8(05):453.[doi:10.3969/j.issn.1673-4785.201210057]
 GUO Wenqiang,GAO Xiaoguang,HOU Yongyan,et al.Environment recognition of intelligent agricultural vehicles based on MSBN and multi-agent coordinative inference[J].CAAI Transactions on Intelligent Systems,2013,8(01):453.[doi:10.3969/j.issn.1673-4785.201210057]
[4]曹鹏飞,郝矿荣,丁永生.面向多机器人动态任务分配的事件驱动免疫网络算法[J].智能系统学报,2018,13(06):952.[doi:10.11992/tis.201707022]
 CAO Pengfei,HAO Kuangrong,DING Yongsheng.Event-driven immune network algorithm for multi-robot dynamic task allocation[J].CAAI Transactions on Intelligent Systems,2018,13(01):952.[doi:10.11992/tis.201707022]
[5]殷昌盛,杨若鹏,朱巍,等.多智能体分层强化学习综述[J].智能系统学报,2020,15(4):646.[doi:10.11992/tis.201909027]
 YIN Changsheng,YANG Ruopeng,ZHU Wei,et al.A survey on multi-agent hierarchical reinforcement learning[J].CAAI Transactions on Intelligent Systems,2020,15(01):646.[doi:10.11992/tis.201909027]
[6]徐鹏,谢广明,文家燕,等.事件驱动的强化学习多智能体编队控制[J].智能系统学报,2019,14(01):93.[doi:10.11992/tis.201807010]
 XU Peng,XIE Guangming,WEN Jiayan,et al.Event-triggered reinforcement learning formation control for multi-agent[J].CAAI Transactions on Intelligent Systems,2019,14(01):93.[doi:10.11992/tis.201807010]

备注/Memo

备注/Memo:
收稿日期:2016-4-5;改回日期:。
基金项目:国家自然科学基金青年项目(61304166).
作者简介:张文旭,男,1985年生,博士研究生,主要研究方向为多智能体系统、机器学习。发表论文4篇,其中被EI检索4篇;马磊,男,1972年生,教授,博士,主要研究方向为控制理论及其在机器人、新能源和轨道交通系统中的应用等。主持国内外项目14项,发表论文40余篇,其中被EI检索37篇;王晓东,男,1992年生,硕士研究生,主要研究方向为机器学习。获得国家发明型专利3项,发表论文4篇。
通讯作者:张文旭.Email:wenxu_zhang@163.com.
更新日期/Last Update: 1900-01-01