[1] ZHANG Wenxu, MA Lei, WANG Xiaodong. Reinforcement learning for event-triggered multi-agent systems[J]. CAAI Transactions on Intelligent Systems, 2017, 12(1): 82-87. [doi:10.11992/tis.201604008]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 12
Issue: No. 1, 2017
Pages: 82-87
Column: Academic Papers, Machine Learning
Publication date: 2017-02-25
- Title: Reinforcement learning for event-triggered multi-agent systems
- Author(s): ZHANG Wenxu, MA Lei, WANG Xiaodong
- Affiliation: School of Electrical Engineering, Southwest Jiaotong University, Chengdu 610031, China
- Keywords: event-triggered; multi-agent; reinforcement learning; decentralized Markov decision processes; convergence
- Classification number: TP181
- DOI: 10.11992/tis.201604008
- Abstract: To address the heavy communication and computation costs of multi-agent reinforcement learning, this paper proposes an event-triggered multi-agent reinforcement learning algorithm, focusing on the event-triggered idea at the policy level of multi-agent learning. During the interaction between the agents and the environment, a trigger function is designed from the rate of change of each agent's observations. With an appropriate event-triggered design that employs a discontinuous threshold, communication and learning need not take place in real time or periodically, so the number of data transmissions and computations within the same period is reduced. The computing resource consumption of the algorithm is analyzed, and its convergence is proven. Finally, simulation results show that the algorithm reduces the number of communications and policy traversals during learning, thereby saving communication and computing resources.
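The triggering idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the relative-change metric in `change_rate`, the fixed `threshold`, and the names `is_triggered` and `count_triggers` are all assumptions introduced here for illustration. The sketch only shows how gating communication and learning on the observation change rate yields fewer updates than acting at every step.

```python
def change_rate(prev_obs, obs):
    """Relative change between two observation vectors (assumed metric)."""
    diff = sum(abs(a - b) for a, b in zip(obs, prev_obs))
    scale = sum(abs(b) for b in prev_obs) or 1.0  # avoid division by zero
    return diff / scale


def is_triggered(prev_obs, obs, threshold):
    """Hypothetical trigger function: fire when the change rate exceeds the threshold."""
    return change_rate(prev_obs, obs) > threshold


def count_triggers(observations, threshold):
    """Count the steps at which an agent would communicate and re-learn.

    The reference observation is updated only when an event fires, so
    small drifts are ignored until they accumulate past the threshold.
    """
    last_sent = observations[0]
    triggers = 0
    for obs in observations[1:]:
        if is_triggered(last_sent, obs, threshold):
            triggers += 1      # communication / policy update would happen here
            last_sent = obs    # new reference point for later comparisons
    return triggers


if __name__ == "__main__":
    trajectory = [[1.0, 1.0], [1.0, 1.05], [1.0, 1.5], [1.0, 1.5], [2.0, 1.5]]
    print(count_triggers(trajectory, threshold=0.1))
```

In this toy trajectory, a periodic scheme would communicate at all four steps, whereas the thresholded trigger fires only when the observation has changed appreciably since the last transmission.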
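Memo
Received: 2016-4-5; revised: not given.
Funding: National Natural Science Foundation of China, Young Scientists Fund (61304166).
About the authors: ZHANG Wenxu, male, born in 1985, is a Ph.D. candidate whose main research interests are multi-agent systems and machine learning. He has published 4 papers, all 4 indexed by EI. MA Lei, male, born in 1972, is a professor with a Ph.D. whose main research interests include control theory and its applications in robotics, new energy, and rail transit systems. He has led 14 domestic and international projects and published more than 40 papers, 37 of them indexed by EI. WANG Xiaodong, male, born in 1992, is a master's student whose main research interest is machine learning. He holds 3 national invention patents and has published 4 papers.
Corresponding author: ZHANG Wenxu. Email: wenxu_zhang@163.com.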