
[1] ZHANG Wenxu, MA Lei, WANG Xiaodong. Reinforcement learning for event-triggered multi-agent systems[J]. CAAI Transactions on Intelligent Systems, 2017, 12(1): 82-87. [doi:10.11992/tis.201604008]


Reinforcement learning for event-triggered multi-agent systems


CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 12 Issue: 2017, No. 1 Pages: 82-87 Column: Academic Papers, Machine Learning Publication date: 2017-02-25

Title: Reinforcement learning for event-triggered multi-agent systems

Author(s): ZHANG Wenxu, MA Lei, WANG Xiaodong; School of Electrical Engineering, Southwest Jiaotong University, Chengdu 610031, China

Keywords: event-triggered; multi-agent; reinforcement learning; decentralized Markov decision processes; convergence

CLC number: TP181

DOI: 10.11992/tis.201604008

Abstract: To address the heavy consumption of communication and computing resources in multi-agent reinforcement learning, this paper proposes an event-triggered multi-agent reinforcement learning algorithm, focusing on event triggering at the policy level of multi-agent learning. While the agents interact with the environment, a trigger function is designed from the rate of change of each agent's observations. With an appropriately designed trigger threshold, communication and learning need not take place in real time or periodically, so the number of data transmissions and computations within the same period of time can be reduced. The computing-resource consumption of the algorithm is analyzed, and its convergence is proven. Finally, simulation results show that the algorithm reduces the number of communications and policy traversals during learning, thereby easing the consumption of communication and computing resources.
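The triggering idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the trigger function, so the relative-change measure `change_rate`, the class `EventTriggeredQAgent`, the `threshold` parameter, and the sample observations are all hypothetical. The only standard ingredient is the one-step Q-learning update. The agent re-selects its action (a pass over all actions) only when the observation has changed enough since the last trigger, and otherwise reuses the held action.

```python
from collections import defaultdict


def change_rate(prev_obs, obs):
    """Relative change between two observation vectors (hypothetical measure)."""
    diff = sum(abs(o - p) for o, p in zip(obs, prev_obs))
    scale = sum(abs(p) for p in prev_obs)
    return diff / scale if scale > 0 else float("inf")


class EventTriggeredQAgent:
    """Illustrative agent: action selection happens only at trigger instants."""

    def __init__(self, n_actions, threshold, alpha=0.5, gamma=0.9):
        self.n_actions = n_actions
        self.threshold = threshold      # assumed trigger threshold
        self.alpha = alpha              # learning rate
        self.gamma = gamma              # discount factor
        self.q = defaultdict(float)     # (state, action) -> estimated value
        self.held_obs = None            # observation stored at the last trigger
        self.held_action = 0            # action chosen at the last trigger
        self.trigger_count = 0

    def triggered(self, obs):
        """True on the first observation or when the change exceeds the threshold."""
        if self.held_obs is None:
            return True
        return change_rate(self.held_obs, obs) > self.threshold

    def act(self, obs):
        """Re-select the greedy action on a trigger; otherwise reuse the held action."""
        if self.triggered(obs):
            state = tuple(obs)
            self.held_action = max(
                range(self.n_actions), key=lambda a: self.q[(state, a)]
            )
            self.held_obs = state
            self.trigger_count += 1
        return self.held_action

    def learn(self, state, action, reward, next_state):
        """Standard one-step Q-learning update."""
        best_next = max(self.q[(next_state, a)] for a in range(self.n_actions))
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


# Usage: five observations, of which only the first and the large jump trigger.
agent = EventTriggeredQAgent(n_actions=2, threshold=0.2)
for obs in [(1.0, 1.0), (1.05, 1.0), (1.1, 1.0), (2.0, 1.0), (2.1, 1.0)]:
    agent.act(obs)
print(agent.trigger_count)  # 2
```

In a multi-agent setting, the same `triggered` flag could also gate calls to `learn` and any message broadcast to other agents, which is where a reduction in communication would come from under these assumptions.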

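Memo

Received: 2016-04-05; revised: not stated.
Funding: National Natural Science Foundation of China, Young Scientists Fund (61304166).
About the authors: ZHANG Wenxu, male, born in 1985, Ph.D. candidate. His main research interests are multi-agent systems and machine learning. He has published 4 papers, all 4 indexed by EI. MA Lei, male, born in 1972, professor, Ph.D. His main research interests are control theory and its applications in robotics, new energy, and rail transit systems. He has led 14 domestic and international projects and published more than 40 papers, 37 of them indexed by EI. WANG Xiaodong, male, born in 1992, master's student. His main research interest is machine learning. He holds 3 national invention patents and has published 4 papers.
Corresponding author: ZHANG Wenxu. Email: wenxu_zhang@163.com.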

