[1] LIAN Chuanqiang, XU Xin, WU Jun, et al. Q-CF multi-agent reinforcement learning for resource allocation problems[J]. CAAI Transactions on Intelligent Systems, 2011, 6(2): 95-100.

Q-CF Multi-Agent Reinforcement Learning for Resource Allocation Problems

CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 6
Issue:
No. 2, 2011
Pages:
95-100
Publication date:
2011-04-25

Article Info

Title:
Q-CF multiAgent reinforcement learningfor resource allocation problems
Article ID:
1673-4785(2011)02-0095-06
Author(s):
LIAN Chuanqiang, XU Xin, WU Jun, LI Zhaobin
College of Mechatronics and Automation, National University of Defense Technology, Changsha 410073, China
Keywords:
multi-agent system; reinforcement learning; resource allocation; cooperative control
CLC number:
TP391.1
Document code:
A
Abstract:
When multi-agent reinforcement learning algorithms are applied to complex distributed systems, problems such as a huge state space and low learning efficiency arise. In this paper, multi-agent reinforcement learning was studied for the resource allocation problem in a network environment. By combining the Q-learning algorithm with a chain feedback (CF) learning mechanism, a novel Q-CF multi-agent reinforcement learning algorithm was proposed, in which efficient cooperation among agents is achieved through an information chain feedback mechanism. Simulation results show that, compared with existing multi-agent Q-learning algorithms, the proposed algorithm converges faster while ensuring the performance optimization of the cooperative policy.

References:

[1] ZHANG Chongjie, LESSER V, SHENOY P. A multi-agent learning approach to resource sharing across computing clusters[R]. Amherst: Computer Science Department, University of Massachusetts Amherst, UM-CS-2008-035, 2008.
[2] KO P C, LIN P C, YOU J A, et al. Multi-layer allocated learning based neural network for resource allocation optimization[C]//Proceedings of the 9th Joint Conference on Information Sciences (JCIS 2006). Taipei, China, 2006: 35-41.
[3] TESAURO G. Online resource allocation using decompositional reinforcement learning[C]//Proceedings of AAAI 2005. Pittsburgh, USA, 2005: 886-891.
[4] LITTMAN M L, STONE P. Leading best-response strategies in repeated games[C]//The 17th Annual International Joint Conference on Artificial Intelligence Workshop on Economic Agents, Models, and Mechanisms. Seattle, Washington, USA, 2001: 745-756.
[5] HU J, WELLMAN M P. Multiagent reinforcement learning in stochastic games[OL]. citeseer.ist.psu.edu/hu99multiagent.html, 1999.
[6] BUSONIU L, DE SCHUTTER B, BABUSKA R. Multi-agent reinforcement learning with adaptive state focus[C]//Proceedings of the 17th Belgium-Netherlands Conference on Artificial Intelligence. Brussels, Belgium, 2005: 35-42.
[7] KOK J R, VLASSIS N. Collaborative multiagent reinforcement learning by payoff propagation[J]. Journal of Machine Learning Research, 2006, 7: 1789-1828.
[8] YANG Pei, CHEN Zhaoqian, CHEN Shifu. Application of machine learning in RoboCup[J]. Computer Science, 2003, 30(6): 118-121.
[9] WANG Xingce, ZHANG Rubo, GU Guochang. Research on multi-robot team formation based on reinforcement learning[J]. Computer Engineering, 2002, 28(6): 15-16.
[10] HU J, WELLMAN M P. Nash Q-learning for general-sum stochastic games[J]. Journal of Machine Learning Research, 2003, 4: 1039-1069.
[11] ALPAYDIN E. Introduction to machine learning[M]. FAN Ming, et al., trans. Beijing: 北京工业出版社, 2009: 244-255.
[12] LAGOUDAKIS M G, PARR R. Least-squares policy iteration[J]. Journal of Machine Learning Research, 2003, 4: 1107-1149.
[13] XU X, HU D W, LU X C. Kernel-based least squares policy iteration[J]. IEEE Transactions on Neural Networks, 2007, 18(4): 973-992.

Similar References:

[1] SHEN Jing, GU Guo-chang, LIU Hai-bo. Algorithm for automatic constructing Option based on multi-agent[J]. CAAI Transactions on Intelligent Systems, 2006, 1(1): 84.
[2] LI Zong-gang, JIA Ying-min. Aggregation of multi-agent systems with group leaders[J]. CAAI Transactions on Intelligent Systems, 2006, 1(2): 26.
[3] WANG Jian-chun, XIE Guang-ming. Aggregation behaviors of multi-agent systems in an environment with obstacles[J]. CAAI Transactions on Intelligent Systems, 2007, 2(5): 78.
[4] WANG Long, FU Feng, CHEN Xiao-jie, et al. Collective decision-making over complex networks[J]. CAAI Transactions on Intelligent Systems, 2008, 3(2): 95.
[5] WANG Dongmei, FANG Huajing. An adaptive flocking motion with a leader based on a feedback control scheme[J]. CAAI Transactions on Intelligent Systems, 2011, 6(2): 141.
[6] DONG Jie, JI Zhijian, WANG Xiaoxiao. Algebraic conditions for the controllability of multi-agent systems[J]. CAAI Transactions on Intelligent Systems, 2015, 10(5): 747. [doi:10.11992/tis.201411030]
[7] LIANG Shuang, CAO Qixin, WANG Wenshan, et al. An automatic switching method for multiple location components based on reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2016, 11(2): 149. [doi:10.11992/tis.201510031]
[8] WANG Zhonglin, LIU Zhongxin, CHEN Zengqiang, et al. A kind of new type controller for multi-agent leader-follower formation[J]. CAAI Transactions on Intelligent Systems, 2014, 9(3): 298. [doi:10.3969/j.issn.1673-4785.]
[9] WANG Xiaoxiao, JI Zhijian. Controllability of non-identical multi-agent systems under a broadcasting control signal[J]. CAAI Transactions on Intelligent Systems, 2014, 9(4): 401. [doi:10.3969/j.issn.1673-4785.201401011]
[10] MA Chen, CHEN Xuebo. Coordinated control of the consensus of a multi-agent system based on the inclusion principle[J]. CAAI Transactions on Intelligent Systems, 2014, 9(4): 468. [doi:10.3969/j.issn.1673-4785.201306024]

Memo:
Received: 2010-03-25.
Foundation item: Supported by the National Natural Science Foundation of China (60774076, 90820302).
Corresponding author: LIAN Chuanqiang. E-mail: wzdslcq@163.com.
About the authors:
LIAN Chuanqiang, male, born in 1986, M.S. candidate. His main research interests include pattern recognition and machine learning.
XU Xin, male, born in 1974, professor, Ph.D. His main research interests include reinforcement learning, adaptive dynamic programming theory and algorithms, intelligent mobile robots, and intelligent systems.
WU Jun, male, born in 1980, Ph.D. candidate. His main research interests include multi-robot systems, machine learning, and intelligent systems.
Last Update: 2011-05-19