
[1] SHEN Jing, GU Guo-chang, LIU Hai-bo. Algorithm for automatic constructing Option based on multi-agent[J]. CAAI Transactions on Intelligent Systems, 2006, 1(1): 84-87.


Algorithm for automatic constructing Option based on multi-agent


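CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]; Volume: 1; Issue: No. 1, 2006; Pages: 84-87; Section: Academic Papers, Foundations of Artificial Intelligence; Publication date: 2006-03-25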

Title:: Algorithm for automatic constructing Option based on multi-agent

Article ID:: 1673-4785(2006)01-0084-04

Author(s):: SHEN Jing, GU Guo-chang, LIU Hai-bo; School of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China

Keywords:: hierarchical reinforcement learning; automatic hierarchy; multi-agent system; Option; aiNet

CLC number:: TP18

Document code:: A

Abstract:: In current hierarchical reinforcement learning (HRL), task hierarchies are constructed automatically by serial learning algorithms based on a single agent, which learn slowly. To speed up learning, a multi-agent algorithm for constructing Options automatically is proposed, built on the Option HRL framework of Sutton. First, multiple agents cooperate to explore the state space in parallel. The state space is then partitioned centrally into several subspaces by immune clustering based on aiNet. Next, the agents learn the internal policies of the different subspaces concurrently, and the Options are constructed from them. The algorithm is presented for the task of planning the shortest path between two points in a two-dimensional grid space with obstacles. Theoretical analysis and simulation experiments show that the multi-agent algorithm constructs Options noticeably faster than single-agent algorithms.
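The three phases named in the abstract (parallel exploration, centralized clustering, concurrent learning of internal policies) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: plain k-means on grid coordinates stands in for aiNet immune clustering, threads stand in for the cooperating agents (so no real speedup is shown), the subgoal rule, grid layout, Q-learning parameters, and every function name are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # (row, col) deltas

@dataclass
class Option:
    initiation: set   # states where the option may be started
    policy: dict      # internal policy: state -> action
    subgoal: tuple    # the option terminates on reaching this state

def step(grid, s, a):
    """Move one cell; stay put on hitting a border or an obstacle ('#')."""
    r, c = s[0] + a[0], s[1] + a[1]
    if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] != "#":
        return (r, c)
    return s

def explore(grid, start, steps, rng):
    """One agent's random walk; returns the set of states it visited."""
    s, seen = start, {start}
    for _ in range(steps):
        s = step(grid, s, rng.choice(ACTIONS))
        seen.add(s)
    return seen

def cluster(states, centers, rounds=10):
    """ASSUMPTION: k-means on coordinates as a stand-in for aiNet clustering."""
    for _ in range(rounds):
        groups = [set() for _ in centers]
        for s in states:
            d = [(s[0] - m[0]) ** 2 + (s[1] - m[1]) ** 2 for m in centers]
            groups[d.index(min(d))].add(s)
        centers = [(sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
                   if g else m for g, m in zip(groups, centers)]
    return [g for g in groups if g]

def pick_subgoal(region, states):
    """Hypothetical rule: first cell of the subspace bordering another subspace."""
    for s in sorted(region):
        for a in ACTIONS:
            n = (s[0] + a[0], s[1] + a[1])
            if n in states and n not in region:
                return s
    return min(region)

def learn_option(grid, region, subgoal, rng, episodes=300, alpha=0.5, gamma=0.9, eps=0.2):
    """Tabular Q-learning restricted to one subspace; reward 1 only at the subgoal."""
    q = {(s, a): 0.0 for s in region for a in ACTIONS}
    starts = sorted(region)
    for _ in range(episodes):
        s = rng.choice(starts)
        for _ in range(4 * len(region)):
            if s == subgoal:
                break
            a = rng.choice(ACTIONS) if rng.random() < eps else max(ACTIONS, key=lambda b: q[(s, b)])
            nxt = step(grid, s, a)
            if nxt not in region:   # leaving the subspace counts as a blocked move
                nxt = s
            target = 1.0 if nxt == subgoal else gamma * max(q[(nxt, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = nxt
    policy = {s: max(ACTIONS, key=lambda b: q[(s, b)]) for s in region}
    return Option(initiation=set(region), policy=policy, subgoal=subgoal)

def build_options(grid, starts, centers, steps=400, seed=0):
    with ThreadPoolExecutor() as pool:
        # Phase 1: agents explore the state space concurrently.
        walks = pool.map(lambda i: explore(grid, starts[i], steps, random.Random(seed + i)),
                         range(len(starts)))
        states = set().union(*walks)
        # Phase 2: centralized clustering into state subspaces.
        regions = cluster(states, centers)
        # Phase 3: one internal policy per subspace, learned concurrently.
        jobs = [(g, pick_subgoal(g, states)) for g in regions]
        return list(pool.map(
            lambda j: learn_option(grid, j[1][0], j[1][1], random.Random(seed + 100 + j[0])),
            enumerate(jobs)))

GRID = [
    ".....#.....",
    ".....#.....",
    "...........",
    ".....#.....",
    ".....#.....",
]
options = build_options(GRID, starts=[(0, 0), (4, 10)], centers=[(2.0, 2.0), (2.0, 8.0)])
```

Each returned Option pairs an initiation set and an internal policy with a terminating subgoal, which mirrors the initiation set, policy, and termination condition of an Option in Sutton's framework. A higher-level planner for the shortest-path task would then choose among these Options rather than among primitive moves.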

References:: [1] BARTO A G, MAHADEVAN S. Recent advances in hierarchical reinforcement learning[J]. Discrete Event Dynamic Systems: Theory and Applications, 2003, 13(4): 41-77.
[2] SUTTON R S, PRECUP D, SINGH S P. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning[J]. Artificial Intelligence, 1999, 112(1): 181-211.
[3] PARR R. Hierarchical control and learning for Markov decision processes[D]. Berkeley: University of California, 1998.
[4] DIETTERICH T G. Hierarchical reinforcement learning with the MAXQ value function decomposition[J]. Journal of Artificial Intelligence Research, 2000, 13(1): 227-303.
[5] DIGNEY B L. Learning hierarchical control structures for multiple tasks and changing environments[A]. Proc of the 5th International Conference on Simulation of Adaptive Behavior[C]. Zurich, Switzerland, 1998.
[6] MCGOVERN A, BARTO A. Autonomous discovery of subgoals in reinforcement learning using diverse density[A]. Proc of the 18th International Conference on Machine Learning[C]. San Francisco: Morgan Kaufmann, 2001.
[7] MENACHE I, MANNOR S, SHIMKIN N. Q-cut: dynamic discovery of sub-goals in reinforcement learning[A]. Proc of the 13th European Conference on Machine Learning[C]. Helsinki, Finland, 2002.
[8] MANNOR S, MENACHE I, HOZE A, et al. Dynamic abstraction in reinforcement learning via clustering[A]. Proc of the 21st International Conference on Machine Learning[C]. Banff, Canada, 2004.
[9] DE CASTRO L N, VON ZUBEN F N. An evolutionary immune network for data clustering[A]. Proc of the IEEE Brazilian Symposium on Artificial Neural Networks[C]. Rio de Janeiro, Brazil, 2000.

Similar articles:: [1] ZHOU Wenji, YU Yang. Summarize of hierarchical reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2017, 12(5): 590. [doi:10.11992/tis.201706031]
[2] YIN Changsheng, YANG Ruopeng, ZHU Wei, et al. A survey on multi-agent hierarchical reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2020, 15(4): 646. [doi:10.11992/tis.201909027]
[3] YONG Yuchen, LI Ziyu, DONG Qi. Multi-UAV within-visual-range air combat based on hierarchical multiagent reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2025, 20(3): 548. [doi:10.11992/tis.202408008]

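Memo

Received: 2005-12-28.
Funding: Supported by the Basic Research Foundation of Harbin Engineering University (HEUFT05021, HEUFT05068).
About the authors:
SHEN Jing, female, born in 1969, is a doctoral candidate at Harbin Engineering University. Her research focuses on hierarchical reinforcement learning and artificial immune theory. She has published more than 30 papers in domestic and international conferences and journals and took part in translating one published book.
GU Guo-chang, male, born in 1946, is a professor and doctoral supervisor. His research covers intelligent control, intelligent robotics, and embedded systems. He has published more than 100 papers, several of them indexed by EI and ISTP. He serves as a council member of the Intelligent Robot Society of the Chinese Association for Artificial Intelligence and as vice chairman of the Heilongjiang Computer Society.
LIU Hai-bo, male, born in 1976, holds a doctorate and is a professional member of IEEE, a member of IAIA, and a member of the China Computer Federation. His research focuses on integrating neuropsychological theory, multi-agent technology, and intelligent robot architecture. He has published more than 50 papers, three edited books, and one translated book.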

Last Update: 2009-04-07
