
[1] SHEN Jing, GU Guo-chang, LIU Hai-bo. Algorithm for automatic constructing Option based on multi-agent[J]. CAAI Transactions on Intelligent Systems, 2006, 1(1): 84-87.


Algorithm for automatic constructing Option based on multi-agent


CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 1 Issue: 2006, No. 1 Pages: 84-87 Column: Academic Papers: Fundamentals of Artificial Intelligence Publication date: 2006-03-25

Title:: Algorithm for automatic constructing Option based on multi-agent

Author(s):: SHEN Jing; GU Guo-chang; LIU Hai-bo; School of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China

Keywords:: hierarchical reinforcement learning; automatic hierarchy; multi-agent system; Option; aiNet

CLC:: TP18

DOI:: -

Abstract:: In current hierarchical reinforcement learning, task hierarchies are constructed automatically by slow, serial learning algorithms based on a single agent. To speed up learning, a multi-agent based algorithm for constructing Options automatically is presented. The algorithm is developed on the basis of the Option HRL framework proposed by Sutton. First, multiple agents cooperate in exploring the state space in parallel. Then the state space is partitioned into several subspaces via immune clustering based on aiNet. Next, the agents learn the local strategies of the different subspaces concurrently. Consequently, the Options are constructed. Theoretical analysis and experiments on shortest-path planning in a two-dimensional grid space with obstacles show that the multi-agent based algorithm constructs Options clearly faster than single-agent based algorithms.
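
The pipeline summarized in the abstract (partition the state space, learn a local strategy per subspace, wrap each strategy as an Option) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the grid, the hand-made two-way partition (a stand-in for aiNet immune clustering), the breadth-first search (a stand-in for reinforcement learning of local strategies), and all names such as `Option`, `build_option`, and `run_option` are assumptions made for illustration only. The three fields of `Option` follow the usual initiation set / policy / termination condition reading of Sutton's framework.

```python
from collections import deque
from dataclasses import dataclass

# Actions on the grid as (row, column) offsets.
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


@dataclass
class Option:
    initiation: frozenset   # states where the option may be started
    policy: dict            # state -> action, valid inside one subspace
    termination: frozenset  # states where the option ends


def build_option(region, subgoal):
    """Build an option whose policy follows shortest paths to `subgoal`.

    Breadth-first search backwards from the subgoal, restricted to
    `region`, stands in for learning a local strategy (assumption).
    """
    policy, frontier, seen = {}, deque([subgoal]), {subgoal}
    while frontier:
        r, c = frontier.popleft()
        for action, (dr, dc) in MOVES.items():
            prev = (r - dr, c - dc)  # cell from which `action` reaches (r, c)
            if prev in region and prev not in seen:
                seen.add(prev)
                policy[prev] = action
                frontier.append(prev)
    return Option(frozenset(policy), policy, frozenset([subgoal]))


def run_option(option, state):
    """Follow the option's policy until its termination condition holds."""
    path = [state]
    while state not in option.termination:
        dr, dc = MOVES[option.policy[state]]
        state = (state[0] + dr, state[1] + dc)
        path.append(state)
    return path


# Hypothetical 5x7 grid: '#' is an obstacle, the wall has one doorway.
GRID = [
    "...#...",
    "...#...",
    ".......",
    "...#...",
    "...#...",
]
free = {(r, c) for r, row in enumerate(GRID)
        for c, ch in enumerate(row) if ch == "."}

# Hand-made partition into two subspaces that share the doorway cell.
door, goal = (2, 3), (4, 6)
left = {s for s in free if s[1] <= 3}
right = {s for s in free if s[1] >= 3}

# The two calls are independent of each other, so in a multi-agent
# setting each could be assigned to a different agent.
to_door = build_option(left, door)
to_goal = build_option(right, goal)

# Chain the options: reach the doorway, then reach the goal.
path = run_option(to_door, (0, 0)) + run_option(to_goal, door)[1:]
```

Because each option is built from its own subspace only, the per-subspace work does not depend on the other subspaces, which is the property the abstract relies on when it has agents learn local strategies concurrently.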

References:: [1] BARTO A G, MAHADEVAN S. Recent advances in hierarchical reinforcement learning[J]. Discrete Event Dynamic Systems: Theory and Applications, 2003, 13(4): 41-77.
[2] SUTTON R S, PRECUP D, SINGH S P. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning[J]. Artificial Intelligence, 1999, 112(1): 181-211.
[3] PARR R. Hierarchical control and learning for Markov decision processes[D]. Berkeley: University of California, 1998.
[4] DIETTERICH T G. Hierarchical reinforcement learning with the MAXQ value function decomposition[J]. Journal of Artificial Intelligence Research, 2000, 13(1): 227-303.
[5] DIGNEY B L. Learning hierarchical control structures for multiple tasks and changing environments[A]. Proc of the 5th International Conference on Simulation of Adaptive Behavior[C]. Zurich, Switzerland, 1998.
[6] MCGOVERN A, BARTO A. Autonomous discovery of subgoals in reinforcement learning using diverse density[A]. Proc of the 18th International Conference on Machine Learning[C]. San Francisco: Morgan Kaufmann, 2001.
[7] MENACHE I, MANNOR S, SHIMKIN N. Q-cut: dynamic discovery of sub-goals in reinforcement learning[A]. Proc of the 13th European Conference on Machine Learning[C]. Helsinki, Finland, 2002.
[8] MANNOR S, MENACHE I, HOZE A, et al. Dynamic abstraction in reinforcement learning via clustering[A]. Proc of the 21st International Conference on Machine Learning[C]. Banff, Canada, 2004.
[9] DE CASTRO L N, VON ZUBEN F N. An evolutionary immune network for data clustering[A]. Proc of the IEEE Brazilian Symposium on Artificial Neural Networks[C]. Rio de Janeiro, Brazil, 2000.


Last Update: 2009-04-07
