[1]JIN Zhuo-jun,QIAN Hui,CHEN Shen-yi,et al.Survey of apprenticeship learning based on reward function learning[J].CAAI Transactions on Intelligent Systems,2009,4(3):208-212.
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 4
Issue: 3 (2009)
Pages: 208-212
Column: Review
Publication date: 2009-06-25
- Title: Survey of apprenticeship learning based on reward function learning
- Author(s): JIN Zhuo-jun; QIAN Hui; CHEN Shen-yi; ZHU Miao-liang
- Affiliation: Department of Computer Science, Zhejiang University, Hangzhou 310027, China
- Keywords: apprenticeship learning; reward function; inverse reinforcement learning; maximum margin planning
- CLC: TP181
- DOI:
- Abstract: This paper surveys apprenticeship learning based on reward function learning, covering both the historical foundations of the field and a broad selection of current work. Two families of algorithms, apprenticeship learning methods based on inverse reinforcement learning (IRL) and on the maximum margin planning (MMP) framework, are discussed under linear and nonlinear reward-function assumptions, and a comparison is made under the linear assumption. The former admits an efficient approximate implementation but relies on the strong assumption that the demonstrations are optimal; the latter is easier to extend but can be computationally expensive. Finally, suggestions are given for further research on reward function learning in partially observable Markov decision process (POMDP) environments and in continuous/high-dimensional spaces, using either an approximate algorithm such as point-based value iteration (PBVI) or feature abstraction via dimension reduction methods such as principal component analysis (PCA). Adopting these may alleviate the curse of dimensionality.