[1]JIN Zhuo-jun, QIAN Hui, CHEN Shen-yi, et al. Survey of apprenticeship learning based on reward function learning[J]. CAAI Transactions on Intelligent Systems, 2009, 4(3): 208-212.

Survey of apprenticeship learning based on reward function learning

References:
[1]ATKESON C G, SCHAAL S. Robot learning from demonstration[C]//Proceedings of the Fourteenth International Conference on Machine Learning. Nashville, USA, 1997: 12-20.
[2]RATLIFF N D, BAGNELL J A, ZINKEVICH M A. Maximum margin planning[C]//Proceedings of the 23rd International Conference on Machine Learning. Pittsburgh, USA, 2006: 729-736.
[3]JIN Zhuojun, QIAN Hui, CHEN Shenyi, et al. Survey of apprenticeship learning based on reward function approximating[J]. Journal of Huazhong University of Science and Technology: Nature Science, 2008, 36(S1): 288-290, 294.
[4]NG A Y, RUSSELL S J. Algorithms for inverse reinforcement learning[C]//Proceedings of the Seventeenth International Conference on Machine Learning. San Francisco, USA, 2000: 663-670.
[5]ABBEEL P, NG A Y. Apprenticeship learning via inverse reinforcement learning[C]//Proceedings of the Twenty-first International Conference on Machine Learning. Banff, Canada, 2004: 1-8.
[6]KOLTER J Z, ABBEEL P, NG A Y. Hierarchical apprenticeship learning with application to quadruped locomotion[C]//Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2008.
[7]RATLIFF N, BAGNELL J A, ZINKEVICH M A. Subgradient methods for maximum margin structured learning[C]//Workshop on Learning in Structured Outputs Spaces at ICML. Pittsburgh, USA, 2006.
[8]SYED U, BOWLING M, SCHAPIRE R E. Apprenticeship learning using linear programming[C]//Proceedings of the 25th International Conference on Machine Learning (ICML 2008). Helsinki, Finland, 2008: 1032-1039.
[9]SYED U, SCHAPIRE R E. A game-theoretic approach to apprenticeship learning[C]//Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2008.
[10]GRIMES D B, RASHID D R, RAO R P N. Learning nonparametric models for probabilistic imitation[C]//Proceedings of Neural Information Processing Systems. Cambridge, USA: MIT Press, 2007: 521-528.
[11]ABBEEL P, COATES A, QUIGLEY M, et al. An application of reinforcement learning to aerobatic helicopter flight[C]//Proceedings of Neural Information Processing Systems. Cambridge, USA: MIT Press, 2007: 1-8.
[12]KOLTER J Z, RODGERS M P, NG A Y. A complete control architecture for quadruped locomotion over rough terrain[C]//IEEE International Conference on Robotics and Automation. Pasadena, USA, 2008: 811-818.
[13]REBULA J R, NEUHAUS P D, BONNLANDER B V, et al. A controller for the LittleDog quadruped walking on rough terrain[C]//2007 IEEE International Conference on Robotics and Automation. Roma, Italy, 2007: 1467-1473.
[14]KAELBLING L P, LITTMAN M L, MOORE A W. Reinforcement learning: a survey[J]. Journal of Artificial Intelligence Research, 1996, 4: 237-285.
[15]SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge, USA: MIT Press, 1998.
[16]COATES A, ABBEEL P, NG A Y. Reinforcement learning with multiple demonstrations[C]//The Twenty-first Annual Conference on Neural Information Processing Systems (NIPS 2007). Vancouver, Canada, 2007.
[17]TASKAR B, CHATALBASHEV V, KOLLER D, et al. Learning structured prediction models: a large margin approach[C]//Proceedings of the 22nd International Conference on Machine Learning. New York, USA: ACM, 2005: 896-903.
[18]TASKAR B, LACOSTE-JULIEN S, JORDAN M. Structured prediction via the extragradient method[C]//Proceedings of Neural Information Processing Systems. Vancouver, Canada, 2005: 1345-1352.
[19]SHOR N Z, KIWIEL K C, RUSZCZYNSKI A. Minimization methods for nondifferentiable functions[M]. New York, USA: Springer-Verlag, 1985.
[20]TSOCHANTARIDIS I, JOACHIMS T, HOFMANN T, et al. Large margin methods for structured and interdependent output variables[J]. The Journal of Machine Learning Research, 2005, 6: 1453-1484.
[21]CHECHIK G, HEITZ G, ELIDAN G, et al. Max-margin classification of incomplete data[C]//Advances in Neural Information Processing Systems: Proceedings of the 2006 Conference. Cambridge, USA: MIT Press, 2007: 233-240.
[22]NEU G, SZEPESVARI C. Apprenticeship learning using inverse reinforcement learning and gradient methods[C]//Proceedings of Uncertainty in Artificial Intelligence. Vancouver, Canada, 2007: 295-302.

Memo

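Received: 2008-10-08.
Funding: National Natural Science Foundation of China (90820306); Major Project of the Science and Technology Department of Zhejiang Province (006c13096).
Corresponding author: QIAN Hui. E-mail: qianhui@zju.edu.cn.
About the authors:
JIN Zhuo-jun, male, born in 1984, Ph.D. candidate. His main research interest is machine learning.
QIAN Hui, male, born in 1974, associate professor, member of the Intelligent Robot Technical Committee of the Artificial Intelligence Society. His main research interests are artificial intelligence and computer vision.
CHEN Shen-yi, male, born in 1980, Ph.D. candidate. His main research interest is machine learning.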

Last Update: 2009-08-31
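Copyright © Editorial Office of CAAI Transactions on Intelligent Systems
Address: Building 145-1, Nantong Street, Nangang District, Harbin, Heilongjiang Province (150001). Tel: 0451-82534001, 82518134. E-mail: tis@vip.sina.com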