JIN Zhuo-jun, QIAN Hui, CHEN Shen-yi, et al. Survey of apprenticeship learning based on reward function learning[J]. CAAI Transactions on Intelligent Systems, 2009, 4(3): 208-212.
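- Journal: CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN 1673-4785 / CN 23-1538/TP]
- Volume: 4
- Issue: No. 3, 2009
- Pages: 208-212
- Section: Survey
- Publication date: 2009-06-25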
- Title: Survey of apprenticeship learning based on reward function learning
- Article number: 1673-4785(2009)03-0208-05
- Author(s): JIN Zhuo-jun, QIAN Hui, CHEN Shen-yi, ZHU Miao-liang
- Affiliation: Department of Computer Science, Zhejiang University, Hangzhou 310027, China
- Keywords: apprenticeship learning; reward function; inverse reinforcement learning; maximum margin planning
- Classification number: TP181
- Document code: A
- Abstract: This paper surveys apprenticeship learning based on reward function learning, reviewing both the historical development of the field and a broad selection of current work. Methods are discussed separately under linear and nonlinear reward functions, and under the linear assumption two families of apprenticeship learning methods are compared: those based on inverse reinforcement learning (IRL) and those based on maximum margin planning (MMP). The former admit efficient approximate algorithms but make a strong assumption that the demonstrations are optimal; the latter have a form that is easier to extend but can require a large amount of computation. Finally, open problems and directions for future research are outlined, such as applying apprenticeship learning in partially observable Markov decision process (POMDP) environments and in continuous or high-dimensional spaces, either with approximate algorithms such as point-based value iteration (PBVI) or by extracting features from the data with dimension-reduction methods such as principal component analysis (PCA). Either approach may reduce the heavy computation caused by high dimensionality (the curse of dimensionality).
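The abstract contrasts IRL-based apprenticeship learning, which has fast approximate algorithms, with MMP under a linear reward R(s) = w·φ(s). As a rough illustration of the IRL side only, the Python sketch below implements two ingredients of the projection algorithm of Abbeel and Ng (2004), one well-known IRL-based method: empirical discounted feature expectations μ = (1/m) Σ_i Σ_t γ^t φ(s_t^(i)), and the projection step that yields the next reward weights w = μ_E − μ̄ and the margin t = ||w||. It is not taken from the surveyed paper. The one-hot features, the three-state toy trajectories and every function name (`feature_expectations`, `projection_step`, `one_hot`, and so on) are hypothetical, and the reinforcement-learning step that would produce each new policy is omitted.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
State = int  # hypothetical: states are small integers in this toy example


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sub(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [x - y for x, y in zip(a, b)]


def axpy(alpha: float, x: Sequence[float], y: Sequence[float]) -> Vector:
    """Return y + alpha * x."""
    return [yi + alpha * xi for xi, yi in zip(x, y)]


def linear_reward(w: Sequence[float], phi: Callable[[State], Vector], s: State) -> float:
    """Linear reward assumption: R(s) = w . phi(s)."""
    return dot(w, phi(s))


def feature_expectations(trajs: Sequence[Sequence[State]],
                         phi: Callable[[State], Vector], gamma: float) -> Vector:
    """Empirical mu = (1/m) * sum_i sum_t gamma^t * phi(s_t^(i))."""
    k = len(phi(trajs[0][0]))
    mu = [0.0] * k
    for traj in trajs:
        for t, s in enumerate(traj):
            mu = axpy(gamma ** t, phi(s), mu)
    return [x / len(trajs) for x in mu]


def projection_step(mu_e: Vector, mu_bar_prev: Vector, mu_new: Vector):
    """Orthogonally project mu_e onto the line through mu_bar_prev and mu_new.

    Returns (mu_bar, w, t): the new projected point, the next reward
    weights w = mu_e - mu_bar, and the margin t = ||w||.
    """
    d = sub(mu_new, mu_bar_prev)
    denom = dot(d, d)
    if denom == 0.0:  # the new policy adds no new direction
        mu_bar = list(mu_bar_prev)
    else:
        alpha = dot(d, sub(mu_e, mu_bar_prev)) / denom
        mu_bar = axpy(alpha, d, mu_bar_prev)
    w = sub(mu_e, mu_bar)
    return mu_bar, w, math.sqrt(dot(w, w))


def one_hot(s: State, n: int = 3) -> Vector:
    """Hypothetical feature map phi: one indicator per state."""
    return [1.0 if i == s else 0.0 for i in range(n)]


if __name__ == "__main__":
    gamma = 0.5
    mu_e = feature_expectations([[0, 2, 2]], one_hot, gamma)  # expert demonstration
    mu_0 = feature_expectations([[0, 1, 1]], one_hot, gamma)  # initial policy
    # Hand-picked stand-in for the policy an RL solver would return for the current w.
    mu_1 = feature_expectations([[0, 0, 2]], one_hot, gamma)
    mu_bar, w, t = projection_step(mu_e, mu_0, mu_1)
    print("margin before:", math.sqrt(dot(sub(mu_e, mu_0), sub(mu_e, mu_0))), "after:", t)
    print("learned reward of state 2:", linear_reward(w, one_hot, 2))
```

In the full method each new w would be handed to an RL solver, the loop would stop once t drops below a tolerance ε, and a policy would then be chosen or mixed from those found. MMP instead obtains the weights from a structured max-margin problem, which this sketch does not attempt.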
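Memo
Received: 2008-10-08.
Foundation items: National Natural Science Foundation of China (90820306); Major Project of the Science and Technology Department of Zhejiang Province (006c13096).
Corresponding author: QIAN Hui. E-mail: qianhui@zju.edu.cn.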
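About the authors:
JIN Zhuo-jun, male, born in 1984, is a Ph.D. candidate. His main research interest is machine learning.
QIAN Hui, male, born in 1974, is an associate professor and a member of the Intelligent Robotics Technical Committee of the Chinese Association for Artificial Intelligence. His main research interests are artificial intelligence and computer vision.
CHEN Shen-yi, male, born in 1980, is a Ph.D. candidate. His main research interest is machine learning.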
Last update: 2009-08-31