[1]JIN Zhuo-jun,QIAN Hui,CHEN Shen-yi,et al.Survey of apprenticeship learning based on reward function learning[J].CAAI Transactions on Intelligent Systems,2009,4(3):208-212.
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 4
Issue: 3 (2009)
Pages: 208-212
Column: Review
Publication date: 2009-06-25
- Title: Survey of apprenticeship learning based on reward function learning
- Author(s): JIN Zhuo-jun; QIAN Hui; CHEN Shen-yi; ZHU Miao-liang
- Affiliation: Department of Computer Science, Zhejiang University, Hangzhou 310027, China
- Keywords: apprenticeship learning; reward function; inverse reinforcement learning; maximum margin planning
- CLC: TP181
- DOI:
- Abstract: This paper surveys apprenticeship learning based on reward function learning, covering both the historical foundations of the field and a broad selection of current work. Two families of algorithms, apprenticeship learning methods based on inverse reinforcement learning (IRL) and on the maximum margin planning (MMP) framework, are discussed under linear and nonlinear reward-function assumptions, and a comparison is made under the linear assumption. The former admits an efficient approximate implementation but relies on the strong assumption that the demonstrations are optimal; the latter is easier to extend but can be computationally expensive. Finally, suggestions are given for further research on reward function learning in partially observable Markov decision process (POMDP) environments and in continuous/high-dimensional spaces, using either an approximate algorithm such as point-based value iteration (PBVI) or feature abstraction via dimension reduction methods such as principal component analysis (PCA). Adopting these may alleviate the curse of dimensionality.