[1] ZHOU Wenji, YU Yang. A survey of hierarchical reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2017, (05): 590-594. [doi:10.11992/tis.201706031]

A survey of hierarchical reinforcement learning

CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN: 1673-4785 / CN: 23-1538/TP]

Issue: 2017(05)
Pages: 590-594
Publication date: 2017-10-25

Article Info

Title:
A survey of hierarchical reinforcement learning
Authors:
ZHOU Wenji (周文吉), YU Yang (俞扬)
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
Keywords:
artificial intelligence; machine learning; reinforcement learning; hierarchical reinforcement learning; deep reinforcement learning; Markov decision process; semi-Markov decision process; curse of dimensionality
CLC number:
TP391
DOI:
10.11992/tis.201706031
Abstract (translated from the Chinese):
Reinforcement learning is an important branch of machine learning and artificial intelligence, and in recent years it has drawn wide attention from both academia and industry. The central problem in reinforcement learning is how an agent learns a policy by interacting directly with its environment. As the dimensionality of the state space grows, however, traditional reinforcement learning methods suffer from the curse of dimensionality and struggle to learn effectively. Hierarchical reinforcement learning decomposes a complex reinforcement learning problem into several sub-problems and solves them separately, which can yield better results than attacking the whole problem directly. It is a promising route to large-scale reinforcement learning, yet it has received comparatively little attention. This paper introduces and reviews the major families of hierarchical reinforcement learning methods.
Abstract:
Reinforcement learning (RL) is an important research area in the fields of machine learning and artificial intelligence and has received increasing attention in recent years. The goal in RL is to maximize the long-term total reward by interacting with the environment. Traditional RL algorithms are limited by the so-called curse of dimensionality: their learning ability degrades drastically as the dimensionality of the state space increases. Hierarchical reinforcement learning (HRL) decomposes an RL problem into sub-problems and solves each of them separately to improve learning ability. HRL offers a potential way to solve large-scale RL problems, but it has received insufficient attention to date. In this paper, we introduce and review the main families of HRL methods.
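
To make the temporal-abstraction idea behind HRL concrete (the options/SMDP framework of Sutton, Precup, and Singh, reference [10] below), here is a minimal illustrative sketch of SMDP Q-learning over two hand-coded options in a toy corridor environment. The environment, the option definitions, and all names are assumptions for illustration only, not material from the surveyed paper: a k-step option is treated as a single temporally extended action, and its value is updated with a gamma^k discount.

```python
import random

N = 10                       # corridor states 0..N-1; reward only at state N-1
GAMMA, ALPHA = 0.95, 0.1

def step(s, a):              # primitive actions: 0 = left, 1 = right
    s2 = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
    r = 1.0 if s2 == N - 1 else 0.0
    return s2, r, s2 == N - 1

# An "option" runs a fixed internal policy (here: keep moving right)
# until its termination condition fires (reaching its target state).
def run_option(s, target):
    total_r, k, done = 0.0, 0, False
    while s != target and not done:
        s, r, done = step(s, 1)
        total_r += (GAMMA ** k) * r   # reward discounted within the option
        k += 1
    return s, total_r, k, done

options = [N // 2, N - 1]    # two options, identified by their target state
Q = [[0.0] * len(options) for _ in range(N)]

for _ in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy choice among options, not primitive actions
        if random.random() < 0.2:
            o = random.randrange(len(options))
        else:
            o = max(range(len(options)), key=lambda i: Q[s][i])
        s2, r, k, done = run_option(s, options[o])
        # SMDP Q-learning update: bootstrap with gamma^k for a k-step option
        Q[s][o] += ALPHA * (r + (GAMMA ** k) * max(Q[s2]) - Q[s][o])
        s = s2

print([round(v, 3) for v in Q[0]])   # learned option values at the start state
```

Because each option compresses many primitive steps into one decision, the agent learns over a far smaller effective decision space; this is the mechanism by which HRL mitigates the curse of dimensionality described in the abstract.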

References:

[1] BARTO A G, MAHADEVAN S. Recent advances in hierarchical reinforcement learning[J]. Discrete event dynamic systems, 2003, 13(4): 341-379.
[2] YAN Q, LIU Q, HU D. A hierarchical reinforcement learning algorithm based on heuristic reward function[C]//Proceedings of the 2nd International Conference on Advanced Computer Control. Shenyang, China, 2010, 3: 371-376.
[3] DETHLEFS N, CUAYÁHUITL H. Combining hierarchical reinforcement learning and Bayesian networks for natural language generation in situated dialogue[C]//Proceedings of the European Workshop on Natural Language Generation. Nancy, France, 2011: 110-120.
[4] AL-EMRAN M. Hierarchical reinforcement learning: a survey[J]. International journal of computing and digital systems, 2015, 4(2): 137-143.
[5] MAHADEVAN S, MARCHALLECK N. Self-improving factory simulation using continuous-time average-reward reinforcement learning[C]//Proceedings of the 14th International Conference on Machine Learning. Nashville, USA, 1997: 202-210.
[6] HOWARD R A. Dynamic probabilistic systems, volume II: semi-Markov and decision processes[M]. New York: Dover Publications, 2007.
[7] GIL P, NUNES L. Hierarchical reinforcement learning using path clustering[C]//Proceedings of the 8th Iberian Conference on Information Systems and Technologies. Lisbon, Portugal, 2013: 1-6.
[8] STULP F, SCHAAL S. Hierarchical reinforcement learning with movement primitives[C]//Proceedings of the 11th IEEE-RAS International Conference on Humanoid Robots. Bled, Slovenia, 2011: 231-238.
[9] DU X, LI Q, HAN J. Applying hierarchical reinforcement learning to computer games[C]//Proceedings of the IEEE International Conference on Automation and Logistics. Xi'an, China, 2009: 929-932.
[10] SUTTON R S, PRECUP D, SINGH S. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning[J]. Artificial intelligence, 1999, 112(1/2): 181-211.
[11] PARR R, RUSSELL S. Reinforcement learning with hierarchies of machines[C]//Advances in Neural Information Processing Systems. Colorado, USA, 1998: 1043-1049.
[12] DIETTERICH T G. Hierarchical reinforcement learning with the MAXQ value function decomposition[J]. Journal of artificial intelligence research, 2000, 13: 227-303.
[13] GHAFOORIAN M, TAGHIZADEH N, et al. Automatic abstraction in reinforcement learning using ant system algorithm[C]//Proceedings of the AAAI Spring Symposium: Lifelong Machine Learning. Stanford, USA, 2013: 114-122.
[14] BACON P L, HARB J, PRECUP D. The option-critic architecture[C]//Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, USA, 2017: 1726-1734.
[15] VEZHNEVETS A S, OSINDERO S, SCHAUL T, et al. FeUdal networks for hierarchical reinforcement learning[C]//Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia, 2017: 3540-3549.
[16] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[17] KULKARNI T D, NARASIMHAN K, SAEEDI A, et al. Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation[C]//Advances in Neural Information Processing Systems. Barcelona, Spain, 2016: 3675-3683.
[18] FLORENSA C, DUAN Y, ABBEEL P. Stochastic neural networks for hierarchical reinforcement learning[EB/OL]. arXiv: 1704.03012, 2017. https://arxiv.org/pdf/1704.03012.pdf.
[19] LAKSHMINARAYANAN A S, KRISHNAMURTHY R, KUMAR P, et al. Option discovery in hierarchical reinforcement learning using spatio-temporal clustering[EB/OL]. arXiv: 1605.05359, 2016. https://arxiv.org/pdf/1605.05359.pdf.
[20] PENG X B, BERSETH G, YIN K K, et al. DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning[J]. ACM transactions on graphics, 2017, 36(4): 1-13.


Memo:
Received: 2017-06-09.
Foundation items: National Natural Science Foundation of China (61375061); Natural Science Foundation of Jiangsu Province (BK20160066).
Biographies: ZHOU Wenji, male, born in 1995, master's student; his main research interests are reinforcement learning and data mining. YU Yang, male, born in 1982, associate professor and doctoral supervisor; his main research interests are artificial intelligence, machine learning, evolutionary computation, and data mining. He received the National Excellent Doctoral Dissertation Award in 2013 and the CCF Outstanding Doctoral Dissertation Award in 2011, and has published more than 40 papers.
Corresponding author: YU Yang. E-mail: yuy@nju.edu.cn
Last update: 2017-10-25