YANG Rui, YAN Jiangpeng, LI Xiu. Survey of sparse reward algorithms in reinforcement learning — theory and experiment[J]. CAAI Transactions on Intelligent Systems, 2020, 15(5): 888-899. DOI: 10.11992/tis.202003031.
CAAI Transactions on Intelligent Systems (智能系统学报) [ISSN 1673-4785/CN 23-1538/TP]
Volume: 15
Issue: 2020, No. 5
Pages: 888-899
Section: Academic Papers (Intelligent Systems)
Publication date: 2020-09-05
Title: Survey of sparse reward algorithms in reinforcement learning — theory and experiment
Author(s): YANG Rui (杨瑞)1, YAN Jiangpeng (严江鹏)1, LI Xiu (李秀)1,2
1. Department of Automation, Tsinghua University, Beijing 100084, China;
2. Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
Keywords: reinforcement learning; deep reinforcement learning; machine learning; sparse reward; neural networks; artificial intelligence; deep learning
CLC number: TP181
DOI: 10.11992/tis.202003031
Document code: A
Abstract: In recent years, reinforcement learning has achieved great success in sequential decision-making applications such as games and robotic control. However, reward signals are very sparse in many real-world problems, which makes it difficult for agents to learn an optimal policy from interaction with the environment; this is known as the sparse reward problem. Research on sparse rewards can advance both the theory and the practical application of reinforcement learning. We survey the current research on the sparse reward problem and, taking external guidance information as the organizing thread, introduce six classes of algorithms: reward shaping, imitation learning, curriculum learning, hindsight experience replay, curiosity-driven methods, and hierarchical reinforcement learning. We implemented representative algorithms from these six classes in the sparse-reward environment Fetch Reach and compared and analyzed the results. Algorithms that use external guidance information performed better on average than those that do not, but the latter are less dependent on data; both families of methods merit further research. Finally, we summarize current sparse reward algorithms and discuss directions for future work.
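For intuition only, the sketch below is not taken from the paper: it illustrates the kind of binary, goal-conditioned reward typical of Fetch Reach-style tasks (0 when the gripper is within a small tolerance of the goal, -1 otherwise) and the "final-goal" relabeling idea behind hindsight experience replay. The function names and the 0.05 tolerance are hypothetical placeholders.

```python
import numpy as np

def sparse_reward(achieved_goal, desired_goal, tolerance=0.05):
    # Hypothetical binary reward: 0 only when the achieved goal is close
    # enough to the desired goal, -1 everywhere else (sparse signal).
    distance = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal))
    return 0.0 if distance < tolerance else -1.0

def her_relabel(transitions):
    # Hindsight experience replay, sketched for intuition: relabel a (possibly
    # failed) trajectory with a goal it actually reached, so the sparse reward
    # becomes informative for learning.
    # transitions: list of dicts with 'achieved_goal' and 'desired_goal' keys.
    final_goal = transitions[-1]["achieved_goal"]
    relabeled = []
    for t in transitions:
        t = dict(t, desired_goal=final_goal)
        t["reward"] = sparse_reward(t["achieved_goal"], t["desired_goal"])
        relabeled.append(t)
    return relabeled
```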
Memo
Received: 2020-03-19.
Foundation item: National Natural Science Foundation of China (41876098).
About the authors: YANG Rui, master's student; main research interests are machine learning and reinforcement learning. YAN Jiangpeng, Ph.D. candidate; main research interests are artificial intelligence and computer vision. LI Xiu, professor and doctoral supervisor; main research interests are intelligent systems, data mining, and pattern recognition. LI Xiu has led three completed National Natural Science Foundation of China projects, two Shenzhen basic research projects, and one Shenzhen technology development project; participated in four completed national 863 Program projects; and is currently leading one major 863 project and one National Natural Science Foundation of China project. LI Xiu holds seven authorized national invention patents and five national software copyrights, and has published more than 100 academic papers.
Corresponding author: LI Xiu. E-mail: li.xiu@sz.tsinghua.edu.cn
Last update: 2021-01-15