[1]YANG Rui,YAN Jiangpeng,LI Xiu.Survey of sparse reward algorithms in reinforcement learning — theory and experiment[J].CAAI Transactions on Intelligent Systems,2020,15(5):888-899.[doi:10.11992/tis.202003031]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 15
Issue: 2020, No. 5
Pages: 888-899
Column: Academic Papers: Intelligent Systems
Publication date: 2020-09-05
- Title:
Survey of sparse reward algorithms in reinforcement learning — theory and experiment
- Author(s):
YANG Rui1; YAN Jiangpeng1; LI Xiu1,2
1. Department of Automation, Tsinghua University, Beijing 100084, China;
2. Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
- Keywords:
reinforcement learning; deep reinforcement learning; machine learning; sparse reward; neural networks; artificial intelligence; deep learning
- CLC:
TP181
- DOI:
10.11992/tis.202003031
- Abstract:
In recent years, reinforcement learning has achieved great success in a range of sequential decision-making applications such as games and robotic control. However, reward signals are very sparse in many real-world situations, which makes it difficult for agents to learn an optimal policy from interaction with the environment. This is called the sparse reward problem. Research on sparse rewards can advance both the theory and the practical applications of reinforcement learning. We survey the current research status of the sparse reward problem and, using external information as the organizing thread, introduce the following six classes of algorithms: reward shaping, imitation learning, curriculum learning, hindsight experience replay, curiosity-driven algorithms, and hierarchical reinforcement learning. We implemented typical algorithms from these six classes and conducted experiments in the sparse reward environment Fetch Reach, followed by a thorough comparison and analysis of the results. Algorithms that utilize external information were found to outperform those without it, but the latter are less dependent on data; both lines of work are of great research significance. Finally, we summarize current sparse reward algorithms and discuss directions for future work.
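To make the problem setting concrete, the sketch below (an illustration, not code from the paper) contrasts the sparse, goal-conditioned reward typical of tasks such as Fetch Reach with a dense, shaped alternative. The 0.05 m success threshold and the function names are assumptions chosen for this example.

```python
import numpy as np

# Illustrative sketch of a sparse, goal-based reward as used in
# manipulation tasks like Fetch Reach: the agent gets 0 only when the
# gripper is within a small threshold of the goal, and -1 otherwise,
# so most transitions carry no learning signal.
# The 0.05 m threshold is an assumed value for illustration.

def sparse_reward(achieved_goal: np.ndarray,
                  desired_goal: np.ndarray,
                  threshold: float = 0.05) -> float:
    """Return 0.0 on success, -1.0 otherwise (sparse signal)."""
    distance = np.linalg.norm(achieved_goal - desired_goal)
    return 0.0 if distance < threshold else -1.0

def dense_reward(achieved_goal: np.ndarray,
                 desired_goal: np.ndarray) -> float:
    """A shaped alternative: negative distance gives a gradient everywhere."""
    return -float(np.linalg.norm(achieved_goal - desired_goal))

if __name__ == "__main__":
    gripper = np.array([1.30, 0.70, 0.45])   # current gripper position (m)
    goal = np.array([1.40, 0.75, 0.50])      # desired goal position (m)
    print(sparse_reward(gripper, goal))      # -1.0: far from goal, no signal
    print(dense_reward(gripper, goal))       # ~ -0.122: informative gradient
```

Under the sparse variant, nearly every transition far from the goal returns the same -1, which is why the six classes of techniques surveyed here (reward shaping, imitation learning, curriculum learning, hindsight experience replay, curiosity-driven exploration, and hierarchical reinforcement learning) are needed to create or substitute a learning signal.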