YANG Rui, YAN Jiangpeng, LI Xiu. Survey of sparse reward algorithms in reinforcement learning — theory and experiment[J]. CAAI Transactions on Intelligent Systems, 2020, 15(5): 888-899. [doi:10.11992/tis.202003031]

Survey of sparse reward algorithms in reinforcement learning — theory and experiment

References:
[1] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge, USA: MIT Press, 1998.
[2] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. 2nd ed. Cambridge, USA: MIT Press, 2018.
[3] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[4] VINYALS O, BABUSCHKIN I, CZARNECKI W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575(7782): 350-354.
[5] BERNER C, BROCKMAN G, CHAN B, et al. Dota 2 with large scale deep reinforcement learning[EB/OL]. California, USA: arXiv, 2019. [2019-10-1] https://arxiv.org/pdf/1912.06680.pdf.
[6] SILVER D. Tutorial: deep reinforcement learning[C]//Proceedings of the 33rd International Conference on Machine Learning (ICML 2016). New York, USA, 2016.
[7] LI Yuxi. Deep reinforcement learning: An overview[EB/OL]. Alberta, Canada: arXiv, 2017. [2019-10-2] https://arxiv.org/pdf/1701.07274.pdf.
[8] LI Yuxi. Deep reinforcement learning[EB/OL]. Alberta, Canada: arXiv, 2018. [2019-10-5] https://arxiv.org/pdf/1810.06339.pdf.
[9] RIEDMILLER M, HAFNER R, LAMPE T, et al. Learning by playing - solving sparse reward tasks from scratch[EB/OL]. London, UK: arXiv, 2018. [2019-10-20] https://arxiv.org/pdf/1802.10567.pdf.
[10] HOSU I A, REBEDEA T. Playing atari games with deep reinforcement learning and human checkpoint replay[EB/OL]. Bucharest, Romania: arXiv, 2016. [2019-10-21] https://arxiv.org/pdf/1607.05077.pdf.
[11] ANDRYCHOWICZ M, WOLSKI F, RAY A, et al. Hindsight experience replay[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA, 2017: 5048-5058.
[12] YANG Weiyi, BAI Chenjia, CAI Chao, et al. Survey on sparse reward in deep reinforcement learning[J]. Computer science, 2020, 47(3): 182-191. (in Chinese)
[13] GULLAPALLI V, BARTO A G. Shaping as a method for accelerating reinforcement learning[C]//Proceedings of the 1992 IEEE International Symposium on Intelligent Control. Glasgow, UK, 1992: 554-559.
[14] HUSSEIN A, GABER M M, ELYAN E, et al. Imitation learning: A survey of learning methods[J]. ACM computing surveys, 2017, 50(2): 1-35.
[15] BENGIO Y, LOURADOUR J, COLLOBERT R, et al. Curriculum learning[C]//Proceedings of the 26th Annual International Conference on Machine Learning. Montreal, Quebec, Canada, 2009: 41-48.
[16] BURDA Y, EDWARDS H, PATHAK D, et al. Large-scale study of curiosity-driven learning[EB/OL]. California, USA: arXiv, 2018. [2019-10-30] https://arxiv.org/pdf/1808.04355.pdf.
[17] ZHOU Wenji, YU Yang. Summarize of hierarchical reinforcement learning[J]. CAAI transactions on intelligent systems, 2017, 12(5): 590-594. (in Chinese)
[18] PLAPPERT M, ANDRYCHOWICZ M, RAY A, et al. Multi-goal reinforcement learning: Challenging robotics environments and request for research[EB/OL]. California, USA: arXiv, 2018. [2019-11-1] https://arxiv.org/pdf/1802.09464.pdf.
[19] WAN Lipeng, LAN Xuguang, ZHANG Hanbo, et al. A review of deep reinforcement learning theory and application[J]. Pattern recognition and artificial intelligence, 2019, 32(1): 67-81. (in Chinese)
[20] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing atari with deep reinforcement learning[EB/OL]. London, UK: arXiv, 2013. [2019-11-1] https://arxiv.org/pdf/1312.5602.pdf.
[21] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[22] WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine learning, 1992, 8(3/4): 229-256.
[23] KONDA V R, TSITSIKLIS J N. Actor-critic algorithms[C]//Advances in Neural Information Processing Systems. Colorado, USA, 2000: 1008-1014.
[24] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on International Conference on Machine Learning. New York, USA, 2016: 1928-1937.
[25] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[EB/OL]. California, USA: arXiv, 2017. [2019-11-3] https://arxiv.org/pdf/1707.06347.pdf.
[26] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. London, UK: arXiv, 2015. [2019-12-25] https://arxiv.org/pdf/1509.02971.pdf.
[27] NG A Y, HARADA D, RUSSELL S. Policy invariance under reward transformations: Theory and application to reward shaping[C]//Proceedings of the Sixteenth International Conference on Machine Learning. Bled, Slovenia, 1999, 99: 278-287.
[28] RANDLØV J, ALSTRØM P. Learning to drive a bicycle using reinforcement learning and shaping[C]//Proceedings of the Fifteenth International Conference on Machine Learning. Madison, USA, 1998, 98: 463-471.
[29] JAGODNIK K M, THOMAS P S, VAN DEN BOGERT A J, et al. Training an actor-critic reinforcement learning controller for arm movement using human-generated rewards[J]. IEEE transactions on neural systems and rehabilitation engineering, 2017, 25(10): 1892-1905.
[30] FERREIRA E, LEFÈVRE F. Expert-based reward shaping and exploration scheme for boosting policy learning of dialogue management[C]//2013 IEEE Workshop on Automatic Speech Recognition and Understanding. Olomouc, Czech Republic, 2013: 108-113.
[31] NG A Y, RUSSELL S J. Algorithms for inverse reinforcement learning[C]//Proceedings of the Seventeenth International Conference on Machine Learning. Stanford, USA, 2000, 1: 663-670.
[32] MARTHI B. Automatic shaping and decomposition of reward functions[C]//Proceedings of the 24th International Conference on Machine Learning. Corvallis, USA, 2007: 601-608.
[33] ROSS S, BAGNELL D. Efficient reductions for imitation learning[C]//Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. Sardinia, Italy, 2010: 661-668.
[34] NAIR A, MCGREW B, ANDRYCHOWICZ M, et al. Overcoming exploration in reinforcement learning with demonstrations[C]//2018 IEEE International Conference on Robotics and Automation (ICRA). Brisbane, Australia, 2018: 6292-6299.
[35] HO J, ERMON S. Generative adversarial imitation learning[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain, 2016: 4565-4573.
[36] LIU Yuxuan, GUPTA A, ABBEEL P, et al. Imitation from observation: Learning to imitate behaviors from raw video via context translation[C]//2018 IEEE International Conference on Robotics and Automation (ICRA). Brisbane, Australia, 2018: 1118-1125.
[37] TORABI F, WARNELL G, STONE P. Behavioral cloning from observation[EB/OL]. Texas, USA: arXiv, 2018. [2019-11-1] https://arxiv.org/pdf/1805.01954.pdf.
[38] ELMAN J L. Learning and development in neural networks: The importance of starting small[J]. Cognition, 1993, 48(1): 71-99.
[39] GRAVES A, BELLEMARE M G, MENICK J, et al. Automated curriculum learning for neural networks[C]//Proceedings of the 34th International Conference on Machine Learning-Volume 70. Sydney, Australia, 2017: 1311-1320.
[40] OpenAI, AKKAYA I, ANDRYCHOWICZ M, et al. Solving Rubik's cube with a robot hand[EB/OL]. California, USA: arXiv, 2019. [2019-11-2] https://arxiv.org/pdf/1910.07113.pdf.
[41] LANKA S, WU Tianfu. ARCHER: aggressive rewards to counter bias in hindsight experience replay[EB/OL]. North Carolina, USA: arXiv, 2018. [2019-12-3] https://arxiv.org/pdf/1809.02070.pdf.
[42] MANELA B, BIESS A. Bias-reduced hindsight experience replay with virtual goal prioritization[EB/OL]. Beer Sheva, Israel: arXiv, 2019. [2019-12-3] https://arxiv.org/pdf/1905.05498.pdf.
[43] RAUBER P, UMMADISINGU A, MUTZ F, et al. Hindsight policy gradients[EB/OL]. London, UK: arXiv, 2017. [2019-11-2] https://arxiv.org/pdf/1711.06006.pdf.
[44] SCHMIDHUBER J. A possibility for implementing curiosity and boredom in model-building neural controllers[C]//Proceedings of the International Conference on Simulation of Adaptive Behavior: From Animals to Animats. Cambridge, USA, 1991: 222-227.
[45] PATHAK D, AGRAWAL P, EFROS A A, et al. Curiosity-driven exploration by self-supervised prediction[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, USA, 2017: 16-17.
[46] BELLEMARE M, SRINIVASAN S, OSTROVSKI G, et al. Unifying count-based exploration and intrinsic motivation[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain, 2016: 1471-1479.
[47] STREHL A L, LITTMAN M L. An analysis of model-based interval estimation for Markov decision processes[J]. Journal of computer and system sciences, 2008, 74(8): 1309-1331.
[48] TANG Haoran, HOUTHOOFT R, FOOTE D, et al. #Exploration: A study of count-based exploration for deep reinforcement learning[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA, 2017: 2753-2762.
[49] BURDA Y, EDWARDS H, STORKEY A, et al. Exploration by random network distillation[EB/OL]. California, USA: arXiv, 2018. [2019-5-20] https://arxiv.org/pdf/1810.12894.pdf.
[50] STADIE B C, LEVINE S, ABBEEL P. Incentivizing exploration in reinforcement learning with deep predictive models[EB/OL]. California, USA: arXiv, 2015. [2019-5-2] https://arxiv.org/pdf/1507.00814.pdf.
[51] KINGMA D P, WELLING M. Auto-encoding variational Bayes[EB/OL]. Amsterdam, Netherlands: arXiv, 2013. [2019-2-2] https://arxiv.org/pdf/1312.6114.pdf.
[52] RAFATI J, NOELLE D C. Learning representations in model-free hierarchical reinforcement learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Honolulu, USA, 2019, 33: 10009-10010.
[53] SUTTON R S, PRECUP D, SINGH S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning[J]. Artificial intelligence, 1999, 112(1-2): 181-211.
[54] KULKARNI T D, NARASIMHAN K R, SAEEDI A, et al. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain, 2016: 3675-3683.
[55] BACON P L, HARB J, PRECUP D. The option-critic architecture[C]//Thirty-First AAAI Conference on Artificial Intelligence. San Francisco, USA, 2017.
[56] FRANS K, HO J, CHEN X, et al. Meta learning shared hierarchies[EB/OL]. California, USA: arXiv, 2017. [2019-11-15] https://arxiv.org/pdf/1710.09767.pdf.
[57] VEZHNEVETS A S, OSINDERO S, SCHAUL T, et al. Feudal networks for hierarchical reinforcement learning[C]//Proceedings of the 34th International Conference on Machine Learning-Volume 70. Sydney, Australia, 2017: 3540-3549.
[58] NACHUM O, GU Shixiang, LEE H, et al. Data-efficient hierarchical reinforcement learning[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada, 2018: 3303-3313.
[59] LEVY A, KONIDARIS G, PLATT R, et al. Learning multi-level hierarchies with hindsight[EB/OL]. Rhode Island, USA: arXiv, 2017. [2019-12-16] https://arxiv.org/pdf/1712.00948.pdf.
[60] SCHAUL T, HORGAN D, GREGOR K, et al. Universal value function approximators[C]//Proceedings of the 32nd International Conference on International Conference on Machine Learning. Lille, France, 2015: 1312-1320.
[61] SUKHBAATAR S, LIN Zeming, KOSTRIKOV I, et al. Intrinsic motivation and automatic curricula via asymmetric self-play[EB/OL]. New York, USA: arXiv, 2017. [2019-12-11] https://arxiv.org/pdf/1703.05407.pdf.
[62] JABRI A, HSU K, EYSENBACH B, et al. Unsupervised curricula for visual meta-reinforcement learning[EB/OL]. California, USA: arXiv, 2019. [2019-12-21] https://arxiv.org/pdf/1912.04226.pdf.
[63] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA, 2017: 5998-6008.
[64] SAHNI H, BUCKLEY T, ABBEEL P, et al. Visual hindsight experience replay[EB/OL]. Georgia, USA: arXiv, 2019. [2019-10-12] https://arxiv.org/pdf/1901.11529.pdf.
[65] SUKHBAATAR S, DENTON E, SZLAM A, et al. Learning goal embeddings via self-play for hierarchical reinforcement learning[EB/OL]. New York, USA: arXiv, 2018. [2019-11-21] https://arxiv.org/pdf/1811.09083.pdf.
[66] LANIER J B, MCALEER S, BALDI P. Curiosity-driven multi-criteria hindsight experience replay[EB/OL]. California, USA: arXiv, 2019. [2019-12-13] https://arxiv.org/pdf/1906.03710.pdf.