<-上一篇/Previous Article 下一篇/Next Article->

[1]申翔翔,侯新文,尹传环.深度强化学习中状态注意力机制的研究[J].智能系统学报,2020,15(2):317-322.[doi:10.11992/tis.201809033]
　SHEN Xiangxiang,HOU Xinwen,YIN Chuanhuan.State attention in deep reinforcement learning[J].CAAI Transactions on Intelligent Systems,2020,15(2):317-322.[doi:10.11992/tis.201809033]

点击复制

深度强化学习中状态注意力机制的研究

PDF下载 HTML

《智能系统学报》[ISSN 1673-4785/CN 23-1538/TP] 卷: 15 期数: 2020年第2期页码: 317-322 栏目: 学术论文—机器学习出版日期: 2020-03-05

Title:: State attention in deep reinforcement learning

作者:: 申翔翔¹, 侯新文², 尹传环¹; 1. 北京交通大学交通数据分析与挖掘北京市重点实验室, 北京 100044;
2. 中国科学院自动化研究所智能系统与工程研究中心, 北京 110016

Author(s):: SHEN Xiangxiang¹, HOU Xinwen², YIN Chuanhuan¹; 1. Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China;
2. Center for Research on Intelligent System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 110016, China

关键词:: 深度学习; 强化学习; 注意力机制; A3C算法; 星际争霸II迷你游戏; 智能体; 微型操作

Keywords:: deep learning; reinforcement learning; attention mechanism; A3C; StarCraft II mini-games; agent; micromanagement

分类号:: TP183

DOI:: 10.11992/tis.201809033

摘要:: 虽然在深度学习与强化学习结合后，人工智能在棋类游戏和视频游戏等领域取得了超越人类水平的重大成就，但是实时策略性游戏星际争霸由于其巨大的状态空间和动作空间，对于人工智能研究者来说是一个巨大的挑战平台，针对Deepmind在星际争霸II迷你游戏中利用经典的深度强化学习算法A3C训练出来的基线智能体的水平和普通业余玩家的水平相比还存在较大的差距的问题。通过采用更简化的网络结构以及把注意力机制与强化学习中的奖励结合起来的方法，提出基于状态注意力的A3C算法，所训练出来的智能体在个别星际迷你游戏中利用更少的特征图层取得的成绩最高，高于Deepmind的基线智能体71分。

Abstract:: Through artificial intelligence, significant achievements beyond the human level have been made in the field of board games and video games since the emergence of deep reinforcement learning. However, the real-time strategic game StarCraft is a huge challenging platform for artificial intelligence researchers due to its huge state space and action space. Considering that the level of baseline agents trained by DeepMind using classical deep reinforcement learning algorithm A3C in StarCraft II mini-game is still far from that of ordinary amateur players, by adopting a more simplified network structure and combining the attention mechanism with rewards in reinforcement learning, an A3C algorithm based on state attention is proposed to solve this problem. The trained agent achieves the highest score, which is 71 points higher than Deepmind’s baseline agent in individual interplanetary mini games with fewer feature layers.

参考文献/References:: [1] LI Yuxi. Deep reinforcement learning: an overview[EB/OL]. [2018-01-17]https://arxiv.org/abs/1701.07274.
[2] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[3] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[4] VINYALS O, EWALDS T, BARTUNOV S, et al. StarCraft II: a new challenge for reinforcement learning[EB/OL]. [2018-01-17]https://arXiv: 1708.04782, 2017.
[5] ONTANON S, SYNNAEVE G, URIARTE A, et al. A survey of real-time strategy game AI research and competition in StarCraft[J]. IEEE transactions on computational intelligence and AI in games, 2013, 5(4): 293-311.
[6] SYNNAEVE G, BESSIERE P. A dataset for StarCraft AI & an example of armies clustering[C]//Artificial Intelligence in Adversarial Real-Time Games. Palo Alto, USA, 2012: 25-30.
[7] SYNNAEVE G, BESSIèRE P. A Bayesian model for opening prediction in RTS games with application to StarCraft[C]//Proceedings of 2011 IEEE Conference on Computational Intelligence and Games. Seoul, South Korea, 2011: 281-288.
[8] JUSTESEN N, RISI S. Learning macromanagement in starcraft from replays using deep learning[C]//Proceedings of 2017 IEEE Conference on Computational Intelligence and Games. New York, USA, 2017: 162-169.
[9] DODGE J, PENNEY S, HILDERBRAND C, et al. How the experts do it: assessing and explaining agent behaviors in real-time strategy games[C]//Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. Montreal QC, Canada, 2018.
[10] PENNEY S, DODGE J, HILDERBRAND C, et al. Toward foraging for understanding of starcraft agents: an empirical study[C]//Proceedings of the 23rd International Conference on Intelligent User Interfaces. Tokyo, Japan, 2018: 225-237.
[11] PENG Peng, WEN Ying, YANG Yaodong, et al. Multiagent bidirectionally-coordinated nets for learning to play starcraft combat games[EB/OL]. [2018-01-17]https://arXiv: 1703.10069, 2017.
[12] SHAO Kun, ZHU Yuanheng, ZHAO Dongbin, et al. StarCraft micromanagement with reinforcement learning and curriculum transfer learning[J]. IEEE transactions on emerging topics in computational intelligence, 2019, 3(1): 73-84.
[13] WENDER S, WATSON I. Applying reinforcement learning to small scale combat in the real-time strategy game StarCraft: Broodwar[C]//Proceedings of 2012 IEEE Conference on Computational Intelligence and Games. Granada, Spain, 2012: 402-408.
[14] DENIL M, BAZZANI L, LAROCHELLE H, et al. Learning where to attend with deep architectures for image tracking[J]. Neural computation, 2012, 24(8): 2151-2184.
[15] BAHDANAU D, CHO K, BENGIO Y, et al. Neural machine translation by jointly learning to align and translate[C]//Proceedings of International Conference on Learning Representations. 2015.
[16] MNIH V, HEESS N, GRAVES A, et al. Recurrent models of visual attention[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada, 2014: 2204-2212.
[17] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on Machine Learning. New York USA, 2016: 1928-1937.
[18] WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine learning, 1992, 8(3/4): 229-256.
[19] ILYAS A, ENGSTROM L, SANTURKAR S, et al. Are deep policy gradient algorithms truly policy gradient algorithms? [EB/OL]. [2018-01-17]https://arXiv: 1811.02553, 2018.
[20] DeepMind. DeepMind mini games[EB/OL]. (2017-08-10)[2018-09-10]. https://github.com/deepmind/pysc2/blob/master/docs/mini_games.md.

相似文献/References:: [1]连传强,徐昕,吴军,等.面向资源分配问题的Q-CF多智能体强化学习[J].智能系统学报,2011,6(2):95.
　LIAN Chuanqiang,XU Xin,WU Jun,et al.Q-CF multiAgent reinforcement learningfor resource allocation problems[J].CAAI Transactions on Intelligent Systems,2011,6():95.
[2]张媛媛,霍静,杨婉琪,等.深度信念网络的二代身份证异构人脸核实算法[J].智能系统学报,2015,10(2):193.[doi:10.3969/j.issn.1673-4785.201405060]
　ZHANG Yuanyuan,HUO Jing,YANG Wanqi,et al.A deep belief network-based heterogeneous face verification method for the second-generation identity card[J].CAAI Transactions on Intelligent Systems,2015,10():193.[doi:10.3969/j.issn.1673-4785.201405060]
[3]丁科,谭营.GPU通用计算及其在计算智能领域的应用[J].智能系统学报,2015,10(1):1.[doi:10.3969/j.issn.1673-4785.201403072]
　DING Ke,TAN Ying.A review on general purpose computing on GPUs and its applications in computational intelligence[J].CAAI Transactions on Intelligent Systems,2015,10():1.[doi:10.3969/j.issn.1673-4785.201403072]
[4]梁爽,曹其新,王雯珊,等.基于强化学习的多定位组件自动选择方法[J].智能系统学报,2016,11(2):149.[doi:10.11992/tis.201510031]
　LIANG Shuang,CAO Qixin,WANG Wenshan,et al.An automatic switching method for multiple location components based on reinforcement learning[J].CAAI Transactions on Intelligent Systems,2016,11():149.[doi:10.11992/tis.201510031]
[5]马晓,张番栋,封举富.基于深度学习特征的稀疏表示的人脸识别方法[J].智能系统学报,2016,11(3):279.[doi:10.11992/tis.201603026]
　MA Xiao,ZHANG Fandong,FENG Jufu.Sparse representation via deep learning features based face recognition method[J].CAAI Transactions on Intelligent Systems,2016,11():279.[doi:10.11992/tis.201603026]
[6]刘帅师,程曦,郭文燕,等.深度学习方法研究新进展[J].智能系统学报,2016,11(5):567.[doi:10.11992/tis.201511028]
　LIU Shuaishi,CHENG Xi,GUO Wenyan,et al.Progress report on new research in deep learning[J].CAAI Transactions on Intelligent Systems,2016,11():567.[doi:10.11992/tis.201511028]
[7]马世龙,乌尼日其其格,李小平.大数据与深度学习综述[J].智能系统学报,2016,11(6):728.[doi:10.11992/tis.201611021]
　MA Shilong,WUNIRI Qiqige,LI Xiaoping.Deep learning with big data: state of the art and development[J].CAAI Transactions on Intelligent Systems,2016,11():728.[doi:10.11992/tis.201611021]
[8]王亚杰,邱虹坤,吴燕燕,等.计算机博弈的研究与发展[J].智能系统学报,2016,11(6):788.[doi:10.11992/tis.201609006]
　WANG Yajie,QIU Hongkun,WU Yanyan,et al.Research and development of computer games[J].CAAI Transactions on Intelligent Systems,2016,11():788.[doi:10.11992/tis.201609006]
[9]黄心汉.A3I:21世纪科技之光[J].智能系统学报,2016,11(6):835.[doi:10.11992/tis.201605022]
　HUANG Xinhan.A3I: the star of science and technology for the 21st century[J].CAAI Transactions on Intelligent Systems,2016,11():835.[doi:10.11992/tis.201605022]
[10]张文旭,马磊,王晓东.基于事件驱动的多智能体强化学习研究[J].智能系统学报,2017,12(1):82.[doi:10.11992/tis.201604008]
　ZHANG Wenxu,MA Lei,WANG Xiaodong.Reinforcement learning for event-triggered multi-agent systems[J].CAAI Transactions on Intelligent Systems,2017,12():82.[doi:10.11992/tis.201604008]
[11]殷昌盛,杨若鹏,朱巍,等.多智能体分层强化学习综述[J].智能系统学报,2020,15(4):646.[doi:10.11992/tis.201909027]
　YIN Changsheng,YANG Ruopeng,ZHU Wei,et al.A survey on multi-agent hierarchical reinforcement learning[J].CAAI Transactions on Intelligent Systems,2020,15():646.[doi:10.11992/tis.201909027]
[12]王作为,徐征,张汝波,等.记忆神经网络在机器人导航领域的应用与研究进展[J].智能系统学报,2020,15(5):835.[doi:10.11992/tis.202002020]
　WANG Zuowei,XU Zheng,ZHANG Rubo,et al.Research progress and application of memory neural network in robot navigation[J].CAAI Transactions on Intelligent Systems,2020,15():835.[doi:10.11992/tis.202002020]
[13]杨瑞,严江鹏,李秀.强化学习稀疏奖励算法研究——理论与实验[J].智能系统学报,2020,15(5):888.[doi:10.11992/tis.202003031]
　YANG Rui,YAN Jiangpeng,LI Xiu.Survey of sparse reward algorithms in reinforcement learning — theory and experiment[J].CAAI Transactions on Intelligent Systems,2020,15():888.[doi:10.11992/tis.202003031]
[14]朱少凯,孟庆浩,金晟,等.基于深度强化学习的室内视觉局部路径规划[J].智能系统学报,2022,17(5):908.[doi:10.11992/tis.202107059]
　ZHU Shaokai,MENG Qinghao,JIN Sheng,et al.Indoor visual local path planning based on deep reinforcement learning[J].CAAI Transactions on Intelligent Systems,2022,17():908.[doi:10.11992/tis.202107059]
[15]李霞丽,王昭琦,刘博,等.麻将博弈AI构建方法综述[J].智能系统学报,2023,18(6):1143.[doi:10.11992/tis.202211028]
　LI Xiali,WANG Zhaoqi,LIU Bo,et al.Survey of Mahjong game AI construction methods[J].CAAI Transactions on Intelligent Systems,2023,18():1143.[doi:10.11992/tis.202211028]

备注/Memo

收稿日期:2018-09-17。
基金项目:中央高校基本科研业务费专项资金项目(2018JBZ006)；国家自然科学基金项目(61105056)
作者简介:申翔翔，硕士研究生，主要研究方向为深度强化学习;侯新文，项目研究员，主要研究方向为人脸检测和识别、机器学习、强化学习和博弈对抗。发表学术论文40余篇，Google Scholar 1 000多次;尹传环，副教授，主要研究方向为网络安全(入侵检测)、数据挖掘、机器学习。
通讯作者:尹传环.E-mail:chyin@bjtu.edu.cn

更新日期/Last Update: 1900-01-01

深度强化学习中状态注意力机制的研究 PDF下载HTML

备注/Memo

深度强化学习中状态注意力机制的研究

PDF下载 HTML