SHEN Xiangxiang, HOU Xinwen, YIN Chuanhuan. State attention in deep reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2020, 15(2): 317-322. [doi:10.11992/tis.201809033]

State attention in deep reinforcement learning

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 15
Issue:
No. 2, 2020
Pages:
317-322
Column:
Academic Papers: Machine Learning
Publication date:
2020-07-09

Article Info

Title:
State attention in deep reinforcement learning
Author(s):
SHEN Xiangxiang1, HOU Xinwen2, YIN Chuanhuan1
1. Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China;
2. Center for Research on Intelligent System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Keywords:
deep learning; reinforcement learning; attention mechanism; A3C; StarCraft II mini-games; agent; micromanagement
CLC number:
TP183
DOI:
10.11992/tis.201809033
Abstract:
Since deep learning was combined with reinforcement learning, artificial intelligence has achieved superhuman performance in domains such as board games and video games. However, the real-time strategy game StarCraft remains a formidable challenge for AI researchers because of its enormous state and action spaces. The baseline agents that DeepMind trained on the StarCraft II mini-games with the classical deep reinforcement learning algorithm A3C still fall well short of ordinary amateur players. To narrow this gap, this paper proposes a state-attention A3C algorithm that adopts a simpler network structure and couples the attention mechanism with the reward in reinforcement learning. Using fewer feature layers, the trained agent achieves the highest score on some of the StarCraft II mini-games, 71 points higher than DeepMind's baseline agent.
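
To make the method concrete, below is a minimal sketch, assuming PyTorch, of how per-feature-layer state attention could sit in front of an A3C-style actor-critic network over StarCraft II feature layers. The class name, layer sizes, and the softmax form of the attention are illustrative assumptions rather than the paper's implementation; the paper additionally couples the attention weights to the reward during training, which is only indicated in a comment here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StateAttentionA3C(nn.Module):
    """Illustrative actor-critic with learned per-layer state attention."""

    def __init__(self, n_feature_layers: int, n_actions: int, spatial: int = 64):
        super().__init__()
        # One attention logit per input feature layer (screen/minimap plane).
        self.attn_logits = nn.Parameter(torch.zeros(n_feature_layers))
        self.conv1 = nn.Conv2d(n_feature_layers, 16, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        flat = 32 * spatial * spatial
        self.policy = nn.Linear(flat, n_actions)  # actor head
        self.value = nn.Linear(flat, 1)           # critic head

    def forward(self, x):
        # x: (batch, n_feature_layers, H, W).
        # Softmax over layers gives the state attention weights; in the
        # paper's scheme these would additionally be shaped by the reward
        # signal during A3C training (assumption: via an extra loss term).
        w = F.softmax(self.attn_logits, dim=0).view(1, -1, 1, 1)
        h = F.relu(self.conv1(x * w))
        h = F.relu(self.conv2(h))
        h = h.flatten(start_dim=1)
        return F.log_softmax(self.policy(h), dim=-1), self.value(h)

# Usage with 7 hypothetical feature layers and 12 hypothetical actions:
# net = StateAttentionA3C(n_feature_layers=7, n_actions=12)
# log_probs, state_value = net(torch.randn(4, 7, 64, 64))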

References:

[1] LI Yuxi. Deep reinforcement learning: an overview[EB/OL]. [2018-01-17]. https://arxiv.org/abs/1701.07274.
[2] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[3] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[4] VINYALS O, EWALDS T, BARTUNOV S, et al. StarCraft II: a new challenge for reinforcement learning[EB/OL]. [2018-01-17]. https://arxiv.org/abs/1708.04782.
[5] ONTANON S, SYNNAEVE G, URIARTE A, et al. A survey of real-time strategy game AI research and competition in StarCraft[J]. IEEE transactions on computational intelligence and AI in games, 2013, 5(4): 293-311.
[6] SYNNAEVE G, BESSIERE P. A dataset for StarCraft AI & an example of armies clustering[C]//Artificial Intelligence in Adversarial Real-Time Games. Palo Alto, USA, 2012: 25-30.
[7] SYNNAEVE G, BESSIERE P. A Bayesian model for opening prediction in RTS games with application to StarCraft[C]//Proceedings of 2011 IEEE Conference on Computational Intelligence and Games. Seoul, South Korea, 2011: 281-288.
[8] JUSTESEN N, RISI S. Learning macromanagement in StarCraft from replays using deep learning[C]//Proceedings of 2017 IEEE Conference on Computational Intelligence and Games. New York, USA, 2017: 162-169.
[9] DODGE J, PENNEY S, HILDERBRAND C, et al. How the experts do it: assessing and explaining agent behaviors in real-time strategy games[C]//Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. Montreal, Canada, 2018.
[10] PENNEY S, DODGE J, HILDERBRAND C, et al. Toward foraging for understanding of StarCraft agents: an empirical study[C]//Proceedings of the 23rd International Conference on Intelligent User Interfaces. Tokyo, Japan, 2018: 225-237.
[11] PENG Peng, WEN Ying, YANG Yaodong, et al. Multiagent bidirectionally-coordinated nets for learning to play StarCraft combat games[EB/OL]. [2018-01-17]. https://arxiv.org/abs/1703.10069.
[12] SHAO Kun, ZHU Yuanheng, ZHAO Dongbin, et al. StarCraft micromanagement with reinforcement learning and curriculum transfer learning[J]. IEEE transactions on emerging topics in computational intelligence, 2019, 3(1): 73-84.
[13] WENDER S, WATSON I. Applying reinforcement learning to small scale combat in the real-time strategy game StarCraft: Broodwar[C]//Proceedings of 2012 IEEE Conference on Computational Intelligence and Games. Granada, Spain, 2012: 402-408.
[14] DENIL M, BAZZANI L, LAROCHELLE H, et al. Learning where to attend with deep architectures for image tracking[J]. Neural computation, 2012, 24(8): 2151-2184.
[15] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[C]//Proceedings of the International Conference on Learning Representations. San Diego, USA, 2015.
[16] MNIH V, HEESS N, GRAVES A, et al. Recurrent models of visual attention[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada, 2014: 2204-2212.
[17] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on Machine Learning. New York, USA, 2016: 1928-1937.
[18] WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine learning, 1992, 8(3/4): 229-256.
[19] ILYAS A, ENGSTROM L, SANTURKAR S, et al. Are deep policy gradient algorithms truly policy gradient algorithms?[EB/OL]. [2018-01-17]. https://arxiv.org/abs/1811.02553.
[20] DeepMind. DeepMind mini games[EB/OL]. (2017-08-10)[2018-09-10]. https://github.com/deepmind/pysc2/blob/master/docs/mini_games.md.

Similar Documents:

[1] LIAN Chuanqiang, XU Xin, WU Jun, et al. Q-CF multi-agent reinforcement learning for resource allocation problems[J]. CAAI Transactions on Intelligent Systems, 2011, 6(2): 95.
[2] ZHANG Yuanyuan, HUO Jing, YANG Wanqi, et al. A deep belief network-based heterogeneous face verification method for the second-generation identity card[J]. CAAI Transactions on Intelligent Systems, 2015, 10(2): 193. [doi:10.3969/j.issn.1673-4785.201405060]
[3] DING Ke, TAN Ying. A review on general purpose computing on GPUs and its applications in computational intelligence[J]. CAAI Transactions on Intelligent Systems, 2015, 10(1): 1. [doi:10.3969/j.issn.1673-4785.201403072]
[4] LIANG Shuang, CAO Qixin, WANG Wenshan, et al. An automatic switching method for multiple location components based on reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2016, 11(2): 149. [doi:10.11992/tis.201510031]
[5] MA Xiao, ZHANG Fandong, FENG Jufu. Sparse representation via deep learning features based face recognition method[J]. CAAI Transactions on Intelligent Systems, 2016, 11(3): 279. [doi:10.11992/tis.201603026]
[6] LIU Shuaishi, CHENG Xi, GUO Wenyan, et al. Progress report on new research in deep learning[J]. CAAI Transactions on Intelligent Systems, 2016, 11(5): 567. [doi:10.11992/tis.201511028]
[7] MA Shilong, WUNIRI Qiqige, LI Xiaoping. Deep learning with big data: state of the art and development[J]. CAAI Transactions on Intelligent Systems, 2016, 11(6): 728. [doi:10.11992/tis.201611021]
[8] WANG Yajie, QIU Hongkun, WU Yanyan, et al. Research and development of computer games[J]. CAAI Transactions on Intelligent Systems, 2016, 11(6): 788. [doi:10.11992/tis.201609006]
[9] HUANG Xinhan. A3I: the star of science and technology for the 21st century[J]. CAAI Transactions on Intelligent Systems, 2016, 11(6): 835. [doi:10.11992/tis.201605022]
[10] ZHANG Wenxu, MA Lei, WANG Xiaodong. Reinforcement learning for event-triggered multi-agent systems[J]. CAAI Transactions on Intelligent Systems, 2017, 12(1): 82. [doi:10.11992/tis.201604008]

Memo:
Received: 2018-09-17.
Foundation items: Fundamental Research Funds for the Central Universities (2018JBZ006); National Natural Science Foundation of China (61105056).
About the authors: SHEN Xiangxiang, master's student, whose main research interest is deep reinforcement learning; HOU Xinwen, project researcher, whose main research interests are face detection and recognition, machine learning, reinforcement learning, and adversarial games, and who has published more than 40 academic papers with over 1,000 citations on Google Scholar; YIN Chuanhuan, associate professor, whose main research interests are network security (intrusion detection), data mining, and machine learning.
Corresponding author: YIN Chuanhuan. E-mail: chyin@bjtu.edu.cn.