[1] SHEN Xiangxiang, HOU Xinwen, YIN Chuanhuan. State attention in deep reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2020, 15(2): 317-322. [doi:10.11992/tis.201809033]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
15
Issue:
No. 2, 2020
Pages:
317-322
Section:
Academic Papers: Machine Learning
Publication date:
2020-03-05
- Title:
-
State attention in deep reinforcement learning
- Author(s):
-
SHEN Xiangxiang (申翔翔)1, HOU Xinwen (侯新文)2, YIN Chuanhuan (尹传环)1
-
1. Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China;
2. Center for Research on Intelligent System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 110016, China
-
- Keywords:
-
deep learning; reinforcement learning; attention mechanism; A3C algorithm; StarCraft II mini-games; agent; micromanagement
- CLC number:
-
TP183
- DOI:
-
10.11992/tis.201809033
- Abstract:
-
Since deep learning was combined with reinforcement learning, artificial intelligence has surpassed human-level performance in board games and video games. However, the real-time strategy game StarCraft remains a highly challenging platform for artificial intelligence researchers because of its huge state and action spaces. The baseline agents that DeepMind trained on the StarCraft II mini-games with the classical deep reinforcement learning algorithm A3C still perform far below ordinary amateur players. To address this gap, an A3C algorithm based on state attention is proposed, which adopts a simpler network structure and combines the attention mechanism with the rewards in reinforcement learning. In some StarCraft II mini-games, the trained agent achieves the highest score while using fewer feature layers, 71 points higher than DeepMind's baseline agent.
Last Update:
1900-01-01