[1]申翔翔,侯新文,尹传环.深度强化学习中状态注意力机制的研究[J].智能系统学报,2020,15(2):317-322.[doi:10.11992/tis.201809033]
 SHEN Xiangxiang,HOU Xinwen,YIN Chuanhuan.State attention in deep reinforcement learning[J].CAAI Transactions on Intelligent Systems,2020,15(2):317-322.[doi:10.11992/tis.201809033]
点击复制

深度强化学习中状态注意力机制的研究

参考文献/References:
[1] LI Yuxi. Deep reinforcement learning: an overview[EB/OL]. [2018-01-17]https://arxiv.org/abs/1701.07274.
[2] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[3] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[4] VINYALS O, EWALDS T, BARTUNOV S, et al. StarCraft II: a new challenge for reinforcement learning[EB/OL]. [2018-01-17]https://arXiv: 1708.04782, 2017.
[5] ONTANON S, SYNNAEVE G, URIARTE A, et al. A survey of real-time strategy game AI research and competition in StarCraft[J]. IEEE transactions on computational intelligence and AI in games, 2013, 5(4): 293-311.
[6] SYNNAEVE G, BESSIERE P. A dataset for StarCraft AI & an example of armies clustering[C]//Artificial Intelligence in Adversarial Real-Time Games. Palo Alto, USA, 2012: 25-30.
[7] SYNNAEVE G, BESSIèRE P. A Bayesian model for opening prediction in RTS games with application to StarCraft[C]//Proceedings of 2011 IEEE Conference on Computational Intelligence and Games. Seoul, South Korea, 2011: 281-288.
[8] JUSTESEN N, RISI S. Learning macromanagement in starcraft from replays using deep learning[C]//Proceedings of 2017 IEEE Conference on Computational Intelligence and Games. New York, USA, 2017: 162-169.
[9] DODGE J, PENNEY S, HILDERBRAND C, et al. How the experts do it: assessing and explaining agent behaviors in real-time strategy games[C]//Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. Montreal QC, Canada, 2018.
[10] PENNEY S, DODGE J, HILDERBRAND C, et al. Toward foraging for understanding of starcraft agents: an empirical study[C]//Proceedings of the 23rd International Conference on Intelligent User Interfaces. Tokyo, Japan, 2018: 225-237.
[11] PENG Peng, WEN Ying, YANG Yaodong, et al. Multiagent bidirectionally-coordinated nets for learning to play starcraft combat games[EB/OL]. [2018-01-17]https://arXiv: 1703.10069, 2017.
[12] SHAO Kun, ZHU Yuanheng, ZHAO Dongbin, et al. StarCraft micromanagement with reinforcement learning and curriculum transfer learning[J]. IEEE transactions on emerging topics in computational intelligence, 2019, 3(1): 73-84.
[13] WENDER S, WATSON I. Applying reinforcement learning to small scale combat in the real-time strategy game StarCraft: Broodwar[C]//Proceedings of 2012 IEEE Conference on Computational Intelligence and Games. Granada, Spain, 2012: 402-408.
[14] DENIL M, BAZZANI L, LAROCHELLE H, et al. Learning where to attend with deep architectures for image tracking[J]. Neural computation, 2012, 24(8): 2151-2184.
[15] BAHDANAU D, CHO K, BENGIO Y, et al. Neural machine translation by jointly learning to align and translate[C]//Proceedings of International Conference on Learning Representations. 2015.
[16] MNIH V, HEESS N, GRAVES A, et al. Recurrent models of visual attention[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada, 2014: 2204-2212.
[17] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on Machine Learning. New York USA, 2016: 1928-1937.
[18] WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine learning, 1992, 8(3/4): 229-256.
[19] ILYAS A, ENGSTROM L, SANTURKAR S, et al. Are deep policy gradient algorithms truly policy gradient algorithms? [EB/OL]. [2018-01-17]https://arXiv: 1811.02553, 2018.
[20] DeepMind. DeepMind mini games[EB/OL]. (2017-08-10)[2018-09-10]. https://github.com/deepmind/pysc2/blob/master/docs/mini_games.md.
相似文献/References:
[1]连传强,徐昕,吴军,等.面向资源分配问题的Q-CF多智能体强化学习[J].智能系统学报,2011,6(2):95.
 LIAN Chuanqiang,XU Xin,WU Jun,et al.Q-CF multiAgent reinforcement learningfor resource allocation problems[J].CAAI Transactions on Intelligent Systems,2011,6():95.
[2]张媛媛,霍静,杨婉琪,等.深度信念网络的二代身份证异构人脸核实算法[J].智能系统学报,2015,10(2):193.[doi:10.3969/j.issn.1673-4785.201405060]
 ZHANG Yuanyuan,HUO Jing,YANG Wanqi,et al.A deep belief network-based heterogeneous face verification method for the second-generation identity card[J].CAAI Transactions on Intelligent Systems,2015,10():193.[doi:10.3969/j.issn.1673-4785.201405060]
[3]丁科,谭营.GPU通用计算及其在计算智能领域的应用[J].智能系统学报,2015,10(1):1.[doi:10.3969/j.issn.1673-4785.201403072]
 DING Ke,TAN Ying.A review on general purpose computing on GPUs and its applications in computational intelligence[J].CAAI Transactions on Intelligent Systems,2015,10():1.[doi:10.3969/j.issn.1673-4785.201403072]
[4]梁爽,曹其新,王雯珊,等.基于强化学习的多定位组件自动选择方法[J].智能系统学报,2016,11(2):149.[doi:10.11992/tis.201510031]
 LIANG Shuang,CAO Qixin,WANG Wenshan,et al.An automatic switching method for multiple location components based on reinforcement learning[J].CAAI Transactions on Intelligent Systems,2016,11():149.[doi:10.11992/tis.201510031]
[5]马晓,张番栋,封举富.基于深度学习特征的稀疏表示的人脸识别方法[J].智能系统学报,2016,11(3):279.[doi:10.11992/tis.201603026]
 MA Xiao,ZHANG Fandong,FENG Jufu.Sparse representation via deep learning features based face recognition method[J].CAAI Transactions on Intelligent Systems,2016,11():279.[doi:10.11992/tis.201603026]
[6]刘帅师,程曦,郭文燕,等.深度学习方法研究新进展[J].智能系统学报,2016,11(5):567.[doi:10.11992/tis.201511028]
 LIU Shuaishi,CHENG Xi,GUO Wenyan,et al.Progress report on new research in deep learning[J].CAAI Transactions on Intelligent Systems,2016,11():567.[doi:10.11992/tis.201511028]
[7]马世龙,乌尼日其其格,李小平.大数据与深度学习综述[J].智能系统学报,2016,11(6):728.[doi:10.11992/tis.201611021]
 MA Shilong,WUNIRI Qiqige,LI Xiaoping.Deep learning with big data: state of the art and development[J].CAAI Transactions on Intelligent Systems,2016,11():728.[doi:10.11992/tis.201611021]
[8]王亚杰,邱虹坤,吴燕燕,等.计算机博弈的研究与发展[J].智能系统学报,2016,11(6):788.[doi:10.11992/tis.201609006]
 WANG Yajie,QIU Hongkun,WU Yanyan,et al.Research and development of computer games[J].CAAI Transactions on Intelligent Systems,2016,11():788.[doi:10.11992/tis.201609006]
[9]黄心汉.A3I:21世纪科技之光[J].智能系统学报,2016,11(6):835.[doi:10.11992/tis.201605022]
 HUANG Xinhan.A3I: the star of science and technology for the 21st century[J].CAAI Transactions on Intelligent Systems,2016,11():835.[doi:10.11992/tis.201605022]
[10]张文旭,马磊,王晓东.基于事件驱动的多智能体强化学习研究[J].智能系统学报,2017,12(1):82.[doi:10.11992/tis.201604008]
 ZHANG Wenxu,MA Lei,WANG Xiaodong.Reinforcement learning for event-triggered multi-agent systems[J].CAAI Transactions on Intelligent Systems,2017,12():82.[doi:10.11992/tis.201604008]
[11]殷昌盛,杨若鹏,朱巍,等.多智能体分层强化学习综述[J].智能系统学报,2020,15(4):646.[doi:10.11992/tis.201909027]
 YIN Changsheng,YANG Ruopeng,ZHU Wei,et al.A survey on multi-agent hierarchical reinforcement learning[J].CAAI Transactions on Intelligent Systems,2020,15():646.[doi:10.11992/tis.201909027]
[12]王作为,徐征,张汝波,等.记忆神经网络在机器人导航领域的应用与研究进展[J].智能系统学报,2020,15(5):835.[doi:10.11992/tis.202002020]
 WANG Zuowei,XU Zheng,ZHANG Rubo,et al.Research progress and application of memory neural network in robot navigation[J].CAAI Transactions on Intelligent Systems,2020,15():835.[doi:10.11992/tis.202002020]
[13]杨瑞,严江鹏,李秀.强化学习稀疏奖励算法研究——理论与实验[J].智能系统学报,2020,15(5):888.[doi:10.11992/tis.202003031]
 YANG Rui,YAN Jiangpeng,LI Xiu.Survey of sparse reward algorithms in reinforcement learning — theory and experiment[J].CAAI Transactions on Intelligent Systems,2020,15():888.[doi:10.11992/tis.202003031]
[14]朱少凯,孟庆浩,金晟,等.基于深度强化学习的室内视觉局部路径规划[J].智能系统学报,2022,17(5):908.[doi:10.11992/tis.202107059]
 ZHU Shaokai,MENG Qinghao,JIN Sheng,et al.Indoor visual local path planning based on deep reinforcement learning[J].CAAI Transactions on Intelligent Systems,2022,17():908.[doi:10.11992/tis.202107059]
[15]李霞丽,王昭琦,刘博,等.麻将博弈AI构建方法综述[J].智能系统学报,2023,18(6):1143.[doi:10.11992/tis.202211028]
 LI Xiali,WANG Zhaoqi,LIU Bo,et al.Survey of Mahjong game AI construction methods[J].CAAI Transactions on Intelligent Systems,2023,18():1143.[doi:10.11992/tis.202211028]

备注/Memo

收稿日期:2018-09-17。
基金项目:中央高校基本科研业务费专项资金项目(2018JBZ006);国家自然科学基金项目(61105056)
作者简介:申翔翔,硕士研究生,主要研究方向为深度强化学习;侯新文,项目研究员,主要研究方向为人脸检测和识别、机器学习、强化学习和博弈对抗。发表学术论文40余篇,Google Scholar 1 000多次;尹传环,副教授,主要研究方向为网络安全(入侵检测)、数据挖掘、机器学习。
通讯作者:尹传环.E-mail:chyin@bjtu.edu.cn

更新日期/Last Update: 1900-01-01
Copyright © 《 智能系统学报》 编辑部
地址:(150001)黑龙江省哈尔滨市南岗区南通大街145-1号楼 电话:0451- 82534001、82518134 邮箱:tis@vip.sina.com