[1]张钰欣,赵恩娇,赵玉新.规则耦合下的多异构子网络MADDPG博弈对抗算法[J].智能系统学报,2024,19(1):190-208.[doi:10.11992/tis.202303037]
 ZHANG Yuxin,ZHAO Enjiao,ZHAO Yuxin.MADDPG game confrontation algorithm of multiple heterogeneous subnetworks based on rule coupling[J].CAAI Transactions on Intelligent Systems,2024,19(1):190-208.[doi:10.11992/tis.202303037]

规则耦合下的多异构子网络MADDPG博弈对抗算法 (MADDPG game confrontation algorithm of multiple heterogeneous subnetworks based on rule coupling)

参考文献/References:
[1] 贾永楠, 田似营, 李擎. 无人机集群研究进展综述[J]. 航空学报, 2020, 41(S1): 4–14.
JIA Yongnan, TIAN Siying, LI Qing. Recent development of unmanned aerial vehicle swarms[J]. Acta aeronautica et astronautica sinica, 2020, 41(S1): 4–14.
[2] 李静晨, 史豪斌, 黄国胜. 基于自注意力机制和策略映射重组的多智能体强化学习算法[J]. 计算机学报, 2022, 45(9): 1842–1858.
LI Jingchen, SHI Haobin, HUANG Guosheng. A multi-agent reinforcement learning method based on self-attention mechanism and policy mapping recombination[J]. Chinese journal of computers, 2022, 45(9): 1842–1858.
[3] ZHANG Yu, MOU Zhiyu, GAO Feifei, et al. UAV-enabled secure communications by multi-agent deep reinforcement learning[J]. IEEE transactions on vehicular technology, 2020, 69(10): 11599–11611.
[4] ZHANG Lixiang, LI Jingchen, ZHU Yi’an, et al. Multi-agent reinforcement learning by the actor-critic model with an attention interface[J]. Neurocomputing, 2022, 471: 275–284.
[5] SIMÕES D, LAU N, REIS L P. Exploring communication protocols and centralized critics in multi-agent deep learning[J]. Integrated computer-aided engineering, 2020, 27(4): 333–351.
[6] SINGH A, JHA S S. Learning safe cooperative policies in autonomous multi-UAV navigation[C]//2021 IEEE 18th India Council International Conference. Piscataway: IEEE, 2022: 1–6.
[7] WANG Bennian, GAO Yang, CHEN Zhaoqian, et al. A two-layered multi-agent reinforcement learning model and algorithm[J]. Journal of network and computer applications, 2007, 30(4): 1366–1376.
[8] WANG Zihao, ZHANG Yanxin, YIN Chenkun, et al. Multi-agent deep reinforcement learning based on maximum entropy[C]//2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference. Piscataway: IEEE, 2021: 1402–1406.
[9] DING Feng, MA Guanfeng, CHEN Zhikui, et al. Averaged soft actor-critic for deep reinforcement learning[J]. Complexity, 2021, 2021: 1–16.
[10] 符小卫, 王辉, 徐哲. 基于DE-MADDPG的多无人机协同追捕策略[J]. 航空学报, 2022, 43(5): 325311.
FU Xiaowei, WANG Hui, XU Zhe. Cooperative pursuit strategy for multi-UAVs based on DE-MADDPG algorithm[J]. Acta aeronautica et astronautica sinica, 2022, 43(5): 325311.
[11] ZHOU Xiao, SONG Zhou, MOU Xingang, et al. Multirobot collaborative pursuit target robot by improved MADDPG[J]. Computational intelligence and neuroscience, 2022, 2022: 1–10.
[12] LI Chengjing, WANG Li, HUANG Zirong. Hindsight-aware deep reinforcement learning algorithm for multi-agent systems[J]. International journal of machine learning and cybernetics, 2022, 13(7): 2045–2057.
[13] WAN Kaifang, WU Dingwei, LI Bo, et al. ME-MADDPG: an efficient learning-based motion planning method for multiple agents in complex environments[J]. International journal of intelligent systems, 2022, 37(3): 2393–2427.
[14] WAN Kaifang, WU Dingwei, ZHAI Yiwei, et al. An improved approach towards multi-agent pursuit-evasion game decision-making using deep reinforcement learning[J]. Entropy, 2021, 23(11): 1433.
[15] LUO Wentao, ZHANG Jianfu, FENG Pingfa, et al. A deep transfer-learning-based dynamic reinforcement learning for intelligent tightening system[J]. International journal of intelligent systems, 2021, 36(3): 1345–1365.
[16] SUN Yu, LAI Jun, CAO Lei, et al. A novel multi-agent parallel-critic network architecture for cooperative-competitive reinforcement learning[J]. IEEE access, 2020, 8: 135605–135616.
[17] JIANG Longting, WEI Ruixuan, WANG Dong. UAVs rounding up inspired by communication multi-agent depth deterministic policy gradient[J]. Applied intelligence, 2023, 53(10): 11474–11489.
[18] QIE Han, SHI Dianxi, SHEN Tianlong, et al. Joint optimization of multi-UAV target assignment and path planning based on multi-agent reinforcement learning[J]. IEEE access, 2019, 7: 146264–146272.
[19] KONG Weiren, ZHOU Deyun, YANG Zhen. Air combat strategies generation of CGF based on MADDPG and reward shaping[C]//2020 International Conference on Computer Vision, Image and Deep Learning. Piscataway: IEEE, 2020: 651–655.
[20] XIANG Lei, XIE Tao. Research on UAV swarm confrontation task based on MADDPG algorithm[C]//2020 5th International Conference on Mechanical, Control and Computer Engineering. Piscataway: IEEE, 2021: 1513–1518.
[21] LIU Jianxing, AN Hao, GAO Yabin, et al. Adaptive control of hypersonic flight vehicles with limited angle-of-attack[J]. IEEE/ASME transactions on mechatronics, 2018, 23(2): 883–894.
[22] LOWE R, WU Yi, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[EB/OL]. (2017-06-07)[2023-03-30]. https://arxiv.org/abs/1706.02275.
[23] 邹长杰, 郑皎凌, 张中雷. 基于GAED-MADDPG多智能体强化学习的协作策略研究[J]. 计算机应用研究, 2020, 37(12): 3656–3661.
ZOU Changjie, ZHENG Jiaoling, ZHANG Zhonglei. Research on collaborative strategy based on GAED-MADDPG multi-agent reinforcement learning[J]. Application research of computers, 2020, 37(12): 3656–3661.
[24] WANG Zhaolei, ZHANG Jun, LI Yue, et al. Automated reinforcement learning based on parameter sharing network architecture search[C]//2021 6th International Conference on Robotics and Automation Engineering (ICRAE). Piscataway: IEEE, 2022: 358–363.
[25] HUANG Liwei, FU Mingsheng, QU Hong, et al. A deep reinforcement learning-based method applied for solving multi-agent defense and attack problems[J]. Expert systems with applications, 2021, 176: 114896.
[26] REN Jinsheng, GUO Shangqi, CHEN Feng. Orientation-preserving rewards’ balancing in reinforcement learning[J]. IEEE transactions on neural networks and learning systems, 2022, 33(11): 6458–6472.
[27] 陈灿, 莫雳, 郑多, 等. 非对称机动能力多无人机智能协同攻防对抗[J]. 航空学报, 2020, 41(12): 324152.
CHEN Can, MO Li, ZHENG Duo, et al. Cooperative attack-defense game of multiple UAVs with asymmetric maneuverability[J]. Acta aeronautica et astronautica sinica, 2020, 41(12): 324152.
[28] KONG Weiren, ZHOU Deyun, ZHANG Kai, et al. Air combat autonomous maneuver decision for one-on-one within visual range engagement base on robust multi-agent reinforcement learning[C]//2020 IEEE 16th International Conference on Control & Automation. Piscataway: IEEE, 2020: 506–512.
[29] ZUO Guoyu, ZHAO Qishen, LU Jiahao, et al. Efficient hindsight reinforcement learning using demonstrations for robotic tasks with sparse rewards[J]. International journal of advanced robotic systems, 2020, 17(1): 172988141989834.
[30] FU Yuchuan, LI Changle, YU F R, et al. Hybrid autonomous driving guidance strategy combining deep reinforcement learning and expert system[J]. IEEE transactions on intelligent transportation systems, 2022, 23(8): 11273–11286.
[31] YANG Ruyue, WANG Ding, QIAO Junfei. Policy gradient adaptive critic design with dynamic prioritized experience replay for wastewater treatment process control[J]. IEEE transactions on industrial informatics, 2022, 18(5): 3150–3158.
[32] NI Zhen, MALLA N, ZHONG Xiangnan. Prioritizing useful experience replay for heuristic dynamic programming-based learning systems[J]. IEEE transactions on cybernetics, 2019, 49(11): 3911–3922.
[33] YUAN Wei, LI Yueyuan, ZHUANG Hanyang, et al. Prioritized experience replay-based deep Q learning: multiple-reward architecture for highway driving decision making[J]. IEEE robotics & automation magazine, 2021, 28(4): 21–31.
[34] LIU Wenzhang, DONG Lu, LIU Jian, et al. Knowledge transfer in multi-agent reinforcement learning with incremental number of agents[J]. Journal of systems engineering and electronics, 2022, 33(2): 447–460.
[35] 高昂, 董志明, 李亮, 等. MADDPG算法并行优先经验回放机制[J]. 系统工程与电子技术, 2021, 43(2): 420–433.
GAO Ang, DONG Zhiming, LI Liang, et al. Parallel priority experience replay mechanism of MADDPG algorithm[J]. Systems engineering and electronics, 2021, 43(2): 420–433.
相似文献/Similar Articles:
[1]周文吉,俞扬.分层强化学习综述[J].智能系统学报,2017,12(5):590.[doi:10.11992/tis.201706031]
 ZHOU Wenji,YU Yang.Summary of hierarchical reinforcement learning[J].CAAI Transactions on Intelligent Systems,2017,12(5):590.[doi:10.11992/tis.201706031]
[2]王作为,徐征,张汝波,等.记忆神经网络在机器人导航领域的应用与研究进展[J].智能系统学报,2020,15(5):835.[doi:10.11992/tis.202002020]
 WANG Zuowei,XU Zheng,ZHANG Rubo,et al.Research progress and application of memory neural network in robot navigation[J].CAAI Transactions on Intelligent Systems,2020,15(5):835.[doi:10.11992/tis.202002020]
[3]杨瑞,严江鹏,李秀.强化学习稀疏奖励算法研究——理论与实验[J].智能系统学报,2020,15(5):888.[doi:10.11992/tis.202003031]
 YANG Rui,YAN Jiangpeng,LI Xiu.Survey of sparse reward algorithms in reinforcement learning — theory and experiment[J].CAAI Transactions on Intelligent Systems,2020,15(5):888.[doi:10.11992/tis.202003031]
[4]赵玉新,杜登辉,成小会,等.基于强化学习的海洋移动观测网络观测路径规划方法[J].智能系统学报,2022,17(1):192.[doi:10.11992/tis.202106004]
 ZHAO Yuxin,DU Denghui,CHENG Xiaohui,et al.Path planning for mobile ocean observation network based on reinforcement learning[J].CAAI Transactions on Intelligent Systems,2022,17(1):192.[doi:10.11992/tis.202106004]
[5]欧阳勇平,魏长赟,蔡帛良.动态环境下分布式异构多机器人避障方法研究[J].智能系统学报,2022,17(4):752.[doi:10.11992/tis.202106044]
 OUYANG Yongping,WEI Changyun,CAI Boliang.Collision avoidance approach for distributed heterogeneous multirobot systems in dynamic environments[J].CAAI Transactions on Intelligent Systems,2022,17(4):752.[doi:10.11992/tis.202106044]
[6]王竣禾,姜勇.基于深度强化学习的动态装配算法[J].智能系统学报,2023,18(1):2.[doi:10.11992/tis.202201006]
 WANG Junhe,JIANG Yong.Dynamic assembly algorithm based on deep reinforcement learning[J].CAAI Transactions on Intelligent Systems,2023,18(1):2.[doi:10.11992/tis.202201006]
[7]陶鑫钰,王艳,纪志成.基于深度强化学习的节能工艺路线发现方法[J].智能系统学报,2023,18(1):23.[doi:10.11992/tis.202112030]
 TAO Xinyu,WANG Yan,JI Zhicheng.Energy-saving process route discovery method based on deep reinforcement learning[J].CAAI Transactions on Intelligent Systems,2023,18(1):23.[doi:10.11992/tis.202112030]

备注/Memo

Received: 2023-03-30.
Foundation: National Natural Science Foundation of China (61903099); Natural Science Foundation of Heilongjiang Province (LH2020F025); Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-K20200470); China Postdoctoral Science Foundation (2021M690812); Heilongjiang Provincial Postdoctoral Science Foundation (LBH-Z21048).
About the authors: ZHANG Yuxin, master's student. Her main research interests are multi-agent deep reinforcement learning and multi-agent game confrontation. E-mail: 15140294516@163.com. ZHAO Enjiao, associate professor. Her main research interests are cooperative control of unmanned swarm systems and intelligent navigation. She has led more than 10 projects funded by the National Natural Science Foundation of China, the Natural Science Foundation of Heilongjiang Province, the China Postdoctoral Science Foundation, and the Heilongjiang Provincial Postdoctoral Science Foundation, and has published more than 20 academic papers. E-mail: zhaoenjiao935@hrbeu.edu.cn. ZHAO Yuxin, professor and doctoral supervisor, recipient of the China Youth Science and Technology Award and the Fok Ying Tung Education and Teaching Award, selected for the National Science and Technology Innovation Leading Talent Program and the first cohort of Longjiang Science and Technology Talents. He serves as director of the Engineering Research Center of Navigation Instruments of the Ministry of Education, leader of a research-oriented teaching team of the Ministry of Industry and Information Technology, head of a national virtual teaching and research office, and head of a national first-class undergraduate course. His main research interests are ship navigation and ocean instrument technology. He has undertaken a national defense 973 project, national major science and technology projects, and National Natural Science Foundation of China projects, among others, and has published more than 50 academic papers and 4 monographs. E-mail: zhaoyuxin@hrbeu.edu.cn.
Corresponding author: ZHAO Enjiao. E-mail: zhaoenjiao935@hrbeu.edu.cn.
