[1] ZHANG Yuxin, ZHAO Enjiao, ZHAO Yuxin. MADDPG game confrontation algorithm of polyisomer network based on rule coupling[J]. CAAI Transactions on Intelligent Systems, 2024, 19(1): 190-208. [doi:10.11992/tis.202303037]

MADDPG game confrontation algorithm of polyisomer network based on rule coupling

References:
[1] JIA Yongnan, TIAN Siying, LI Qing. Recent development of unmanned aerial vehicle swarms[J]. Acta aeronautica et astronautica sinica, 2020, 41(S1): 4–14.
[2] LI Jingchen, SHI Haobin, HUANG Guosheng. A multi-agent reinforcement learning method based on self-attention mechanism and policy mapping recombination[J]. Chinese journal of computers, 2022, 45(9): 1842–1858.
[3] ZHANG Yu, MOU Zhiyu, GAO Feifei, et al. UAV-enabled secure communications by multi-agent deep reinforcement learning[J]. IEEE transactions on vehicular technology, 2020, 69(10): 11599–11611.
[4] ZHANG Lixiang, LI Jingchen, ZHU Yi’an, et al. Multi-agent reinforcement learning by the actor-critic model with an attention interface[J]. Neurocomputing, 2022, 471: 275–284.
[5] SIMÕES D, LAU N, REIS L P. Exploring communication protocols and centralized critics in multi-agent deep learning[J]. Integrated computer-aided engineering, 2020, 27(4): 333–351.
[6] SINGH A, JHA S S. Learning safe cooperative policies in autonomous multi-UAV navigation[C]//2021 IEEE 18th India Council International Conference. Piscataway: IEEE, 2022: 1–6.
[7] WANG Bennian, GAO Yang, CHEN Zhaoqian, et al. A two-layered multi-agent reinforcement learning model and algorithm[J]. Journal of network and computer applications, 2007, 30(4): 1366–1376.
[8] WANG Zihao, ZHANG Yanxin, YIN Chenkun, et al. Multi-agent deep reinforcement learning based on maximum entropy[C]//2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference. Piscataway: IEEE, 2021: 1402–1406.
[9] DING Feng, MA Guanfeng, CHEN Zhikui, et al. Averaged soft actor-critic for deep reinforcement learning[J]. Complexity, 2021, 2021: 1–16.
[10] FU Xiaowei, WANG Hui, XU Zhe. Cooperative pursuit strategy for multi-UAVs based on DE-MADDPG algorithm[J]. Acta aeronautica et astronautica sinica, 2022, 43(5): 325311.
[11] ZHOU Xiao, SONG Zhou, MOU Xingang, et al. Multirobot collaborative pursuit target robot by improved MADDPG[J]. Computational intelligence and neuroscience, 2022, 2022: 1–10.
[12] LI Chengjing, WANG Li, HUANG Zirong. Hindsight-aware deep reinforcement learning algorithm for multi-agent systems[J]. International journal of machine learning and cybernetics, 2022, 13(7): 2045–2057.
[13] WAN Kaifang, WU Dingwei, LI Bo, et al. ME-MADDPG: an efficient learning-based motion planning method for multiple agents in complex environments[J]. International journal of intelligent systems, 2022, 37(3): 2393–2427.
[14] WAN Kaifang, WU Dingwei, ZHAI Yiwei, et al. An improved approach towards multi-agent pursuit-evasion game decision-making using deep reinforcement learning[J]. Entropy, 2021, 23(11): 1433.
[15] LUO Wentao, ZHANG Jianfu, FENG Pingfa, et al. A deep transfer-learning-based dynamic reinforcement learning for intelligent tightening system[J]. International journal of intelligent systems, 2021, 36(3): 1345–1365.
[16] SUN Yu, LAI Jun, CAO Lei, et al. A novel multi-agent parallel-critic network architecture for cooperative-competitive reinforcement learning[J]. IEEE access, 2020, 8: 135605–135616.
[17] JIANG Longting, WEI Ruixuan, WANG Dong. UAVs rounding up inspired by communication multi-agent depth deterministic policy gradient[J]. Applied intelligence, 2023, 53(10): 11474–11489.
[18] QIE Han, SHI Dianxi, SHEN Tianlong, et al. Joint optimization of multi-UAV target assignment and path planning based on multi-agent reinforcement learning[J]. IEEE access, 2019, 7: 146264–146272.
[19] KONG Weiren, ZHOU Deyun, YANG Zhen. Air combat strategies generation of CGF based on MADDPG and reward shaping[C]//2020 International Conference on Computer Vision, Image and Deep Learning. Piscataway: IEEE, 2020: 651–655.
[20] XIANG Lei, XIE Tao. Research on UAV swarm confrontation task based on MADDPG algorithm[C]//2020 5th International Conference on Mechanical, Control and Computer Engineering. Piscataway: IEEE, 2021: 1513–1518.
[21] LIU Jianxing, AN Hao, GAO Yabin, et al. Adaptive control of hypersonic flight vehicles with limited angle-of-attack[J]. IEEE/ASME transactions on mechatronics, 2018, 23(2): 883–894.
[22] LOWE R, WU Yi, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[EB/OL]. (2017-06-07)[2023-03-30]. https://arxiv.org/abs/1706.02275.
[23] ZOU Changjie, ZHENG Jiaoling, ZHANG Zhonglei. Research on collaborative strategy based on GAED-MADDPG multi-agent reinforcement learning[J]. Application research of computers, 2020, 37(12): 3656–3661.
[24] WANG Zhaolei, ZHANG Jun, LI Yue, et al. Automated reinforcement learning based on parameter sharing network architecture search[C]//2021 6th International Conference on Robotics and Automation Engineering (ICRAE). Piscataway: IEEE, 2022: 358–363.
[25] HUANG Liwei, FU Mingsheng, QU Hong, et al. A deep reinforcement learning-based method applied for solving multi-agent defense and attack problems[J]. Expert systems with applications, 2021, 176: 114896.
[26] REN Jinsheng, GUO Shangqi, CHEN Feng. Orientation-preserving rewards’ balancing in reinforcement learning[J]. IEEE transactions on neural networks and learning systems, 2022, 33(11): 6458–6472.
[27] CHEN Can, MO Li, ZHENG Duo, et al. Cooperative attack-defense game of multiple UAVs with asymmetric maneuverability[J]. Acta aeronautica et astronautica sinica, 2020, 41(12): 324152.
[28] KONG Weiren, ZHOU Deyun, ZHANG Kai, et al. Air combat autonomous maneuver decision for one-on-one within visual range engagement base on robust multi-agent reinforcement learning[C]//2020 IEEE 16th International Conference on Control & Automation. Piscataway: IEEE, 2020: 506–512.
[29] ZUO Guoyu, ZHAO Qishen, LU Jiahao, et al. Efficient hindsight reinforcement learning using demonstrations for robotic tasks with sparse rewards[J]. International journal of advanced robotic systems, 2020, 17(1): 172988141989834.
[30] FU Yuchuan, LI Changle, YU F R, et al. Hybrid autonomous driving guidance strategy combining deep reinforcement learning and expert system[J]. IEEE transactions on intelligent transportation systems, 2022, 23(8): 11273–11286.
[31] YANG Ruyue, WANG Ding, QIAO Junfei. Policy gradient adaptive critic design with dynamic prioritized experience replay for wastewater treatment process control[J]. IEEE transactions on industrial informatics, 2022, 18(5): 3150–3158.
[32] NI Zhen, MALLA N, ZHONG Xiangnan. Prioritizing useful experience replay for heuristic dynamic programming-based learning systems[J]. IEEE transactions on cybernetics, 2019, 49(11): 3911–3922.
[33] YUAN Wei, LI Yueyuan, ZHUANG Hanyang, et al. Prioritized experience replay-based deep Q learning: multiple-reward architecture for highway driving decision making[J]. IEEE robotics & automation magazine, 2021, 28(4): 21–31.
[34] LIU Wenzhang, DONG Lu, LIU Jian, et al. Knowledge transfer in multi-agent reinforcement learning with incremental number of agents[J]. Journal of systems engineering and electronics, 2022, 33(2): 447–460.
[35] GAO Ang, DONG Zhiming, LI Liang, et al. Parallel priority experience replay mechanism of MADDPG algorithm[J]. Systems engineering and electronics, 2021, 43(2): 420–433.
