[1] ZHANG Yuxin, ZHAO Enjiao, ZHAO Yuxin. MADDPG game confrontation algorithm of polyisomer network based on rule coupling[J]. CAAI Transactions on Intelligent Systems, 2024, 19(1): 190-208. [doi:10.11992/tis.202303037]

MADDPG game confrontation algorithm of polyisomer network based on rule coupling

CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 19 Issue: No. 1, 2024 Pages: 190-208 Column: Artificial Intelligence Deans' Forum Publication date: 2024-01-05

Title:: MADDPG game confrontation algorithm of polyisomer network based on rule coupling

Author(s):: ZHANG Yuxin, ZHAO Enjiao, ZHAO Yuxin; College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, Heilongjiang, China

Keywords:: deep reinforcement learning; multi-UAV; game confrontation; MADDPG; Actor-Critic; rule coupling; experience replay; sparse rewards

Classification number:: V279

DOI:: 10.11992/tis.202303037

Document code:: 2023-07-31

Abstract:: Multi-UAV game confrontation suffers from dynamic attrition in the number of UAVs, and traditional deep reinforcement learning algorithms suffer from sparse rewards and an excessively high sampling frequency of invalid experiences. To address these problems, this paper takes a multi-unmanned aerial vehicle (multi-UAV) game confrontation task with limited attack-defense capability and limited communication range as its research background and builds a game confrontation model of red and blue UAV swarms. Within the Actor-Critic framework of the multi-agent deep deterministic policy gradient (MADDPG) algorithm, the original MADDPG algorithm is improved according to the characteristics of the game environment. To further improve the algorithm's exploration and utilization of effective experiences, a rule coupling module is built to assist the Actor network during UAV decision-making. Simulation experiments show that the designed algorithm achieves some improvement in convergence speed, learning efficiency, and stability. The introduction of multiple heterogeneous sub-networks makes the algorithm better suited to game scenarios in which the number of UAVs declines dynamically; the prioritized experience replay method that couples a reward potential function with importance weights refines the differentiation between experiences and raises the utilization rate of advantageous experiences; and the rule coupling module enables the UAV decision network to make effective use of prior knowledge.
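The abstract mentions prioritized experience replay with importance weights but gives no formulas on this page. As a rough illustration only, the following is a minimal sketch of the generic proportional prioritized replay scheme, in which sampling probability grows with a per-transition priority and importance-sampling weights correct the resulting bias. It is not the paper's method: the coupling with the reward potential function is not modeled, and the class name `PrioritizedReplayBuffer` and the parameters `alpha`, `beta`, and `eps` are assumptions chosen for the example.

```python
import random


class PrioritizedReplayBuffer:
    """Generic proportional prioritized replay (illustrative sketch only)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling
        self.beta = beta    # how strongly the weights correct the bias
        self.eps = eps      # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority so that
        # they are likely to be sampled before their TD error is known.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, rng=random):
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.data)
        idxs = rng.choices(range(n), weights=probs, k=batch_size)
        # w_i = (N * P(i))^(-beta), normalised by the largest weight in the batch.
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]
        w_max = max(weights)
        weights = [w / w_max for w in weights]
        batch = [self.data[i] for i in idxs]
        return idxs, batch, weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, priority follows the absolute TD error.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop, the returned `weights` would typically scale each sample's critic loss, and `update_priorities` would be called with the new TD errors after the update.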

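Memo

Received: 2023-03-30.
Funding: National Natural Science Foundation of China (61903099); Natural Science Foundation of Heilongjiang Province (LH2020F025); Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-K20200470); China Postdoctoral Science Foundation General Program (2021M690812); Heilongjiang Postdoctoral Fund General Program (LBH-Z21048).
About the authors: ZHANG Yuxin, master's student; research interests include multi-agent deep reinforcement learning and multi-agent game confrontation. E-mail: 15140294516@163.com; ZHAO Enjiao, associate professor; research interests include cooperative control of swarm unmanned systems and intelligent navigation. She has led more than 10 projects, including projects of the National Natural Science Foundation of China, the Natural Science Foundation of Heilongjiang Province, the China Postdoctoral Science Foundation, and the Heilongjiang Postdoctoral Science Foundation, and has published more than 20 academic papers. E-mail: zhaoenjiao935@hrbeu.edu.cn; ZHAO Yuxin, professor and doctoral supervisor; recipient of the China Youth Science and Technology Award and the Fok Ying Tung education and teaching award, selected for the national support program for leading talents in science and technology innovation, and among the first Longjiang Science and Technology Talents. He serves as director of the Ministry of Education Engineering Research Center for "Navigation Instruments", head of a Ministry of Industry and Information Technology research-oriented teaching team, head of a national virtual teaching and research office, and head of a national first-class undergraduate course. His main research interest is ship navigation and marine instrument technology. He has undertaken a National Defense 973 project, a National Major Science and Technology Special Project, and National Natural Science Foundation of China projects, and has published more than 50 academic papers and 4 monographs. E-mail: zhaoyuxin@hrbeu.edu.cn
Corresponding author: ZHAO Enjiao. E-mail: zhaoenjiao935@hrbeu.edu.cn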
