YONG Yuchen, LI Ziyu, DONG Qi. Multi-UAV within-visual-range air combat based on hierarchical multiagent reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2025, 20(3): 548-556. [doi: 10.11992/tis.202408008]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 20
Issue: 2025, No. 3
Pages: 548-556
Section: Academic Papers - Machine Learning
Publication date: 2025-05-05
- Title: Multi-UAV within-visual-range air combat based on hierarchical multiagent reinforcement learning
- Author(s): YONG Yuchen1,2, LI Ziyu3, DONG Qi2
  1. College of Software Engineering, Southeast University, Nanjing 211189, China;
  2. Electronic Science Research Institute of China Electronics Technology Group Corporation, Beijing 100041, China;
  3. School of Information Science and Engineering, Southeast University, Nanjing 210096, China
- Keywords: within-visual-range air combat; dogfight; autonomous maneuvering decision-making; self-play; hierarchical reinforcement learning; multiagent game; hierarchical decision network; reward function design
- CLC number: TP18
- DOI: 10.11992/tis.202408008
- Abstract: To improve the autonomous maneuvering decision-making capability of unmanned aerial vehicles (UAVs) in within-visual-range air combat, this paper proposes a hierarchical decision network framework based on self-play (SP) and multiagent hierarchical reinforcement learning (MAHRL). The framework studies a multi-UAV dogfight scenario by combining SP with an MAHRL algorithm. The complex air combat task is decomposed into an upper-level missile strike task and a lower-level flight tracking task, which effectively reduces the ambiguity of tactical actions and improves autonomous maneuvering decision-making in multi-UAV air combat scenarios. In addition, a novel reward function design and the SP method reduce the meaningless exploration caused by the large battlefield environment. Simulation results show that the algorithm not only helps agents learn basic flight tactics and advanced combat tactics, but also outperforms other multiagent air combat algorithms in both defensive and offensive capability.
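To make the hierarchical decomposition described in the abstract concrete, the following Python/PyTorch code is a minimal illustrative sketch: an upper-level network chooses a discrete missile-strike decision, a lower-level network produces continuous flight-tracking controls conditioned on that decision, and a self-play opponent pool samples frozen past snapshots of the learner. All dimensions, network shapes, option sets, and names (UpperPolicy, LowerPolicy, shaped_reward, snapshot) are hypothetical assumptions for illustration, not the authors' implementation.

import copy
import random

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical, Normal

STATE_DIM = 12   # assumed observation size (relative position, velocity, attitude)
N_OPTIONS = 3    # assumed upper-level strike decisions: hold fire / lock target / launch
CTRL_DIM = 3     # assumed lower-level flight controls: pitch, roll, throttle

class UpperPolicy(nn.Module):
    # Upper level: maps the combat state to a discrete missile-strike decision.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                                 nn.Linear(64, N_OPTIONS))

    def forward(self, state):
        return Categorical(logits=self.net(state))

class LowerPolicy(nn.Module):
    # Lower level: produces continuous flight-tracking controls conditioned on
    # the upper-level decision, so each strike option induces its own maneuver.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + N_OPTIONS, 64), nn.Tanh(),
                                 nn.Linear(64, CTRL_DIM))
        self.log_std = nn.Parameter(torch.zeros(CTRL_DIM))

    def forward(self, state, option_onehot):
        mean = self.net(torch.cat([state, option_onehot], dim=-1))
        return Normal(mean, self.log_std.exp())

def act(upper, lower, state):
    # One hierarchical step: pick the strike option first, then the controls.
    option = upper(state).sample()
    onehot = F.one_hot(option, N_OPTIONS).float()
    control = lower(state, onehot).sample()
    return option, control

def shaped_reward(dist_to_target, angle_off, hit):
    # Illustrative shaped reward (assumed form): dense tracking terms that
    # discourage aimless flight, plus a sparse bonus for a successful strike.
    return -0.01 * dist_to_target - 0.1 * angle_off + (10.0 if hit else 0.0)

# Self-play: periodically freeze a snapshot of the learner and sample past
# snapshots as opponents, keeping exploration focused on competitive play
# instead of wandering through the large battlefield state space.
opponent_pool = []

def snapshot(upper, lower, max_size=10):
    opponent_pool.append((copy.deepcopy(upper).eval(), copy.deepcopy(lower).eval()))
    if len(opponent_pool) > max_size:
        opponent_pool.pop(0)

def sample_opponent():
    return random.choice(opponent_pool) if opponent_pool else None

# Example: one decision step for a single UAV, then store a self-play snapshot.
upper, lower = UpperPolicy(), LowerPolicy()
option, control = act(upper, lower, torch.randn(STATE_DIM))
snapshot(upper, lower)

A design note on the sketch: conditioning the lower-level policy on a one-hot encoding of the upper-level option is one common way to realize a two-level decision network; the paper's exact interface between the missile strike and flight tracking levels may differ.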