[1]YONG Yuchen,LI Ziyu,DONG Qi.Multi-UAV within-visual-range air combat based on hierarchical multiagent reinforcement learning[J].CAAI Transactions on Intelligent Systems,2025,20(3):548-556.[doi:10.11992/tis.202408008]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 20
Issue: 2025(3)
Pages: 548-556
Column: Academic Papers - Machine Learning
Publication date: 2025-05-05
- Title:
Multi-UAV within-visual-range air combat based on hierarchical multiagent reinforcement learning
- Author(s):
YONG Yuchen1,2; LI Ziyu3; DONG Qi2
1. College of Software Engineering, Southeast University, Nanjing 211189, China;
2. Electronic Science Research Institute of China Electronics Technology Group Corporation, Beijing 100041, China;
3. School of Information Science and Engineering, Southeast University, Nanjing 210096, China
- Keywords:
air combat within visual range; dogfight; autonomous decision-making; self-play; hierarchical reinforcement learning; multiagent game; hierarchical decision networks; reward function design
- CLC:
TP18
- DOI:
10.11992/tis.202408008
- Abstract:
To improve the autonomous maneuvering decision-making capability of unmanned aerial vehicles (UAVs) in within-visual-range air combat, this paper proposes a hierarchical decision network framework based on self-play (SP) and multiagent reinforcement learning (MARL). A multi-UAV dogfight scenario is studied by combining SP with an MARL algorithm. The complex air combat task is divided into an upper-level missile-strike task and a lower-level flight-tracking task, which reduces the ambiguity of tactical actions and improves autonomous maneuvering decision-making in the multi-UAV dogfight scenario. In addition, through an innovative reward function design and the SP method, the algorithm reduces the meaningless exploration an agent performs in a large battlefield environment. Simulation results show that the algorithm helps agents learn both basic flight tactics and advanced combat tactics, achieving stronger defensive and offensive capabilities than other multiagent air combat algorithms.
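The two-level decomposition described in the abstract can be sketched in minimal form: an upper-level policy selects a tactical subtask (missile strike or flight tracking), and a lower-level policy, conditioned on that subtask, emits a concrete maneuver. All names, the subtask/maneuver sets, and the random placeholder "policies" here are illustrative assumptions, not the paper's actual decision networks or action spaces.

```python
import random

# Hypothetical subtask and maneuver sets; the paper's real action
# spaces are not specified in this record.
SUBTASKS = ["missile_strike", "flight_tracking"]
MANEUVERS = {
    "missile_strike": ["lock_target", "launch_missile", "evade"],
    "flight_tracking": ["climb", "dive", "turn_left", "turn_right"],
}

def upper_policy(observation):
    # Placeholder for the upper-level decision network: in the paper
    # this would be a learned policy over tactical subtasks.
    return random.choice(SUBTASKS)

def lower_policy(observation, subtask):
    # Placeholder for the subtask-conditioned lower-level network,
    # which outputs a flight maneuver for the chosen subtask.
    return random.choice(MANEUVERS[subtask])

def decide(observation):
    # Hierarchical decision: first pick the subtask, then the maneuver.
    subtask = upper_policy(observation)
    maneuver = lower_policy(observation, subtask)
    return subtask, maneuver

if __name__ == "__main__":
    obs = {"own_heading": 90.0, "enemy_bearing": 45.0}  # toy observation
    print(decide(obs))
```

The point of the hierarchy, as the abstract argues, is that each level searches a much smaller action space than a flat policy would, which is what reduces the ambiguity of tactical actions during training.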