YONG Yuchen, LI Ziyu, DONG Qi. Multi-UAV within-visual-range air combat based on hierarchical multiagent reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2025, 20(3): 548-556. [doi: 10.11992/tis.202408008]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 20
Issue: 2025, No. 3
Pages: 548-556
Section: Academic Papers - Machine Learning
Publication date: 2025-05-05
- Title: Multi-UAV within-visual-range air combat based on hierarchical multiagent reinforcement learning
- Author(s): YONG Yuchen1,2, LI Ziyu3, DONG Qi2
  1. College of Software Engineering, Southeast University, Nanjing 211189, China;
  2. Electronic Science Research Institute of China Electronics Technology Group Corporation, Beijing 100041, China;
  3. School of Information Science and Engineering, Southeast University, Nanjing 210096, China
- Keywords: within-visual-range air combat; dogfight; autonomous maneuvering decision-making; self-play; hierarchical reinforcement learning; multiagent game; hierarchical decision network; reward function design
- CLC number: TP18
- DOI: 10.11992/tis.202408008
- Abstract: To improve the autonomous maneuvering decision-making capability of unmanned aerial vehicles (UAVs) in within-visual-range air combat, this paper proposes a hierarchical decision network framework based on self-play (SP) and multiagent hierarchical reinforcement learning (MAHRL). The framework studies a multi-UAV dogfight scenario by combining SP with an MAHRL algorithm. The complex air combat task is decomposed into an upper-level missile strike task and a lower-level flight tracking task, which effectively reduces the ambiguity of tactical actions and improves autonomous maneuvering decision-making in multi-UAV air combat scenarios. In addition, a novel reward function design and the SP method reduce the meaningless exploration caused by the large battlefield environment. Simulation results show that the algorithm not only helps agents learn basic flight tactics and advanced combat tactics, but also outperforms other multiagent air combat algorithms in both defensive and offensive capability.
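To make the hierarchical decomposition described in the abstract concrete, the following Python/PyTorch code is a minimal illustrative sketch: an upper-level network chooses a discrete missile-strike decision, a lower-level network produces continuous flight-tracking controls conditioned on that decision, and a self-play opponent pool samples frozen past snapshots of the learner. All dimensions, network shapes, option sets, and names (UpperPolicy, LowerPolicy, shaped_reward, snapshot) are hypothetical assumptions for illustration, not the authors' implementation.

import copy
import random

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical, Normal

STATE_DIM = 12   # assumed observation size (relative position, velocity, attitude)
N_OPTIONS = 3    # assumed upper-level strike decisions: hold fire / lock target / launch
CTRL_DIM = 3     # assumed lower-level flight controls: pitch, roll, throttle

class UpperPolicy(nn.Module):
    # Upper level: maps the combat state to a discrete missile-strike decision.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                                 nn.Linear(64, N_OPTIONS))

    def forward(self, state):
        return Categorical(logits=self.net(state))

class LowerPolicy(nn.Module):
    # Lower level: produces continuous flight-tracking controls conditioned on
    # the upper-level decision, so each strike option induces its own maneuver.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + N_OPTIONS, 64), nn.Tanh(),
                                 nn.Linear(64, CTRL_DIM))
        self.log_std = nn.Parameter(torch.zeros(CTRL_DIM))

    def forward(self, state, option_onehot):
        mean = self.net(torch.cat([state, option_onehot], dim=-1))
        return Normal(mean, self.log_std.exp())

def act(upper, lower, state):
    # One hierarchical step: pick the strike option first, then the controls.
    option = upper(state).sample()
    onehot = F.one_hot(option, N_OPTIONS).float()
    control = lower(state, onehot).sample()
    return option, control

def shaped_reward(dist_to_target, angle_off, hit):
    # Illustrative shaped reward (assumed form): dense tracking terms that
    # discourage aimless flight, plus a sparse bonus for a successful strike.
    return -0.01 * dist_to_target - 0.1 * angle_off + (10.0 if hit else 0.0)

# Self-play: periodically freeze a snapshot of the learner and sample past
# snapshots as opponents, keeping exploration focused on competitive play
# instead of wandering through the large battlefield state space.
opponent_pool = []

def snapshot(upper, lower, max_size=10):
    opponent_pool.append((copy.deepcopy(upper).eval(), copy.deepcopy(lower).eval()))
    if len(opponent_pool) > max_size:
        opponent_pool.pop(0)

def sample_opponent():
    return random.choice(opponent_pool) if opponent_pool else None

# Example: one decision step for a single UAV, then store a self-play snapshot.
upper, lower = UpperPolicy(), LowerPolicy()
option, control = act(upper, lower, torch.randn(STATE_DIM))
snapshot(upper, lower)

A design note on the sketch: conditioning the lower-level policy on a one-hot encoding of the upper-level option is one common way to realize a two-level decision network; the paper's exact interface between the missile strike and flight tracking levels may differ.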