[1]雍宇晨,李子豫,董琦.基于分层多智能体强化学习的多无人机视距内空战[J].智能系统学报,2025,20(3):548-556.[doi:10.11992/tis.202408008]
 YONG Yuchen,LI Ziyu,DONG Qi.Multi-UAV within-visual-range air combat based on hierarchical multiagent reinforcement learning[J].CAAI Transactions on Intelligent Systems,2025,20(3):548-556.[doi:10.11992/tis.202408008]
点击复制

基于分层多智能体强化学习的多无人机视距内空战

参考文献/References:
[1] ZHAO Yujiao, QI Xin, MA Yong, et al. Path following optimization for an underactuated USV using smoothly-convergent deep reinforcement learning[J]. IEEE transactions on intelligent transportation systems, 2021, 22(10): 6208-6220.
[2] WANG Yuan, ZHANG Xiwen, ZHOU Rong, et al. Research on UCAV maneuvering decision method based on heuristic reinforcement learning[J]. Computational intelligence and neuroscience, 2022, 2022(1): 1477078.
[3] ERNEST N, COHEN K, KIVELEVITCH E, et al. Genetic fuzzy trees and their application towards autonomous training and control of a squadron of unmanned combat aerial vehicles[J]. Unmanned systems, 2015, 3(3): 185-204.
[4] CHAI Jiajun, CHEN Wenzhang, ZHU Yuanheng, et al. A hierarchical deep reinforcement learning framework for 6-DOF UCAV air-to-air combat[J]. IEEE transactions on systems, man, and cybernetics: systems, 2023, 53(9): 5417-5429.
[5] POPE A P, IDE J S, MI?OVI? D, et al. Hierarchical reinforcement learning for air combat at DARPA’s AlphaDogfight trials[J]. IEEE transactions on artificial intelligence, 2023, 4(6): 1371-1385.
[6] HU Dongyuan, YANG Rennong, ZUO Jialiang, et al. Application of deep reinforcement learning in maneuver planning of beyond-visual-range air combat[J]. IEEE access, 2021, 9: 32282-32297.
[7] CRUMPACKER J B, ROBBINS M J, JENKINS P R. An approximate dynamic programming approach for solving an air combat maneuvering problem[J]. Expert systems with applications, 2022, 203: 117448.
[8] RUAN Wanying, DUAN Haibin, DENG Yimin. Autonomous maneuver decisions via transfer learning pigeon-inspired optimization for UCAVs in dogfight engagements[J]. IEEE/CAA journal of automatica sinica, 2022, 9(9): 1639-1657.
[9] WANG Maolin, WANG Lixin, YUE Ting, et al. Influence of unmanned combat aerial vehicle agility on short-range aerial combat effectiveness[J]. Aerospace science and technology, 2020, 96: 105534.
[10] DOURADO A O, MARTIN C A. New concept of dynamic flight simulator, Part I[J]. Aerospace science and technology, 2013, 30(1): 79-82.
[11] PATIL V, POTPHODE V, POTDUKHE U, et al. Smart UAV framework for multi-assistance[C]//ICT with Intelligent Applications. Singapore: Springer Singapore, 2022: 241-249.
[12] DONG Yiqun, AI Jianliang, LIU Jiquan. Guidance and control for own aircraft in the autonomous air combat: a historical review and future prospects[J]. Proceedings of the institution of mechanical engineers, Part G: journal of aerospace engineering, 2019, 233(16): 5943-5991.
[13] NG A Y, HARADA D, RUSSEL S. Policy invariance under reward transformations: theory and application to reward shaping[C]//Proceedings of the International Conference on Machine Learning. Bled: ICML, 1999: 278-287.
[14] HARTIKAINEN K, GENG Xinyang, HAARNOJA T, et al. Dynamical distance learning for semi-supervised and unsupervised skill discovery[EB/OL]. (2019-11-16)[2024-01-01]. https://arxiv.org/abs/1907.08225v4.
[15] KONG Weiren, ZHOU Deyun, ZHOU Ying, et al. Hierarchical reinforcement learning from competitive self-play for dual-aircraft formation air combat[J]. Journal of computational design and engineering, 2023, 10(2): 830-859.
[16] HUANG Changqiang, WEI Zhenglei, YANG Yuanzhi, et al. Knowledge acquisition for the air combat based on GWO[J]. Journal of physics: conference series, 2019, 1325(1): 012078.
[17] 陈虎. 多机协同多目标空战智能优化决策研究[D]. 南京: 南京航空航天大学, 2021.
CHEN Hu. Research on intelligent optimization decision of multi-aircraft cooperative multi-target air combat[D]. Nanjing: Nanjing University of Aeronautics and Astronautics, 2021.
[18] 张鹏程. 基于博弈的空中目标航迹预测及攻防对抗研究[D]. 杭州: 浙江大学, 2023.
ZHANG Pengcheng. Research on air target track prediction and attack-defense confrontation based on game theory[D]. Hangzhou: Zhejiang University, 2023.
[19] 张立鹏, 魏瑞轩, 李霞. 无人作战飞机空战自主战术决策方法研究[J]. 电光与控制, 2012, 19(2): 92.
ZHANG Lipeng, WEI Ruixuan, LI Xia. Autonomous tactical decision-making of UCAVs in air combat[J]. Electronics optics & control, 2012, 19(2): 92.
[20] LI Weihua, SHI Jingping, WU Yunyan, et al. A Multi-UCAV cooperative occupation method based on weapon engagement zones for beyond-visual-range air combat[J]. Defence technology, 2022, 18(6): 1006-1022.
[21] LI Shouyi, CHEN Mou, WANG Yuhui, et al. Air combat decision-making of multiple UCAVs based on constraint strategy games[J]. Defence technology, 2022, 18(3): 368-383.
[22] HA J S, CHAE H J, CHOI H L. A stochastic game-theoretic approach for analysis of multiple cooperative air combat[C]//2015 American Control Conference. Chicago: IEEE, 2015: 3728-3733.
[23] LIU Lu, ZHANG Lichuan, ZHANG Shuo, et al. Multi-UUV cooperative dynamic maneuver decision-making algorithm using intuitionistic fuzzy game theory[J]. Complexity, 2020, 2020(1): 2815258.
[24] YANG Qiming, ZHANG Jiandong, SHI Guoqing, et al. Maneuver decision of UAV in short-range air combat based on deep reinforcement learning[J]. IEEE access, 2019, 8: 363-378.
[25] ZHANG Liang , XU Jia , GOLD D, et al. Air dominance through machine learning: a preliminary exploration of artificial intelligence-assisted mission planning[M]. Santa Monica: RAND Corporation, 2020.
[26] YOO J, KIM D, SHIM D H. Deep reinforcement learning based autonomous air-to-air combat using target trajectory prediction[C]//2021 21st International Conference on Control, Automation and Systems. Jeju: IEEE, 2021: 2172-2176.
[27] SUN Zhixiao, PIAO Haiyin, YANG Zhen, et al. Multi-agent hierarchical policy gradient for air combat tactics emergence via self-play[J]. Engineering applications of artificial intelligence, 2021, 98: 104112.
[28] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. 2nd ed. Cambridge: Bradford Book, 2018.
[29] MITCHELL T M. Machine learning[M]. New York: McGraw-Hill, 1997.
[30] ZHANG Kaiqing, YANG Zhuoran, BA?AR T. Multi-agent reinforcement learning: a selective overview of theories and algorithms[M]//Handbook of Reinforcement Learning and Control. Cham: Springer International Publishing, 2021: 321-384.
[31] YU Chao, VELU A, VINITSKY E, et al. The surprising effectiveness of PPO in cooperative multi-agent games[C]//Proceedings of the 36th International Conference on Neural Information Processing Systems. New Orleans: ACM, 2022: 24611-24624.
[32] BERNDT J. JSBSim: an open source flight dynamics model[J]. AIAA modeling and simulation technologies conference proceedings, 2004, 2004(4923): 1-12.
[33] LOWE R, WU Y, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[J]. Advances in neural information processing systems, 2017, 30: 6380-6391.

备注/Memo

收稿日期:2024-8-7。
作者简介:雍宇晨,硕士研究生,主要研究方向为多智能体强化学习、无人机空战。E-mail:939938865@qq.com。;李子豫,博士研究生,主要研究方向为多智能体强化学习。E-mail:1494290510@qq.com。;董琦,高级工程师,主要研究方向为智能博弈与无人系统。获得第九届吴文俊人工智能优秀青年奖,发表学术论文40余篇,获得发明专利授权20余项。E-mail:dongqiouc@126.com。
通讯作者:董琦. E-mail:dongqiouc@126.com

更新日期/Last Update: 1900-01-01
Copyright © 《 智能系统学报》 编辑部
地址:(150001)黑龙江省哈尔滨市南岗区南通大街145-1号楼 电话:0451- 82534001、82518134 邮箱:tis@vip.sina.com