[1] ZHANG Yuxin, ZHAO Enjiao, ZHAO Yuxin. MADDPG game confrontation algorithm of polyisomer network based on rule coupling[J]. CAAI Transactions on Intelligent Systems, 2024, 19(1): 190-208. [doi:10.11992/tis.202303037]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 19
Issue: 2024, No. 1
Pages: 190-208
Column: Artificial Intelligence Deans Forum
Publication date: 2024-01-05
Title: MADDPG game confrontation algorithm of polyisomer network based on rule coupling
Author(s): ZHANG Yuxin; ZHAO Enjiao; ZHAO Yuxin
Affiliation: College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
Keywords: deep reinforcement learning; multi-UAVs; game confrontation; MADDPG; Actor-Critic; rule coupling; experience replay; sparse rewards
CLC: V279
DOI: 10.11992/tis.202303037
Abstract: To address the dynamic attrition of UAV numbers during multi-UAV game confrontation, together with the sparse-reward problem and the high frequency of invalid experience sampling in traditional deep reinforcement learning algorithms, this paper builds a game model of red and blue UAV clusters under a multi-unmanned aerial vehicle (multi-UAV) confrontation setting with limited attack and defense capabilities and limited communication range. Within the Actor-Critic framework of the multi-agent deep deterministic policy gradient (MADDPG) algorithm, the original MADDPG algorithm is improved according to the characteristics of the game scenario so as to handle the attrition of UAV numbers, sparse rewards, and the frequent sampling of invalid experience. On this basis, a rule coupling module is built to assist UAV decision-making and to improve the algorithm's exploration and exploitation of effective experience. Simulation experiments show that the proposed algorithm improves convergence speed, learning efficiency, and stability. The polyisomer network makes the algorithm better suited to game scenarios in which the number of UAVs declines dynamically; the reward potential function and the prioritized experience replay method based on importance-weight coupling refine the differentiation of experiences and increase the utilization of high-quality experience; and the rule coupling module enables the UAV decision network to make effective use of prior knowledge.
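As background for the prioritized replay idea mentioned in the abstract, the sketch below shows a generic proportional prioritized experience replay buffer with importance-sampling weights. It is a minimal illustration of the standard scheme, not the paper's importance-weight-coupled variant; the class name, hyperparameters (alpha, beta), and update rule are illustrative assumptions.

import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay with importance-sampling weights (illustrative only)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priorities shape the sampling distribution
        self.beta = beta          # strength of the importance-sampling bias correction
        self.eps = eps            # keeps priorities strictly positive
        self.storage = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are sampled at least once.
        max_prio = self.priorities.max() if self.storage else 1.0
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        prios = self.priorities[: len(self.storage)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.storage), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by non-uniform sampling.
        weights = (len(self.storage) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        batch = [self.storage[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Larger TD error -> higher priority -> the transition is replayed more often.
        self.priorities[idx] = np.abs(td_errors) + self.eps

In a MADDPG-style training loop, the returned weights would typically scale each sampled transition's critic loss, and update_priorities would be called with the new TD errors after each gradient step; how the paper couples these weights with its reward potential function is specific to the original work and is not reproduced here.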