Design and Verification of UAV Cooperative Defense Strategy Based on Reinforcement Learning
LI Yijia, LI Jianuo, KE Liangjun
School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China

Abstract A drone swarm confrontation model is built on the OODA (observe, orient, decide, act) decision loop, and multi-agent deep reinforcement learning is used for algorithm design to find optimal cooperative defence strategies for drone swarms. Specifically, a QMIX-based single-layer decision algorithm is developed to address contribution allocation (credit assignment) among cooperating drones and the challenge of high-dimensional spaces. In addition, this paper proposes a hierarchical decision-making model that integrates rule-based methods with reinforcement learning. Its decision layer uses either rules or hidden Markov model (HMM) intention recognition to analyze the combat situation and schedule drones, and its action layer then uses the QMIX algorithm to output actions. To verify the performance of the proposed algorithms, a controllable and observable simulation platform was built with Python and Unity, and a challenging defensive game scenario was designed. Experiments quantitatively evaluated the defence strategies in terms of cooperation effectiveness, resource efficiency, and generalisation. The results show that hierarchical decision-making significantly outperforms single-layer decision-making on every metric and markedly improves the winning rate. The HMM-based hierarchical strategy performs best, offering a promising new approach to drone swarm defence.
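The abstract does not detail the HMM used in the decision layer. As an illustration only, the following minimal Python sketch shows the standard HMM forward (filtering) recursion on which such intention recognition could be built: the belief over enemy intentions is propagated through a transition matrix and then reweighted by the likelihood of each new observation. The intention labels, observation coding, and all probabilities (`INTENTIONS`, `A`, `B`, `PI`) are hypothetical placeholders, not parameters from the paper.

```python
# Minimal sketch of HMM forward filtering for intention recognition.
# All states, observations and probabilities are HYPOTHETICAL placeholders,
# not the model or parameters used in the paper.

INTENTIONS = ["attack", "feint", "retreat"]  # hypothetical hidden states

# Hypothetical transition matrix: A[i][j] = P(intent_t = j | intent_{t-1} = i)
A = [
    [0.8, 0.15, 0.05],
    [0.3, 0.6, 0.1],
    [0.1, 0.1, 0.8],
]

# Hypothetical emission matrix: B[i][k] = P(obs = k | intent = i)
# obs 0 = "approaching", 1 = "circling", 2 = "moving away"
B = [
    [0.7, 0.2, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.2, 0.7],
]

PI = [0.4, 0.4, 0.2]  # hypothetical initial intention distribution


def _normalize(values):
    total = sum(values)
    return [v / total for v in values]


def forward_filter(observations, pi=PI, a=A, b=B):
    """Return the normalized belief P(intent_t | obs_1..t) after each step."""
    n = len(pi)
    # Initial step: prior weighted by the likelihood of the first observation.
    belief = _normalize([pi[i] * b[i][observations[0]] for i in range(n)])
    history = [belief]
    for obs in observations[1:]:
        # Predict: propagate the belief through the transition model.
        predicted = [sum(belief[i] * a[i][j] for i in range(n)) for j in range(n)]
        # Update: weight by the emission likelihood and renormalize.
        belief = _normalize([predicted[j] * b[j][obs] for j in range(n)])
        history.append(belief)
    return history


def most_likely_intention(observations):
    """Return (label, belief) for the most probable current intention."""
    belief = forward_filter(observations)[-1]
    best = max(range(len(belief)), key=lambda i: belief[i])
    return INTENTIONS[best], belief
```

With these placeholder numbers, `most_likely_intention([0, 0, 0])` yields `"attack"`. A rule-based scheduler in the decision layer could then assign defending drones according to the inferred intention; that mapping is likewise a hypothetical design, not the paper's.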
Received: 05 March 2025
Published: 15 July 2025
|
|
|
|
|
|
|
|
|