Multi-agent Formation Method Based on Dynamic Optimization of Proximal Policies
QUAN Jiale1, MA Xianlong1, SHEN Yuheng2
1. School of Astronautics, Northwestern Polytechnical University, Xi'an 710129, Shaanxi, China;
2. Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109, China
Abstract  Unmanned aerial vehicle (UAV) cluster systems offer capability redundancy, high survivability, and adaptability to complex scenarios, enabling more efficient mission execution and information acquisition. In recent years, deep reinforcement learning has been incorporated into UAV cluster formation control methods to address the dimension explosion of clusters and the difficulty of modeling cluster systems. However, deep reinforcement learning suffers from problems such as low training efficiency. This paper proposes a cluster formation method based on an improved proximal policy optimization (PPO) algorithm. By adopting a dynamic estimation method as the evaluation mechanism, it mitigates the slow convergence of traditional PPO and its neglect of high-value actions, and effectively improves the data utilization rate. Simulation results verify the improvements in training efficiency and sample reuse, achieving the intended performance gains.
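The abstract does not detail the dynamic estimation mechanism itself; as background, the standard PPO clipped surrogate objective that the improved method builds on can be sketched as follows (a minimal illustration, not the authors' implementation; the function name and the default `epsilon` value are assumptions):

```python
def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective of standard PPO.

    ratio: probability ratio pi_theta(a|s) / pi_theta_old(a|s).
    advantage: estimated advantage A(s, a).
    epsilon: clipping range; 0.2 is a commonly used default.
    """
    unclipped = ratio * advantage
    # Clamp the ratio to [1 - eps, 1 + eps] before weighting the advantage.
    clipped = max(min(ratio, 1.0 + epsilon), 1.0 - epsilon) * advantage
    # Taking the minimum removes any incentive to push the ratio
    # outside the clipping range, which stabilizes policy updates.
    return min(unclipped, clipped)

# A positive advantage with a ratio already above the clip range
# gains nothing from moving the ratio further: the value is capped.
print(ppo_clip_objective(1.5, 1.0))  # prints 1.2
```

The slow convergence noted in the abstract stems in part from this pessimistic clipping, which discards gradient signal from high-value actions whose ratios fall outside the range; the paper's dynamic estimation mechanism is aimed at that limitation.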
Received: 16 October 2023
Published: 11 May 2024
|
|
|
|