Abstract: Unmanned aerial vehicle (UAV) cluster systems offer capability redundancy, high resistance to destruction, and adaptability to complex scenarios, which allows more efficient mission execution and information acquisition. In recent years, deep reinforcement learning has been incorporated into UAV cluster formation control to address the dimension explosion of clusters and the difficulty of modelling cluster systems. However, deep reinforcement learning suffers from problems such as low training efficiency. This paper proposes a cluster formation method based on an improved proximal policy optimization (PPO) algorithm. By using a dynamic estimation method as the evaluation mechanism, the method addresses the slow convergence and the neglect of high-value actions in traditional PPO, and effectively improves the data utilization rate. Simulation results verify that the method improves training efficiency and sample reuse, thereby achieving better performance.
全家乐, 马先龙, 沈昱恒. 基于近端策略动态优化的多智能体编队方法[J]. 空天防御, 2024, 7(2): 52-62.
QUAN Jiale, MA Xianlong, SHEN Yuheng. Multi-agent Formation Method Based on Dynamic Optimization of Proximal Policies. Air & Space Defense, 2024, 7(2): 52-62.