Multi-agent Formation Method Based on Dynamic Optimization of Proximal Policies


Abstract: Unmanned aerial vehicle (UAV) cluster systems offer capability redundancy, strong resistance to destruction, and adaptability to complex scenarios, enabling efficient mission execution and information acquisition. In recent years, deep reinforcement learning has been introduced into UAV cluster formation control to address the dimensional explosion of clusters and the difficulty of modelling cluster systems, but it suffers from problems such as low training efficiency. This paper proposes a cluster formation method based on an improved proximal policy optimization method. By introducing a dynamic estimation method as the evaluation mechanism, it addresses the slow convergence of traditional proximal policy optimization and its neglect of high-value actions, and effectively improves data utilization. Simulation experiments show that the method improves training efficiency, addresses the sample reuse problem, and achieves good decision-making performance.

Key words: unmanned aerial vehicle cluster, deep reinforcement learning, proximal policy optimization, inverse reinforcement learning, cluster decision making
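For readers unfamiliar with the baseline, the following is a minimal sketch of the clipped surrogate objective used by standard proximal policy optimization, which the abstract names as the method being improved. It is not the paper's improved method: the dynamic estimation mechanism is not specified in the abstract and is not reproduced here. The function name `ppo_clip_objective`, its parameters, and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    new_logp / old_logp: log-probabilities of the sampled actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic choice between the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage the objective stops growing once the ratio exceeds `1 + clip_eps`, so a sample cannot justify an arbitrarily large policy update. With a negative advantage the same large ratio is not clipped, so moving probability toward a bad action is penalized in full.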

Received: 2023-10-16 Published: 2024-05-11

CLC number:	V 249
	TP 273

Funding: National Natural Science Foundation of China (61473226)

Corresponding author: MA Xianlong (born 1982), male, doctoral candidate, associate researcher.

About the author: QUAN Jiale (born 1996), male, doctoral candidate.

Cite this article:

QUAN Jiale, MA Xianlong, SHEN Yuheng. Multi-agent Formation Method Based on Dynamic Optimization of Proximal Policies[J]. Air & Space Defense, 2024, 7(2): 52-62.

Link to this article:

https://www.qk.sjtu.edu.cn/ktfy/CN/ or https://www.qk.sjtu.edu.cn/ktfy/CN/Y2024/V7/I2/52
