Air & Space Defense, 2024, Vol. 7, Issue (2): 52-62
Multi-agent Formation Method Based on Dynamic Optimization of Proximal Policies
QUAN Jiale1, MA Xianlong1, SHEN Yuheng2
1. School of Astronautics, Northwestern Polytechnical University, Xi’an 710129, Shaanxi, China; 2. Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109, China
Full text: PDF (1672 KB)
Abstract: Unmanned aerial vehicle (UAV) cluster systems offer capability redundancy, strong resistance to destruction, and adaptability to complex scenarios, enabling efficient mission execution and information acquisition. In recent years, deep reinforcement learning has been introduced into UAV cluster formation control to address the dimension explosion of clusters and the difficulty of modeling cluster systems, but it suffers from problems such as low training efficiency. This paper proposes a cluster formation method based on an improved proximal policy optimization algorithm: a dynamic estimation method is introduced as the evaluation mechanism, which resolves the slow convergence and the neglect of high-value actions in conventional proximal policy optimization and effectively improves data utilization. Simulation results show that the method improves training efficiency, alleviates the sample-reuse problem, and achieves good decision-making performance.
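For context, the proximal policy optimization baseline that the paper improves on is built around a clipped surrogate objective. The following is a minimal illustrative sketch of that standard objective in NumPy (variable names are the editor's, and this is not the authors' improved dynamic-estimation variant):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate loss.

    ratio:     pi_new(a|s) / pi_old(a|s) per sampled action
    advantage: estimated advantage per sampled action
    eps:       clipping range (0.2 is the commonly used default)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Pessimistic bound: elementwise minimum, negated to give a loss to minimize
    return -np.minimum(unclipped, clipped).mean()
```

The clipping removes the incentive to move the policy ratio outside [1 - eps, 1 + eps], which stabilizes updates but, as the abstract notes, can cause slow convergence and under-weighting of high-value actions; the paper's dynamic estimation mechanism targets exactly those weaknesses.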
Key words: unmanned aerial vehicle cluster; deep reinforcement learning; proximal policy optimization; inverse reinforcement learning; cluster decision making
Received: 2023-10-16      Published: 2024-05-11
Chinese Library Classification (ZTFLH): V249; TP273
Funding: National Natural Science Foundation of China (61473226)
Corresponding author: MA Xianlong (b. 1982), male, Ph.D. candidate, associate researcher.
About the first author: QUAN Jiale (b. 1996), male, Ph.D. candidate.
Cite this article:
QUAN Jiale, MA Xianlong, SHEN Yuheng. Multi-agent Formation Method Based on Dynamic Optimization of Proximal Policies. Air & Space Defense, 2024, 7(2): 52-62.
Link to this article:
https://www.qk.sjtu.edu.cn/ktfy/CN/      or      https://www.qk.sjtu.edu.cn/ktfy/CN/Y2024/V7/I2/52

Copyright © 2017 Editorial Office of Air & Space Defense
Supervised by: China Aerospace Science and Technology Corporation. Sponsored by: Shanghai Electro-Mechanical Engineering Institute and Shanghai Jiao Tong University Press Co., Ltd.