Air & Space Defense  2024, Vol. 7 Issue (1): 40-47
Professional Technology
Intercept Guidance Law with a Low Acceleration Ratio Based on Hierarchical Reinforcement Learning
WANG Xu (王旭)1, CAI Yuanli (蔡远利)1, ZHANG Xuecheng (张学成)2, ZHANG Rongliang (张荣良)3, HAN Chenglong (韩成龙)3
1. Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, Shaanxi, China; 2. Third Military Representative Office of Army Equipment Department in Shanghai, Shanghai 200031, China; 3. Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109, China
Abstract: This paper proposes an intercept guidance law based on hierarchical reinforcement learning for intercepting a three-dimensional maneuvering target under the constraints of a low acceleration ratio and bearings-only measurement. First, the problem is modeled as a Markov decision process, and a heuristic reward function is designed that accounts for both the intercept energy consumption and the missile-to-target line-of-sight (LOS) angular rate. Second, a two-level policy network is constructed in which the upper-level policy plans stage-wise subgoals that guide the lower-level policy in generating the required guidance commands. This drives the LOS angular rate to converge during the engagement, so that the maneuvering target can be intercepted successfully. Simulation results show that, compared with augmented proportional navigation, the proposed method achieves higher intercept accuracy and hit probability while requiring lower acceleration during the intercept.
Key words: guidance law    maneuvering target intercept    low acceleration ratio    hierarchical reinforcement learning
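The abstract names three ingredients: a heuristic reward built from energy consumption and LOS angular rate, a two-level policy in which the upper level sets subgoals for the lower level, and an augmented proportional navigation baseline. The following is a minimal planar sketch of how these pieces might fit together, not the authors' implementation. The reward weights, the subgoal rule, the lower-level control law, the subgoal refresh period, and all function names are hypothetical placeholders (in the paper both levels are learned networks). `apn_command` uses the common textbook planar form of augmented proportional navigation, which may differ from the baseline used in the paper.

```python
def heuristic_reward(los_rate, accel_cmd, w_los=1.0, w_energy=0.01):
    """Hypothetical shaping reward: penalize LOS rate magnitude and control effort."""
    return -(w_los * abs(los_rate) + w_energy * accel_cmd ** 2)


def upper_policy(los_rate, shrink=0.5):
    """Placeholder upper level: ask for a LOS rate that is a fraction of the current one."""
    return shrink * los_rate


def lower_policy(los_rate, subgoal, closing_speed, gain=3.0, a_max=50.0):
    """Placeholder lower level: act on the gap to the subgoal, saturated at a_max."""
    cmd = gain * closing_speed * (los_rate - subgoal)
    return max(-a_max, min(a_max, cmd))


def hierarchical_step(step, los_rate, subgoal, closing_speed, subgoal_period=5):
    """One decision step: refresh the subgoal every subgoal_period steps (assumed)."""
    if step % subgoal_period == 0:
        subgoal = upper_policy(los_rate)
    accel = lower_policy(los_rate, subgoal, closing_speed)
    reward = heuristic_reward(los_rate, accel)
    return subgoal, accel, reward


def apn_command(los_rate, closing_speed, target_accel, nav_gain=3.0):
    """Textbook planar APN: N * Vc * LOS rate + (N / 2) * target acceleration."""
    return nav_gain * closing_speed * los_rate + 0.5 * nav_gain * target_accel


if __name__ == "__main__":
    # Illustrative numbers only: LOS rate in rad/s, closing speed in m/s.
    subgoal, accel, reward = hierarchical_step(0, 0.02, 0.0, 300.0)
    print(subgoal, accel, reward)
    print(apn_command(0.02, 300.0, 10.0))
```

Unlike the APN baseline, which needs an estimate of target acceleration, the hierarchical step here uses only the LOS rate and a closing-speed scale. This split is a property of the sketch only; it is meant to echo the bearings-only setting described in the abstract, not to reproduce the paper's state definition.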
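Received: 2023-10-12      Published: 2024-03-04
CLC number (ZTFLH):  TJ 765
Funding: National Natural Science Foundation of China (62203349, 12302061)
Corresponding author: CAI Yuanli (蔡远利, born 1963), male, Ph.D., professor, doctoral supervisor.
About the author: WANG Xu (王旭, born 1996), male, Ph.D. candidate.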
Cite this article:
WANG Xu, CAI Yuanli, ZHANG Xuecheng, ZHANG Rongliang, HAN Chenglong. Intercept Guidance Law with a Low Acceleration Ratio Based on Hierarchical Reinforcement Learning[J]. Air & Space Defense, 2024, 7(1): 40-47.
Link to this article:
https://www.qk.sjtu.edu.cn/ktfy/CN/      or      https://www.qk.sjtu.edu.cn/ktfy/CN/Y2024/V7/I1/40
