Abstract: Landing guidance for a reusable launch vehicle must ensure accurate terminal position and velocity while minimizing fuel consumption. Landing guidance methods based on optimal control rely on an accurate rocket dynamic model, which limits their scalability. To address this problem, a neural-network landing guidance policy is developed using a model-free iterative reinforcement learning approach. First, a Markov decision process model of the rocket landing guidance problem is established, and a staged reward function is designed according to the terminal constraints and the fuel-consumption index. Next, a multilayer perceptron guidance policy network is constructed, and a model-free proximal policy optimization algorithm is adopted to iteratively optimize the policy network through interaction with the rocket landing Markov decision process. Finally, the guidance policy is validated in simulations of a reusable launch vehicle landing scenario. The results show that the proposed reinforcement learning landing guidance policy achieves high landing accuracy, near-optimal fuel consumption, and adaptability to parameter uncertainty in the rocket model.
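To illustrate the kind of policy network and optimization objective the abstract describes, the sketch below shows a minimal multilayer perceptron Gaussian policy mapping the rocket state to a thrust command, together with the clipped proximal policy optimization surrogate loss used for iterative improvement. The state/action dimensions, layer widths, and clipping parameter are assumptions for illustration, not the settings used in the paper.

```python
# Illustrative sketch (not the authors' code): an MLP guidance policy and the
# clipped PPO surrogate loss. All dimensions and hyperparameters are assumed.
import torch
import torch.nn as nn


class GuidancePolicy(nn.Module):
    """Multilayer perceptron policy: rocket state -> Gaussian over thrust commands."""

    def __init__(self, state_dim=7, action_dim=3, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mean = nn.Linear(hidden, action_dim)              # mean thrust command
        self.log_std = nn.Parameter(torch.zeros(action_dim))   # state-independent std

    def distribution(self, state):
        mu = self.mean(self.body(state))
        return torch.distributions.Normal(mu, self.log_std.exp())


def ppo_clip_loss(policy, states, actions, old_log_probs, advantages, clip_eps=0.2):
    """Clipped PPO surrogate objective (returned as a loss to minimize)."""
    dist = policy.distribution(states)
    log_probs = dist.log_prob(actions).sum(-1)
    ratio = torch.exp(log_probs - old_log_probs)               # importance ratio
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```

In a full training loop, trajectories would be rolled out against the rocket-landing Markov decision process, advantages estimated from a learned value function, and this loss minimized repeatedly; those components are omitted here for brevity.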
HE Linkun, ZHANG Ran, GONG Qinghai. Landing Guidance of Reusable Launch Vehicle Based on Reinforcement Learning. Air & Space Defense, 2021, 4(3): 33-40.