Air & Space Defense, 2024, Vol. 7, Issue 1: 24-31
Reinforcement Learning-Based Target Assignment Method for Many-to-Many Interceptions
GUO Jianguo1, HU Guanjie1, XU Xinpeng1,2, LIU Yue2, CAO Jin2
1. Institute of Precision Guidance and Control, School of Astronautics, Northwestern Polytechnical University, Xi’an 710072, Shaanxi, China; 2. Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109, China
Abstract: To address the weapon-target assignment problem of many-to-many interception in an air confrontation environment, this study proposes a multi-target intelligent assignment method based on reinforcement learning. For the many-to-many interception engagement scenario, a mathematical model of target assignment was established based on engagement situation evaluation. By introducing the concepts of target threat degree and interception effectiveness degree, the model captures both the interception urgency of each target and the interception capability of each interceptor, allowing a comprehensive evaluation of the engagement situation between the attacking and defending sides. On the basis of this model, the target assignment problem was formulated as a Markov decision process and solved by training a reinforcement learning algorithm based on a deep Q-network. Relying on self-learning and the reward mechanism under environment interaction, the method dynamically generates optimal assignment schemes. A many-to-many interception scenario was constructed in mathematical simulation to verify the effectiveness of the method; the results show that the trained target assignment method satisfies the requirements of continuous and dynamic task assignment in many-to-many interception.
Key words: weapon-target assignment; multi-target interception; situational evaluation; reinforcement learning; deep Q-network
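The paper itself does not include code, but the approach summarized in the abstract (casting sequential weapon-target assignment as a Markov decision process and training a deep Q-network on it) can be illustrated with a minimal sketch. The Python/PyTorch snippet below is not the authors' implementation: the state layout, reward definition, environment dynamics, network size, and names such as N_TARGETS, make_state, select_action, and td_update are illustrative assumptions chosen only to show the MDP-plus-DQN structure.

# Minimal sketch (not the authors' code): a DQN assigns interceptors to targets
# one at a time. State = [threat degree of each target, effectiveness of the
# current interceptor against each target]; action = index of the target to
# engage; reward = threat removed, weighted by effectiveness. All dimensions,
# rewards, and hyperparameters are illustrative assumptions.
import random
import torch
import torch.nn as nn

N_TARGETS = 4                      # assumed number of incoming targets
STATE_DIM = 2 * N_TARGETS          # threat degrees + effectiveness values
GAMMA, EPS, LR = 0.95, 0.1, 1e-3   # discount, exploration rate, learning rate

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, N_TARGETS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=LR)
loss_fn = nn.MSELoss()

def make_state(threats, effectiveness):
    """Stack target threat degrees and the current interceptor's
    interception-effectiveness values into one observation vector."""
    return torch.tensor(threats + effectiveness, dtype=torch.float32)

def select_action(state):
    """Epsilon-greedy choice of which target the interceptor engages."""
    if random.random() < EPS:
        return random.randrange(N_TARGETS)
    with torch.no_grad():
        return int(q_net(state).argmax())

def td_update(state, action, reward, next_state, done):
    """One-step temporal-difference update of the Q-network."""
    q_sa = q_net(state)[action]                       # Q(s, a)
    with torch.no_grad():
        bootstrap = 0.0 if done else GAMMA * float(q_net(next_state).max())
    loss = loss_fn(q_sa, torch.tensor(reward + bootstrap))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Toy training loop: each episode assigns interceptors sequentially and
# rewards assignments that neutralize the most threat.
for episode in range(200):
    threats = [random.random() for _ in range(N_TARGETS)]
    for step in range(N_TARGETS):                     # one interceptor per step
        eff = [random.random() for _ in range(N_TARGETS)]
        state = make_state(threats, eff)
        a = select_action(state)
        reward = threats[a] * eff[a]                  # threat neutralized
        threats[a] *= 1.0 - eff[a]                    # residual threat
        next_state = make_state(threats, eff)
        td_update(state, a, reward, next_state, done=(step == N_TARGETS - 1))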
Received: 2023-09-20      Published: 2024-03-04
CLC number: E927
Funding: National Natural Science Foundation of China (61973254, 92271109, 52272404)
Biography: GUO Jianguo (b. 1975), male, Ph.D., professor.
Cite this article:
GUO Jianguo, HU Guanjie, XU Xinpeng, LIU Yue, CAO Jin. Reinforcement Learning-Based Target Assignment Method for Many-to-Many Interceptions. Air & Space Defense, 2024, 7(1): 24-31.
Link to this article:
https://www.qk.sjtu.edu.cn/ktfy/CN/      or      https://www.qk.sjtu.edu.cn/ktfy/CN/Y2024/V7/I1/24
