Air & Space Defense, 2024, Vol. 7, Issue 1: 24-31
Reinforcement Learning-Based Target Assignment Method for Many-to-Many Interceptions
GUO Jianguo1, HU Guanjie1, XU Xinpeng1,2, LIU Yue2, CAO Jin2
1. Institute of Precision Guidance and Control, School of Astronautics, Northwestern Polytechnical University, Xi’an 710072, Shaanxi, China; 2. Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109, China
Abstract: To address the weapon-target assignment problem of many-to-many interception in an air confrontation environment, this study proposes a multi-target intelligent assignment method based on reinforcement learning. For the many-to-many interception engagement scenario, a mathematical model of target assignment was established based on engagement situation evaluation. By introducing the concepts of target threat degree and interception effectiveness degree, the model captures both the interception urgency of each target and the interception capability of each interceptor, allowing a comprehensive evaluation of the engagement situation between the attacking and defending sides. On the basis of this model, the target assignment problem was formulated as a Markov decision process and solved by training a reinforcement learning algorithm based on a deep Q-network. Relying on self-learning and the reward mechanism under environment interaction, the method dynamically generates optimal assignment schemes. A many-to-many interception scenario was constructed in mathematical simulation to verify the effectiveness of the method; the results show that the trained target assignment method satisfies the requirements of continuous and dynamic task assignment in many-to-many interception.
Key words: weapon-target assignment; multi-target interception; situational evaluation; reinforcement learning; deep Q-network
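The paper itself does not include code, but the approach summarized in the abstract (casting sequential weapon-target assignment as a Markov decision process and training a deep Q-network on it) can be illustrated with a minimal sketch. The Python/PyTorch snippet below is not the authors' implementation: the state layout, reward definition, environment dynamics, network size, and names such as N_TARGETS, make_state, select_action, and td_update are illustrative assumptions chosen only to show the MDP-plus-DQN structure.

# Minimal sketch (not the authors' code): a DQN assigns interceptors to targets
# one at a time. State = [threat degree of each target, effectiveness of the
# current interceptor against each target]; action = index of the target to
# engage; reward = threat removed, weighted by effectiveness. All dimensions,
# rewards, and hyperparameters are illustrative assumptions.
import random
import torch
import torch.nn as nn

N_TARGETS = 4                      # assumed number of incoming targets
STATE_DIM = 2 * N_TARGETS          # threat degrees + effectiveness values
GAMMA, EPS, LR = 0.95, 0.1, 1e-3   # discount, exploration rate, learning rate

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, N_TARGETS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=LR)
loss_fn = nn.MSELoss()

def make_state(threats, effectiveness):
    """Stack target threat degrees and the current interceptor's
    interception-effectiveness values into one observation vector."""
    return torch.tensor(threats + effectiveness, dtype=torch.float32)

def select_action(state):
    """Epsilon-greedy choice of which target the interceptor engages."""
    if random.random() < EPS:
        return random.randrange(N_TARGETS)
    with torch.no_grad():
        return int(q_net(state).argmax())

def td_update(state, action, reward, next_state, done):
    """One-step temporal-difference update of the Q-network."""
    q_sa = q_net(state)[action]                       # Q(s, a)
    with torch.no_grad():
        bootstrap = 0.0 if done else GAMMA * float(q_net(next_state).max())
    loss = loss_fn(q_sa, torch.tensor(reward + bootstrap))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Toy training loop: each episode assigns interceptors sequentially and
# rewards assignments that neutralize the most threat.
for episode in range(200):
    threats = [random.random() for _ in range(N_TARGETS)]
    for step in range(N_TARGETS):                     # one interceptor per step
        eff = [random.random() for _ in range(N_TARGETS)]
        state = make_state(threats, eff)
        a = select_action(state)
        reward = threats[a] * eff[a]                  # threat neutralized
        threats[a] *= 1.0 - eff[a]                    # residual threat
        next_state = make_state(threats, eff)
        td_update(state, a, reward, next_state, done=(step == N_TARGETS - 1))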
Received: 2023-09-20      Published: 2024-03-04
CLC number: E927
Funding: National Natural Science Foundation of China (61973254, 92271109, 52272404)
Biography: GUO Jianguo (b. 1975), male, Ph.D., professor.
Cite this article:
GUO Jianguo, HU Guanjie, XU Xinpeng, LIU Yue, CAO Jin. Reinforcement Learning-Based Target Assignment Method for Many-to-Many Interceptions. Air & Space Defense, 2024, 7(1): 24-31.
Link to this article:
https://www.qk.sjtu.edu.cn/ktfy/CN/      or      https://www.qk.sjtu.edu.cn/ktfy/CN/Y2024/V7/I1/24
