Air & Space Defense  2026, Vol. 9 Issue (2): 41-52
Research Article
Adaptive Jamming Decision Method for Navigation Signals Based on Deep Reinforcement Learning
YUAN Jingmei1, ZHAO Liang2, SUN Zhuoran3,4, XU Zhizhao3, NIU Yalei3
1. Nanjing University of Science and Technology ZiJin College, Nanjing 210023, Jiangsu, China; 2. Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109, China; 3. Beijing Racobit Electronic Information Technology Co., Ltd., Beijing 100081, China; 4. Radar Technology Research Institute, School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
Abstract: In complex electronic countermeasure environments, selecting jamming signal parameters efficiently remains a key challenge. This paper proposes an adaptive decision-making method for navigation signal jamming based on deep reinforcement learning, which models jamming parameter optimization as a Markov decision process (MDP). With a suitably designed state space, action space and reward function, the agent optimizes jamming parameters dynamically and adjusts its jamming strategy autonomously as the environment changes, balancing jamming effectiveness against resource utilization. Full-link simulations on a seeker digital simulation platform show that the method converges and adapts well with action spaces of size 50 and 100, reaching a final jamming success rate of 99% and a frequency band matching success rate of 98%, close to the performance upper bound set by exhaustive search. Reward function ablation experiments further indicate that the designed reward function guides the agent toward a reasonable trade-off among jamming effectiveness, frequency band selection and power consumption, yielding an efficient, stable and engineering-feasible jamming decision strategy. The work offers a new approach to seeker navigation jamming and can be used to evaluate the anti-jamming capability boundary of missile navigation signals.
Key words: navigation jamming; deep reinforcement learning (DRL); adaptive decision-making
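The MDP formulation and reward trade-off described in the abstract can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration, not taken from the paper: the 5 bands x 10 power levels discretization (chosen only so that the action space has 50 entries), the hidden per-band power thresholds in `MIN_POWER`, the weighted-sum reward and its weights, and the use of tabular Q-learning on one-step episodes in place of the deep reinforcement learning algorithm, which the abstract does not name. The exhaustive search serves as the reference policy, in the spirit of the upper bound mentioned above.

```python
import random

# Hypothetical discretization: 5 candidate bands x 10 power levels = 50 actions.
N_BANDS, N_POWER = 5, 10
N_ACTIONS = N_BANDS * N_POWER
# Hypothetical minimum power level needed to jam each band (unknown to the agent).
MIN_POWER = [3, 5, 2, 7, 4]


def decode(action):
    """Map a flat action index to a (band, power_level) pair."""
    return divmod(action, N_POWER)


def reward(state, action, w_jam=1.0, w_band=0.5, w_power=0.05):
    """Assumed weighted-sum reward: jamming success + band match - power cost.

    `state` is the band index of the observed navigation signal.
    """
    band, power = decode(action)
    band_ok = band == state
    jammed = band_ok and power >= MIN_POWER[state]
    return w_jam * jammed + w_band * band_ok - w_power * power


def exhaustive_policy():
    """Reference policy: evaluate every action in every state."""
    return [max(range(N_ACTIONS), key=lambda a: reward(s, a))
            for s in range(N_BANDS)]


def train(steps=50000, alpha=0.5, epsilon=0.3, seed=0):
    """Tabular Q-learning on one-step episodes with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_BANDS)]
    for _ in range(steps):
        s = rng.randrange(N_BANDS)                 # environment presents a band
        if rng.random() < epsilon:
            a = rng.randrange(N_ACTIONS)           # explore
        else:
            a = max(range(N_ACTIONS), key=lambda i: q[s][i])  # exploit
        # One-step episode: there is no next state, so the target is the reward.
        q[s][a] += alpha * (reward(s, a) - q[s][a])
    # Greedy policy extracted from the learned Q-table.
    return [max(range(N_ACTIONS), key=lambda a: q[s][a])
            for s in range(N_BANDS)]


if __name__ == "__main__":
    learned = train()
    for s, a in enumerate(learned):
        print(s, decode(a), decode(exhaustive_policy()[s]))
```

Under these assumed weights the power penalty makes the lowest sufficient power level on the matching band the unique best action, so the learned policy can be compared state by state with `exhaustive_policy()`. Dropping one reward term (for example `w_power=0`) and retraining gives a rough analogue of the ablation idea mentioned in the abstract.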
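Received: 2025-10-27      Published: 2026-05-06
CLC number (ZTFLH):  V 249
Corresponding author: SUN Zhuoran (b. 1996), male, Ph.D., engineer.
About the author: YUAN Jingmei (b. 1984), female, M.S., senior engineer.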
Cite this article:
YUAN Jingmei, ZHAO Liang, SUN Zhuoran, XU Zhizhao, NIU Yalei. Adaptive Jamming Decision Method for Navigation Signals Based on Deep Reinforcement Learning. Air & Space Defense, 2026, 9(2): 41-52.
Link to this article:
https://www.qk.sjtu.edu.cn/ktfy/CN/      or      https://www.qk.sjtu.edu.cn/ktfy/CN/Y2026/V9/I2/41

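Supervised by: China Aerospace Science and Technology Corporation. Sponsored by: Shanghai Electro-Mechanical Engineering Institute; Shanghai Jiao Tong University Press Co., Ltd.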