The Guidance and Control Method of Multi-Missile Cooperative Encirclement of Maneuvering Targets Based on Proximal Policy Optimization
ZHANG Wanying1, SIMA Ke2, ZHANG Yuhe3, MENG Jian3, YANG Zhen3, ZHOU Deyun3

1. College of Microelectronics, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China;
2. Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109, China;
3. College of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China
Abstract: To address the problem of cooperative encirclement of a maneuvering target by multiple missiles in three-dimensional space, this study proposes an impact-time-control cooperative guidance law based on proximal policy optimization (PPO). First, the impact-time-control cooperative guidance model is constructed on the basis of extended proportional navigation guidance, and the cooperative guidance time-error term is improved. Then, the state-space and action-space models of the Markov decision process are designed, and the reward function is constructed as a variable-step model combining dense and sparse rewards. The cooperative guidance model is trained with PPO, which maps the guidance state information to the cooperative guidance law. Finally, a multi-missile cooperative encirclement scenario is established, demonstrating that the proposed guidance law achieves model-free, end-to-end coordination of attack timing. Monte Carlo experiments further verify the robustness of the guidance law in disturbed environments.
Received: 27 April 2025
Published: 10 September 2025
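The variable-step reward structure described in the abstract can be illustrated with a minimal sketch. All function and parameter names here are hypothetical and the paper's actual shaping terms are not reproduced: the idea is that a dense term penalizes each missile's deviation from the coordinated impact time at every step, while a sparse terminal term rewards a successful intercept and penalizes a miss.

```python
def cooperative_reward(miss_distance, time_error, done, hit_radius=5.0):
    """Hypothetical dense-plus-sparse reward for one missile agent.

    miss_distance : current missile-target distance (m)
    time_error    : deviation of this missile's time-to-go from the
                    group's coordinated impact time (s)
    done          : True at the terminal step of the episode
    hit_radius    : distance (m) counted as a successful intercept
    """
    # Dense shaping term: applied every step, drives the impact-time
    # error of the group toward zero.
    dense = -abs(time_error)
    if not done:
        return dense
    # Sparse terminal term: large bonus for an intercept, penalty for
    # a miss; only issued once, at the end of the episode.
    sparse = 100.0 if miss_distance <= hit_radius else -50.0
    return dense + sparse
```

In a PPO training loop this reward would be accumulated per step for each missile, so the dense term dominates early learning while the sparse term anchors the final intercept behavior.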