Journal of Shanghai Jiao Tong University ›› 2025, Vol. 59 ›› Issue (3): 400-412. doi: 10.16183/j.cnki.jsjtu.2023.344
• New Type Power System and the Integrated Energy •
ZHAO Yingying1,2, QIU Yue3, ZHU Tianchen3, LI Fan1,2, SU Yun1,2, TAI Zhenying3, SUN Qingyun3, FAN Hang4
Received: 2023-07-24
Revised: 2023-09-26
Accepted: 2023-11-22
Online: 2025-03-28
Published: 2025-04-02
ZHAO Yingying, QIU Yue, ZHU Tianchen, LI Fan, SU Yun, TAI Zhenying, SUN Qingyun, FAN Hang. Online Steady-State Scheduling of New Power Systems Based on Hierarchical Reinforcement Learning[J]. Journal of Shanghai Jiao Tong University, 2025, 59(3): 400-412.
URL: https://xuebao.sjtu.edu.cn/EN/10.16183/j.cnki.jsjtu.2023.344
Tab.1 Hyper-parameters of the model
| Parameter | Meaning | Value |
|---|---|---|
| lr_actor | Initial learning rate of the Actor network | 1×10⁻⁵ |
| lr_critic | Initial learning rate of the Critic network | 1×10⁻³ |
| max_episode | Total number of training episodes | 2×10⁵ |
| batch_size | Number of samples per training batch | 1 024 |
| gradient_clip | Gradient clipping threshold | 1.0 |
| init_action_std | Initial standard deviation of action exploration noise | 0.3 |
| active_function | Activation function of the model | Tanh |
| mlp_num_layers | Number of hidden layers in Actor and Critic | 3 |
| history_state_len | Length of the historical state sequence | 25 |
| gru_num_layers | Number of GRU layers | 2 |
| gru_hidden_size | Hidden dimension of the GRU | 64 |
| gcn_hidden_size | Hidden dimension of the GCN | 32 |
| gcn_dropout | Dropout rate of the GCN | 0.1 |
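The hyper-parameters in Tab.1 can be gathered into a single configuration object before training. The sketch below is illustrative only: the `StarHeartConfig` class name and the use of a dataclass are assumptions, not code from the paper; the field names and default values follow the table.

```python
from dataclasses import dataclass

@dataclass
class StarHeartConfig:
    """Hypothetical container for the Tab.1 hyper-parameters."""
    lr_actor: float = 1e-5        # initial Actor learning rate
    lr_critic: float = 1e-3       # initial Critic learning rate
    max_episode: int = 200_000    # total training episodes (2×10⁵)
    batch_size: int = 1024        # samples per training batch
    gradient_clip: float = 1.0    # gradient clipping threshold
    init_action_std: float = 0.3  # std of initial action exploration noise
    active_function: str = "Tanh"
    mlp_num_layers: int = 3       # hidden layers in Actor and Critic
    history_state_len: int = 25   # length of historical state sequence
    gru_num_layers: int = 2
    gru_hidden_size: int = 64
    gcn_hidden_size: int = 32
    gcn_dropout: float = 0.1

cfg = StarHeartConfig()
print(cfg.lr_actor, cfg.batch_size, cfg.gru_hidden_size)
```

Keeping all tuning knobs in one typed object makes it straightforward to log the exact configuration alongside each training run.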
Tab.2 Evaluation performance in all test cases (mean±variance)
| Case | Algorithm | xscore | xround |
|---|---|---|---|
| IEEE-118 | Random | -14.09±8.21 | 21.48±12.88 |
| | DDPG | 413.65±114.00 | 844.82±192.19 |
| | TD3 | 497.57±65.75 | 919.82±89.09 |
| | A2C | 5.95±1.48 | 58.20±3.46 |
| | PPO | 5.68±1.39 | 56.34±3.06 |
| | StarHeart | 1327.24±103.59 | 2229.83±186.79 |
| L2RPN-WCCI-2022 | Random | -8.33±6.12 | 20.22±5.84 |
| | DDPG | 58.22±16.97 | 126.32±25.17 |
| | TD3 | 46.51±11.35 | 100.96±19.60 |
| | A2C | 5.43±1.71 | 40.07±2.52 |
| | PPO | 6.46±3.23 | 39.71±2.33 |
| | StarHeart | 76.56±8.31 | 223.66±15.20 |
| SG-126 | Random | 19.94±1.06 | 30.34±1.89 |
| | DDPG | 109.38±13.14 | 141.27±16.98 |
| | TD3 | 251.59±27.26 | 371.75±34.36 |
| | A2C | 263.69±21.29 | 573.17±59.44 |
| | PPO | 150.36±44.69 | 262.03±72.14 |
| | StarHeart | 684.30±60.16 | 783.80±79.15 |
References

[1] WANG Jiye. Application and prospect of source-grid-load-storage coordination enabled by artificial intelligence[J]. Proceedings of the CSEE, 2022, 42(21): 7667-7681.
[2] YE Zhiliang, LI Canbing, ZHANG Yongjun, et al. Optimization of day-ahead dispatch time resolution in power system with a high proportion of climate-sensitive renewable energy sources[J]. Journal of Shanghai Jiao Tong University, 2023, 57(7): 781-790. doi: 10.16183/j.cnki.jsjtu.2022.277
[3] RIFFONNEAU Y, BACHA S, BARRUEL F, et al. Optimal power flow management for grid connected PV systems with batteries[J]. IEEE Transactions on Sustainable Energy, 2011, 2(3): 309-320.
[4] AN L N, QUOC-TUAN T. Optimal energy management for grid connected microgrid by using dynamic programming method[C]//2015 IEEE Power & Energy Society General Meeting. Denver, USA: IEEE, 2015: 1-5.
[5] LI Peng, WANG Jiahao, LI Canbing, et al. Collaborative optimal scheduling of the community integrated energy system considering source-load uncertainty and equipment off-design performance[J]. Proceedings of the CSEE, 2023, 43(20): 7802-7811.
[6] GUO Y F, WU Q W, GAO H L, et al. Double-time-scale coordinated voltage control in active distribution networks based on MPC[J]. IEEE Transactions on Sustainable Energy, 2020, 11(1): 294-303.
[7] CHEN Yuting, ZHAO Yi, WU Junda, et al. Economic dispatch method of distribution network considering carbon emission index[J]. Journal of Shanghai Jiao Tong University, 2023, 57(4): 442-451. doi: 10.16183/j.cnki.jsjtu.2021.482
[8] QI Yan, SHANG Xuejun, NIE Jingyu, et al. Optimization of CCHP micro-grid operation based on improved multi-objective grey wolf algorithm[J]. Electrical Measurement & Instrumentation, 2022, 59(6): 12-19.
[9] LIU Xinmiao, LI Zhuohuan, ZENG Kaiwen, et al. Multi-objective optimal dispatching of active distribution network based on cluster load prediction[J]. Electrical Measurement & Instrumentation, 2021, 58(5): 98-104.
[10] HIJJO M, FELGNER F, FREY G. PV-battery-diesel microgrid layout design based on stochastic optimization[C]//2017 6th International Conference on Clean Electrical Power. Santa Margherita Ligure, Italy: IEEE, 2017: 30-35.
[11] PAN Xianxian, CHEN Tingwei, XU Zhiheng, et al. A multi-scenario integrated flexible planning method for microgrid[J]. Journal of Shanghai Jiao Tong University, 2022, 56(12): 1598-1607. doi: 10.16183/j.cnki.jsjtu.2021.402
[12] FU Yang, DING Zhiyin, MI Yang. Frequency control strategy for interconnected power systems with time delay considering optimal energy storage regulation[J]. Journal of Shanghai Jiao Tong University, 2022, 56(9): 1128-1138. doi: 10.16183/j.cnki.jsjtu.2022.145
[13] LI Ke, TAI Nengling, ZHANG Shenxi. Comprehensive optimal dispatch of distribution network based on improved particle swarm optimization algorithm[J]. Journal of Shanghai Jiao Tong University, 2017, 51(8): 897-902. doi: 10.16183/j.cnki.jsjtu.2017.08.001
[14] BADAWY M O, SOZER Y. Power flow management of a grid tied PV-battery system for electric vehicles charging[J]. IEEE Transactions on Industry Applications, 2017, 53(2): 1347-1357.
[15] ERICK A O, FOLLY K A. Reinforcement learning approaches to power management in grid-tied microgrids: A review[C]//2020 Clemson University Power Systems Conference. Clemson, USA: IEEE, 2020: 1-6.
[16] JI Y, WANG J H, XU J C, et al. Real-time energy management of a microgrid using deep reinforcement learning[J]. Energies, 2019, 12(12): 2291.
[17] YU Tao, LIU Jing, HU Xibing. Optimal power flow for complex power grid using distributed multi-step backtrack Q(λ) learning[J]. Transactions of China Electrotechnical Society, 2012, 27(4): 185-192.
[18] WEI Y F, ZHANG Z Q, YU F R, et al. Power allocation in HetNets with hybrid energy supply using actor-critic reinforcement learning[C]//GLOBECOM 2017—2017 IEEE Global Communications Conference. Singapore: IEEE, 2017: 1-5.
[19] ZHU Jiebei, XU Siyang, LI Bingsen, et al. Real-time security dispatch of modern power system based on grid expert strategy imitation learning[J]. Power System Technology, 2023, 47(2): 517-530.
[20] HU J X, YE Y J, TANG Y, et al. Towards risk-aware real-time security constrained economic dispatch: A tailored deep reinforcement learning approach[J]. IEEE Transactions on Power Systems, 2024, 39(2): 3972-3986.
[21] CUI H, YE Y J, HU J X, et al. Online preventive control for transmission overload relief using safe reinforcement learning with enhanced spatial-temporal awareness[J]. IEEE Transactions on Power Systems, 2024, 39(1): 517-532.
[22] YU Faqiang, ZHANG Mingjie, CHENG Yu, et al. Optimal sizing of grid-connected wind-solar-biogas integrated energy system considering demand response[J]. Journal of Shanghai Jiao Tong University, 2023, 57(1): 10-16. doi: 10.16183/j.cnki.jsjtu.2022.017
[23] ARULKUMARAN K, DEISENROTH M P, BRUNDAGE M, et al. Deep reinforcement learning: A brief survey[J]. IEEE Signal Processing Magazine, 2017, 34(6): 26-38.
[24] PATERIA S, SUBAGDJA B, TAN A H, et al. Hierarchical reinforcement learning[J]. ACM Computing Surveys, 2022, 54(5): 1-35.
[25] YOON D, HONG S, LEE B J, et al. Winning the L2RPN challenge: Power grid management via semi-Markov afterstate actor-critic[C]//The Ninth International Conference on Learning Representations. Vienna, Austria: ICLR, 2021: 1-18.
[26] KIPF T, WELLING M. Semi-supervised classification with graph convolutional networks[DB/OL]. (2017-02-22)[2023-07-22]. https://arxiv.org/abs/1609.02907.
[27] WU L Z, KONG C, HAO X H, et al. A short-term load forecasting method based on GRU-CNN hybrid neural network model[J]. Mathematical Problems in Engineering, 2020, 2020: 1428104.
[28] LAN T, DUAN J J, ZHANG B, et al. AI-based autonomous line flow control via topology adjustment for maximizing time-series ATCs[C]//2020 IEEE Power & Energy Society General Meeting. Montreal, Canada: IEEE, 2020: 1-5.
[29] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[DB/OL]. (2015-09-09)[2023-07-22]. http://arxiv.org/abs/1509.02971v6.
[30] SERRÉ G, BOGUSLAWSKI E, DONNOT B, et al. Reinforcement learning for Energies of the future and carbon neutrality: A challenge design[DB/OL]. (2022-07-21)[2023-07-22]. http://arxiv.org/abs/2207.10330v1.
[31] DORFER M, FUXJÄGER A R, KOZÁK K, et al. Power grid congestion management via topology optimization with AlphaZero[DB/OL]. (2022-11-10)[2023-07-22]. https://arxiv.org/abs/2211.05612.
[32] JI Ying, WANG Jianhui. Online optimal scheduling of a microgrid based on deep reinforcement learning[J]. Control & Decision, 2022, 37(7): 1675-1684.
[33] WANG Tianjing, TANG Yong, GUO Qiang, et al. Automatic adjustment method of power flow calculation convergence for large-scale power grid based on knowledge experience and deep reinforcement learning[J]. Proceedings of the CSEE, 2020, 40(8): 2396-2405.
[34] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[DB/OL]. (2017-07-20)[2023-07-22]. http://arxiv.org/abs/1707.06347v2.
[35] HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor[DB/OL]. (2018-01-04)[2023-07-22]. http://arxiv.org/abs/1801.01290v2.
[36] FUJIMOTO S, VAN HOOF H, MEGER D. Addressing function approximation error in actor-critic methods[DB/OL]. (2018-02-26)[2023-07-22]. http://arxiv.org/abs/1802.09477v3.