A Double Competitive Scheme Based Learning Automata Algorithm

doi:10.16183/j.cnki.jsjtu.2018.10.018

Abstract

Abstract: Learning automaton (LA) plays an essential part in the theoretical system of reinforcement learning. Not only does LA have important theoretical value, but also has extensive practical value, including many applications in security field. As classic family of LA, estimator algorithms have improved the performance of algorithms to a novel level. However, because of the limitation of estimators, the estimated values are not reliable at the beginning of learning process, which will lead to wrong updates of actions’ selection probability vector. Using current update strategy, the offset of wrong updates only relies on large number of extra iterations. Thus, there is still room for enhancement of convergence efficiency. In this paper, for improving the update strategy of probability vector, we proposed a double competitive scheme based learning automata algorithm and proved its ε-optimality. The simulation results showed that the proposed algorithms reached the highest convergence efficiency and confirmed the validity of proposed probability vector’s update strategy.

Key words: learning automata, reinforcement learning, stationary environments, estimator algorithms, double competitive scheme

CLC Number:

TP 181

DI Chong,QI Kaiyue,WU Yue,SU Yu,LI Shenghong. A Double Competitive Scheme Based Learning Automata Algorithm[J]. Journal of Shanghai Jiao Tong University, 2018, 52(10): 1292-1297.

References

［1］NARENDRA K S, THATHACHAR M A L. Learning automata-a survey［J］. IEEE Transactions on Systems, Man, and Cybernetics, 1974, (4): 323-334. ［2］KAELBLING L P, LITTMAN M L, MOORE A W. Reinforcement learning: A survey［J］. Journal of Artificial Intelligence Research, 1996, (4): 237-285. ［3］郝圣, 张沪寅, 宋梦凯.基于学习自动机理论与稳定性控制的自适应移动无线AdHoc网络分簇策略［J/OL］. 计算机学报, 2017, 40: 1-20. HAO Sheng, ZHANG Huyin, SONG Mengkai. An adaptive clustering strategy for MANET based on learning automata theory and stability control［J/OL］. Chinese Journal of Computers, 2017, 40: 1-20. ［4］王建平, 陈改霞, 孔德川, 等.一种基于学习自动机的WSN区域覆盖算法［J］.数据采集与处理, 2014, 29(6): 1016-1022. WANG Jianping, CHEN Gaixia, KONG Dechuan, et al. Learning automata-based area coverage algorithm for wireless sensor networks［J］. Journal of Data Acquisition and Processing, 2014, 29(6): 1016-1022. ［5］JIAO Lei, ZHANG Xuan, OOMMEN B J, et al. Optimizing channel selection for cognitive radio networks using a distributed Bayesian learning automata-based approach［J］. Applied Intelligence, 2016, 44(2): 307-321. ［6］REZVANIAN A, SAGHIRI A M, VAHIDIPOUR S M, et al. Learning automata for wireless sensor networks［M］. Cham: Springer, 2018: 91-219. ［7］DI C, SU Y, HAN Z, et al. Learning automata based SVM for intrusion detection［C］//International Conference in Communications, Signal Processing, and Systems. Singapore: Springer, 2017: 2067-2074. ［8］ZHANG Y, DI C, HAN Z, et al. An adaptive honeypot deployment algorithm based on learning automata［C］//2017 IEEE Second International Conference on Data Science in Cyberspace (DSC). Piscataway NJ: ［s.n.］, 2017: 521-527. ［9］GE H, HUANG J, DI C, et al. Learning automata based approach for influence maximization problem on social networks［C］//2017 IEEE Second International Conference on Data Science in Cyberspace (DSC). Piscataway, NJ: ［s.n.］, 2017: 108-117. ［10］TSETLIN M L. On behaviour of finite automata in random medium［J］. Avtomatika i Telemekhanika, 1961 (22): 1345-1354. ［11］VARSHAVSKII V I, VORONTSOVA I P. On the behavior of stochastic automata with a variable structure［J］. Avtomatika i Telemekhanika, 1963, 24(3): 353-360. ［12］OOMMEN B J, HANSEN E. The asymptotic optimality of discretized linear reward-inaction learning automata［J］. IEEE Transactions on Systems, Man, and Cybernetics, 1984 (3): 542-545. ［13］THATHACHAR M A L. A class of rapidly converging algorithms for learning automata［C］//IEEE International Conference on Cybernetics and Society. Piscataway, NJ: ［s.n.］, 1984: 602-606. ［14］OOMMEN B J, LANCTT J K. Discretized pursuit learning automata［J］. IEEE Transactions on Systems, Man, and Cybernetics, 1990, 20(4): 931-938. ［15］AGACHE M, OOMMEN B J. Generalized pursuit learning schemes: New families of continuous and discretized learning automata［J］. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2002, 32(6): 738-749. ［16］PAPADIMITRIOU G I, SKLIRA M, POMPORTSIS A S. A new class of/spl epsi/-optimal learning automata［J］. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2004, 34(1): 246-254. ［17］GE H, JIANG W, LI S, et al. A novel estimator based learning automata algorithm［J］. Applied Intelligence, 2015, 42(2): 262-275.

[1]	Li Mingwang, Li Xinde, Zhang Zhentong, Wang Zeyu, Zhao Haoming. Haptic-Aided Navigation Vehicle: Enhancing Obstacle Detection in Blind Spots and Transparent Object Scenarios [J]. J Shanghai Jiaotong Univ Sci, 2026, 31(1): 167-175.
[2]	Xia Jie, Wu Xiaodong, Xu Min. BEV-Fused Imitation and Reinforcement Learning for Autonomous Driving Planning [J]. J Shanghai Jiaotong Univ Sci, 2026, 31(1): 154-166.
[3]	CHEN Shi, YANG Linsen, LIU Yihong, LUO Huan, ZANG Tianlei, ZHOU Buxiang. Optimal Scheduling Strategy of Newly-Built Microgrid in Small Sample Data-Driven Mode [J]. Journal of Shanghai Jiao Tong University, 2025, 59(6): 732-745.
[4]	YU Xinyi, XU Siyu, FAN Yuehai, OU Linlin. Self-Adaptive LSAC-PID Approach Based on Lyapunov Reward Shaping for Mobile Robots [J]. J Shanghai Jiaotong Univ Sci, 2025, 30(6): 1085-1102.
[5]	LI Mingyang, BAO Hujun, HUANG Jin. Dynamic Cloth Folding Using Curriculum Learning [J]. J Shanghai Jiaotong Univ Sci, 2025, 30(5): 988-997.
[6]	ZHANG Wanying, SIMA Ke, ZHANG Yuhe, MENG Jian, YANG Zhen, ZHOU Deyun. The Guidance and Control Method of Multi-Missile Cooperative Encirclement of Maneuvering Targets Based on Proximal Policy Optimization [J]. Air & Space Defense, 2025, 8(4): 94-103.
[7]	WANG Zhibo, HU Weijun, MA Xianlong, QUAN Jiale, ZHOU Haoyu. Perception-Driven-Controlled UAV Interception and Collision Technology [J]. Air & Space Defense, 2025, 8(4): 78-84.
[8]	ZHAO Yingying, QIU Yue, ZHU Tianchen, LI Fan, SU Yun, TAI Zhenying, SUN Qingyun, FAN Hang. Online Steady-State Scheduling of New Power Systems Based on Hierarchical Reinforcement Learning [J]. Journal of Shanghai Jiao Tong University, 2025, 59(3): 400-412.
[9]	DU Junnan, SHUAI Yixian, CHEN Ding, WANG Min, ZHOU Jinpeng. A Cooperative Deployment Algorithm for Marine Fleet Detection Nodes Based on Constrained Reinforcement Learning [J]. Air & Space Defense, 2025, 8(3): 95-103.
[10]	LI Yijia, LI Jianuo, KE Liangjun. Design and Verification of UAV Cooperative Defense Strategy Based on Reinforcement Learning [J]. Air & Space Defense, 2025, 8(3): 73-85.
[11]	ZHOU Wenjie, FU Yulong, GUO Xiangke, QI Yutao, ZHANG Haibin. Air Combat Decision-Making Method Based on Game Tree and Digital Parallel Simulation Battlefield [J]. Air & Space Defense, 2025, 8(3): 50-58.
[12]	LIU Yanhang, QIAO Ruyu, LIANG Nan, CHEN Yu, YU Kai, WU Hanxiao. Renewable Energy Consumption Strategies of Power System Integrated with Electric Vehicle Clusters Based on Load Alignment and Deep Reinforcement Learning [J]. Journal of Shanghai Jiao Tong University, 2025, 59(10): 1464-1475.
[13]	YANG Yinghe, WEI Handi, FAN Dixia, LI Ang. Optimization Method of Underwater Flapping Foil Propulsion Performance Based on Gaussian Process Regression and Deep Reinforcement Learning [J]. Journal of Shanghai Jiao Tong University, 2025, 59(1): 70-78.
[14]	LIU Huahua, WANG Qing. Multi-Aircraft Target Assignment Method Based on Reinforcement Learning [J]. Air & Space Defense, 2024, 7(5): 65-72.
[15]	ZHOU Yi, ZHOU Liangcai, SHI Di, ZHAO Xiaoying, SHAN Xin. Coordinated Active Power-Frequency Control Based on Safe Deep Reinforcement Learning [J]. Journal of Shanghai Jiao Tong University, 2024, 58(5): 682-692.