1- Yazd University
2- University of Kerman
Abstract:
In this paper, we present a novel continuous reinforcement learning approach. The proposed approach, called "Fuzzy Least Squares Policy Iteration (FLSPI)", is obtained by combining "Least Squares Policy Iteration (LSPI)" with a zero-order Takagi-Sugeno fuzzy system. We define the state-action basis functions in terms of the fuzzy system so that the conditions required by LSPI are satisfied. We prove that the difference between the exact state-action value function and the approximate state-action value function obtained by FLSPI is bounded. Simulation results show that FLSPI achieves higher learning speed and operation quality than two previous critic-only fuzzy reinforcement learning approaches, namely fuzzy Q-learning and fuzzy Sarsa learning. A further advantage of the approach is that it does not require a learning rate to be set.
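The idea of building state-action basis functions from a fuzzy system and fitting the weights by least squares can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a one-dimensional state, triangular membership functions, a small discrete action set, a block layout of the basis vector (one block of firing strengths per action), and a small ridge term for numerical stability. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int, float, float]  # (state, action, reward, next_state)


def triangular_memberships(x: float, centers: Sequence[float]) -> List[float]:
    """Firing strengths of triangular fuzzy sets peaked at sorted `centers`.
    Neighbouring sets overlap so the strengths sum to 1 (already normalized)."""
    x = min(max(x, centers[0]), centers[-1])  # clamp to the covered range
    mu = [0.0] * len(centers)
    for i in range(len(centers) - 1):
        lo, hi = centers[i], centers[i + 1]
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)
            mu[i], mu[i + 1] = 1.0 - t, t
            break
    return mu


def phi(x: float, action: int, centers: Sequence[float], n_actions: int) -> List[float]:
    """Assumed state-action basis: firing strengths placed in the action's block."""
    n = len(centers)
    vec = [0.0] * (n * n_actions)
    vec[action * n:(action + 1) * n] = triangular_memberships(x, centers)
    return vec


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for A w = b."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = s / M[i][i]
    return w


def lstdq(samples: Sequence[Sample], policy: Callable[[float], int],
          centers: Sequence[float], n_actions: int,
          gamma: float = 0.9, ridge: float = 1e-6) -> List[float]:
    """One LSTD-Q policy-evaluation step: solve A w = b with
    A = sum phi (phi - gamma * phi')^T and b = sum phi * r. No learning rate."""
    k = len(centers) * n_actions
    A = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for x, a, r, x_next in samples:
        f = phi(x, a, centers, n_actions)
        f_next = phi(x_next, policy(x_next), centers, n_actions)
        for i in range(k):
            if f[i] == 0.0:
                continue  # this row receives no contribution from the sample
            b[i] += f[i] * r
            for j in range(k):
                A[i][j] += f[i] * (f[j] - gamma * f_next[j])
    return solve(A, b)


def q_value(w: Sequence[float], x: float, action: int,
            centers: Sequence[float], n_actions: int) -> float:
    """Approximate Q(x, action) as the dot product of weights and basis."""
    return sum(wi * fi for wi, fi in zip(w, phi(x, action, centers, n_actions)))
```

In a full policy iteration loop, the greedy policy with respect to `q_value` would be passed back into `lstdq` until the weights stop changing. The weights come from solving a linear system rather than from incremental updates, which is why no learning rate appears anywhere in the sketch.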
Type of Article: Research paper
Subject: Special
Received: 2014/05/03 | Accepted: 2014/08/30 | Published: 2014/12/11