1. [1] Sutton, R. S., and Barto, A. G., Reinforcement learning: An introduction, Second Edition, MIT Press, Massachusetts, 2017.
2. [2] Derhami, V., Alamiyan, F., Dowlatshahi, M.B., Reinforcement Learning, Yazd University Press, 2017.
3. [3] Derhami, V., Mehrabi, O., Action value function approximation based on radial basis function network for reinforcement learning, Journal of Control, Vol. 5, No. 1, pp. 50-63, 2011.
4. [4] Liu, Y. J., Tang, L., Tong, S., Chen, C. P., and Li, D. J., Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems, IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 1, pp. 165-176, 2015. [DOI:10.1109/TNNLS.2014.2360724]
5. [5] Derhami, V., Majd, V. J., and Ahmadabadi, M. N., Fuzzy Sarsa Learning and The Proof of Existence of its Stationary Points, Asian Journal of Control, pp. 535-549, 2008. [DOI:10.1002/asjc.54]
6. [6] Ghorbani, F., Derhami, V., and Afsharchi, M., Fuzzy Least Square Policy Iteration and Its Mathematical Analysis, International Journal of Fuzzy Systems, pp. 1-14, 2016. [DOI:10.1007/s40815-016-0270-1]
7. [7] Barakat, A., Bianchi, P., and Lehmann, J. Analysis of a target-based actor-critic algorithm with linear function approximation. CoRR, abs/2106.07472, 2021.
8. [8] Zaki, M., Mohan, A., Gopalan, A., and Mannor, S., Actor-Critic based Improper Reinforcement Learning, arXiv, 2022.
9. [9] Allahverdy, D., Fakharian, A., and Menhaj, M. B., Back-Stepping Integral Sliding Mode Control with Iterative Learning Control Algorithm for Quadrotor UAVs, J. Electr. Eng. Technol. 14, 2539-2547, 2019. [DOI:10.1007/s42835-019-00257-z]
10. [10] Sheikhlar, A., and Fakharian, A., "Online policy iteration-based tracking control of four wheeled omni-directional robots", Journal of Dynamic Systems, Measurement, and Control 140, no. 8, 2018. [DOI:10.1115/1.4039287]
11. [11] Jia, Y., and Zhou, X. Y., "Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms", arXiv, 2021. [DOI:10.2139/ssrn.3969101]
12. [12] Lagoudakis, M. G., and Parr, R., Least-squares policy iteration, Journal of Machine Learning Research, pp. 1107-1149, 2003.
13. [13] Hwang, K. S., Tan, S. W., and Tsai, M. C., "Reinforcement Learning to Adaptive Control of Nonlinear Systems", IEEE Transactions on Systems, Man, and Cybernetics-Part B, Vol. 33, No. 3, pp. 514-521, 2003. [DOI:10.1109/TSMCB.2003.811112]
14. [14] Buşoniu, L., et al., Online least-squares policy iteration for reinforcement learning control, American Control Conference (ACC-10), 2010. [DOI:10.1109/ACC.2010.5530856]
15. [15] Buşoniu, L., Lazaric, A., Ghavamzadeh, M., Munos, R., Babuška, R., De Schutter, B., Least-squares methods for policy iteration. In: Wiering, M., van Otterlo, M. (Eds.), Reinforcement Learning: State-of-the-Art. In: Adaptation, Learning, and Optimization, vol. 12, Springer, Heidelberg, Germany, pp. 75-109, 2012. [DOI:10.1007/978-3-642-27645-3_3]
16. [16] Xu, X., Hu, D., Lu, X., Kernel-based least squares policy iteration for reinforcement learning. IEEE Trans. Neural Netw. 18 (4), 973-992, 2007. [DOI:10.1109/TNN.2007.899161]
17. [17] Yahyaa, S., Manderick, B., Knowledge gradient for online reinforcement learning. In: Duval, B., van den Herik, J., Loiseau, S., Filipe, J. (Eds.), Agents and Artificial Intelligence. In: ICAART 2014 LNCS, vol. 8946, Springer, Cham, pp. 103-118, 2014. [DOI:10.1007/978-3-319-25210-0_7]
18. [18] Jakab, H. S., Csató, L., Sparse approximations to value functions in reinforcement learning. In: Koprinkova-Hristova, P., Mladenov, V., Kasabov, N. K. (Eds.), Artificial Neural Networks. Springer, Cham, pp. 295-314, 2015. [DOI:10.1007/978-3-319-09903-3_14]
19. [19] Cui, Y., Matsubara, T., Sugimoto, K., Kernel dynamic policy programming: Applicable reinforcement learning to robot systems with high dimensional states, Neural Netw. 94, 13-23, 2017. [DOI:10.1016/j.neunet.2017.06.007]
20. [20] Ruan, A., Shi, A., Qin, L., Xu, S., and Zhao, Y., "A Reinforcement Learning-Based Markov-Decision Process (MDP) Implementation for SRAM FPGAs", IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 10, pp. 2124-2128, 2020. [DOI:10.1109/TCSII.2019.2943958]
21. [21] Howard, R.A., Dynamic Programming and Markov Processes. MIT Press, Cambridge, Massachusetts, 1960.
22. [22] Perkins, T. J., and Precup, D., A convergent form of approximate policy iteration. Proc. Int. Conf. Neural Information Processing Systems, pp. 1595-1602, 2002.
23. [23] Koller, D., and Parr, R., Policy iteration for factored MDPs. The Sixteenth Conference on Uncertainty in Artificial Intelligence, pp. 326-334, 2000.
24. [24] Hartman, E., Keeler, J. D., Kowalski, J. M., "Layered neural networks with Gaussian hidden units as universal approximations", Neural Computation, Vol. 2, No. 2, pp. 210-215, 1990. [DOI:10.1162/neco.1990.2.2.210]
25. [25] Kretchmar, R. M., and Anderson, C. W., "Comparison of CMACs and radial basis functions for local function approximators in reinforcement learning", Proceedings of International Conference on Neural Networks (ICNN'97), Houston, TX, USA, vol. 2, pp. 834-837, 1997.
26. [26] Duan, Y., Chen, X., Houthooft, R., Schulman, J., Abbeel, P., Benchmarking deep reinforcement learning for continuous control. arXiv preprint arXiv:1604.06778, 2016.
27. [27] Varga, B., Kulcsár, B., Chehreghani, M. H., Deep Q-learning: A robust control approach, Int. J. Robust Nonlinear Control, 33(1), 526-54, 2023. [DOI:10.1002/rnc.6457]
28. [28] Xu, X., Zuo, L., Huang, Z., Reinforcement learning algorithms with function approximation: Recent advances and applications, Information Sciences, Volume 261, Pages 1-31, 2014. [DOI:10.1016/j.ins.2013.08.037]
29. [29] Second Annual Reinforcement Learning Competition, http://rl-competition.org.
30. [30] Barreto, A. M. S., Anderson, C. W., Restricted gradient-descent algorithm for value-function approximation in reinforcement learning, Artificial Intelligence, Volume 172, Issues 4-5, Pages 454-482, 2008. [DOI:10.1016/j.artint.2007.08.001]
31. [31] Snehal, N., Pooja, Sonam, W., K., Wagh, S. R., and Singh, N. M., Control of an Acrobot system using reinforcement learning with probabilistic policy search, Australian & New Zealand Control Conference, pp. 68-73, 2021. [DOI:10.1109/ANZCC53563.2021.9628194]
32. [32] Lim, H.-K., Kim, J.-B., Ullah, I., Heo, J.-S., and Han, Y.-H., Federated Reinforcement Learning Acceleration Method for Precise Control of Multiple Devices, IEEE Access, vol. 9, pp. 76296-76306, 2021. [DOI:10.1109/ACCESS.2021.3083087]
33. [33] Främling, K., Light-weight reinforcement learning with function approximation for real-life control tasks, In: Proceedings of 5th International Conference on Informatics in Control, Automation and Robotics, Funchal, Madeira, Portugal, pp. 127-134, 2008.