To accelerate learning in high-dimensional problems, temporal-difference (TD) methods such as Q-learning or SARSA are usually combined with eligibility traces. The recently introduced DQN algorithm uses deep neural networks within Q-learning, enabling reinforcement learning algorithms to better understand the visual world and to tackle problems that were previously considered intractable. However, DQN, which is known as a deep reinforcement learning algorithm, learns slowly. In this paper, we combine eligibility traces, one of the basic mechanisms in reinforcement learning, with deep neural networks to speed up the learning process. To compare its efficiency with that of the DQN algorithm, we tested the proposed method on a number of Atari 2600 games. The experimental results show that the proposed method significantly reduces learning time compared to DQN and converges faster to the optimal model.
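
As a rough illustration of how eligibility traces speed up credit assignment, the sketch below implements a tabular SARSA(lambda) update with accumulating traces on a toy chain. It is not the deep-network method proposed in the paper; the function names, the chain environment, and the hyperparameter values are all hypothetical and chosen only for illustration.

```python
def td_lambda_step(q, traces, state, action, reward, next_state, next_action,
                   alpha=0.1, gamma=0.99, lam=0.9):
    """One tabular SARSA(lambda) update with accumulating eligibility traces.

    q and traces are lists of lists indexed as [state][action] and are
    modified in place. All names here are illustrative assumptions.
    """
    # TD error of the observed transition.
    td_error = reward + gamma * q[next_state][next_action] - q[state][action]
    # Mark the visited state-action pair as eligible for credit.
    traces[state][action] += 1.0
    # Apply the TD error to every pair in proportion to its trace,
    # then decay all traces by gamma * lambda.
    for s in range(len(q)):
        for a in range(len(q[s])):
            q[s][a] += alpha * td_error * traces[s][a]
            traces[s][a] *= gamma * lam
    return td_error


def run_chain_episode(lam):
    """Run one episode on a hypothetical 4-state chain (state 3 is terminal).

    The only reward (1.0) is given on the last transition, 2 -> 3.
    """
    n_states, n_actions = 4, 1
    q = [[0.0] * n_actions for _ in range(n_states)]
    traces = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(3):
        reward = 1.0 if s == 2 else 0.0
        td_lambda_step(q, traces, s, 0, reward, s + 1, 0, lam=lam)
    return q


if __name__ == "__main__":
    print("lambda = 0.0:", run_chain_episode(0.0))
    print("lambda = 0.9:", run_chain_episode(0.9))
```

With `lam=0.0` only the last state-action pair before the reward is updated after one episode, whereas with `lam=0.9` the earlier pairs also receive a share of the reward through their decayed traces. This is the kind of faster propagation of delayed rewards that eligibility traces are meant to provide.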

Type of Article: Research paper | Subject: General

Received: 2019/05/13 | Accepted: 2020/01/09 | ePublished ahead of print: 2020/10/05 | Published: 2021/02/19