Volume 14, Issue 4 (Journal of Control, V.14, N.4 Winter 2021) | JoC 2021, 14(4): 13-23




Khoshroo S A, Khasteh S H. Increase the speed of the DQN learning process with the Eligibility Traces. JoC. 2021; 14 (4) :13-23
URL: http://joc.kntu.ac.ir/article-1-668-en.html
1- K. N. Toosi University of Technology
Abstract:
To accelerate learning in high-dimensional reinforcement learning problems, temporal-difference (TD) methods such as Q-learning or SARSA are usually combined with the mechanism of eligibility traces. The recently introduced DQN algorithm uses deep neural networks in Q-learning, enabling reinforcement learning algorithms to gain a better understanding of the visual world and to address problems that were previously considered intractable. However, DQN, a deep reinforcement learning algorithm, learns slowly. In this paper, we combine eligibility traces, one of the basic mechanisms in reinforcement learning, with deep neural networks to speed up the learning process. To compare its efficiency with that of DQN, the proposed method was tested on a number of Atari 2600 games; the experimental results show that it significantly reduces learning time compared with DQN and converges faster to the optimal model.
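
The abstract does not spell out how the traces are attached to the network, so the following is only a minimal tabular sketch of the general idea it builds on: Q-learning with accumulating eligibility traces, in the style of Watkins's Q(λ). It is not the authors' DQN-based method. The function names, the three-state toy trajectory, and the values of `alpha`, `gamma`, and `lam` are all illustrative assumptions.

```python
def q_lambda_step(Q, E, s, a, r, s_next, a_next, done,
                  alpha=0.1, gamma=0.99, lam=0.9):
    """Apply one tabular Q(lambda) update in place (hypothetical helper).

    Q and E are lists of lists indexed as [state][action].
    """
    # One-step TD error using the greedy value of the next state.
    best_next = 0.0 if done else max(Q[s_next])
    delta = r + gamma * best_next - Q[s][a]

    # Accumulating trace: mark the pair that was just visited.
    E[s][a] += 1.0

    # Traces survive only if the episode continues and the next action is greedy.
    keep = (not done) and Q[s_next][a_next] == best_next

    for i in range(len(Q)):
        for j in range(len(Q[i])):
            # Every recently visited pair shares in the current TD error.
            Q[i][j] += alpha * delta * E[i][j]
            E[i][j] = gamma * lam * E[i][j] if keep else 0.0
    return delta


def run_episode(lam):
    """Replay a fixed toy trajectory 0 -> 1 -> 2 (terminal), reward 1 at the end."""
    n_states, n_actions = 3, 2
    Q = [[0.0] * n_actions for _ in range(n_states)]
    E = [[0.0] * n_actions for _ in range(n_states)]
    # Tuples are (s, a, r, s_next, a_next, done); a_next is ignored when done.
    trajectory = [(0, 0, 0.0, 1, 0, False), (1, 0, 1.0, 2, 0, True)]
    for s, a, r, s_next, a_next, done in trajectory:
        q_lambda_step(Q, E, s, a, r, s_next, a_next, done, lam=lam)
    return Q


if __name__ == "__main__":
    print("lambda = 0.9:", run_episode(0.9)[0][0])  # first state already credited
    print("lambda = 0.0:", run_episode(0.0)[0][0])  # first state still at zero
```

In this toy run, a single episode with `lam=0.9` already moves the value of the first state-action pair (to about 0.0891), whereas with `lam=0.0` (plain one-step Q-learning) only the pair immediately preceding the reward changes. That faster backward propagation of reward is the kind of speed-up the abstract refers to.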
Type of Article: Research paper | Subject: General
Received: 2019/05/13 | Accepted: 2020/01/09 | ePublished ahead of print: 2020/10/05 | Published: 2021/02/19




Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2021 CC BY-NC 4.0 | Journal of Control
