Increasing the Speed of the DQN Learning Process with the Eligibility Traces Mechanism

Khoshroo, Seyed Ali; Khasteh, Seyed Hossein

doi:10.52547/joc.14.4.13

Journal of Control, Volume 14, Issue 4 (Winter 1399), pages 13-23

DOR: 20.1001.1.20088345.1399.14.4.10.4

Khoshroo S A, Khasteh S H. Increase the speed of the DQN learning process with the Eligibility Traces. JoC 2021; 14(4):13-23
URL: http://joc.kntu.ac.ir/article-1-668-fa.html

Seyed Ali Khoshroo¹, Seyed Hossein Khasteh*¹

1- Artificial Intelligence Group, Faculty of Electrical and Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran

Abstract:

To speed up learning in high-dimensional reinforcement learning problems, temporal-difference (TD) methods such as Q-learning or Sarsa are usually combined with the eligibility traces mechanism. The recently introduced Deep Q-Network (DQN) algorithm uses deep neural networks within Q-learning so that reinforcement learning algorithms can reach a higher-level understanding of the visual world and extend to problems that were previously considered intractable. DQN, which is referred to as a deep reinforcement learning algorithm, learns slowly. This paper attempts to use the eligibility traces mechanism, one of the basic techniques of reinforcement learning, in combination with deep neural networks in order to improve the speed of the learning process. To compare its performance against DQN, experiments were run on a number of Atari 2600 games. The experimental results show that the proposed method significantly reduces learning time compared with DQN and converges faster to the desired model.

Keywords: deep neural networks, Deep Q Network (DQN), eligibility traces, deep reinforcement learning.
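
For readers unfamiliar with eligibility traces, the following is a minimal tabular sketch of how a trace spreads one TD error over recently visited state-action pairs. It is not the paper's deep-network method: it assumes a small lookup-table Q, a Watkins-style Q(λ) update with accumulating traces, and a toy three-state chain. The function names (`q_lambda_step`, `run_episode`) and the values of `alpha`, `gamma`, and `lam` are illustrative assumptions.

```python
from typing import List, Optional

Table = List[List[float]]


def q_lambda_step(q: Table, e: Table, s: int, a: int, r: float,
                  s_next: Optional[int], done: bool, next_action_greedy: bool,
                  alpha: float, gamma: float, lam: float) -> float:
    """One Watkins-style Q(lambda) update on a tabular Q (illustrative sketch)."""
    # Q-learning target: bootstrap from the greedy value of the next state.
    target = r if done else r + gamma * max(q[s_next])
    delta = target - q[s][a]
    # Accumulating trace: mark the pair just visited as eligible.
    e[s][a] += 1.0
    for i in range(len(q)):
        for j in range(len(q[i])):
            # Every eligible pair receives a share of the same TD error.
            q[i][j] += alpha * delta * e[i][j]
            # Decay traces; cut them if the next action is exploratory.
            e[i][j] = gamma * lam * e[i][j] if next_action_greedy else 0.0
    return delta


def run_episode(lam: float, alpha: float = 0.5, gamma: float = 0.9) -> Table:
    """Walk a toy chain 0 -> 1 -> 2 -> terminal; only the last step pays reward 1."""
    n_states, n_actions = 3, 2
    q = [[0.0] * n_actions for _ in range(n_states)]
    e = [[0.0] * n_actions for _ in range(n_states)]
    transitions = [(0, 0, 0.0, 1, False), (1, 0, 0.0, 2, False), (2, 0, 1.0, None, True)]
    for s, a, r, s_next, done in transitions:
        q_lambda_step(q, e, s, a, r, s_next, done, True, alpha, gamma, lam)
    return q


q_with_traces = run_episode(lam=0.8)  # reward reaches all three visited states
q_one_step = run_episode(lam=0.0)     # plain one-step Q-learning: only the last state
print([row[0] for row in q_with_traces])
print([row[0] for row in q_one_step])
```

With `lam=0.0` the traces vanish after every step, so a single episode updates only the state next to the reward. With `lam=0.8` the same episode also credits the two earlier states, which is the kind of faster credit propagation the abstract refers to.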

Type of study: Research | Subject: General
Received: 1398/2/23 | Accepted: 1398/10/19 | Published online ahead of print: 1399/7/14 | Published: 1399/12/1

Reuse: This article may be redistributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.
