1. A. J. Krener and W. Respondek, "Nonlinear observer with linearizable error dynamics," SIAM Journal on Control and Optimization, vol. 23, pp. 197-216, 1985. [DOI:10.1137/0323016]
2. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
3. L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996. [DOI:10.1613/jair.301]
4. C. J. C. H. Watkins, "Learning from Delayed Rewards," Ph.D. thesis, King's College, Cambridge University, Cambridge, UK, 1989.
5. G. A. Rummery and M. Niranjan, "On-Line Q-Learning Using Connectionist Systems," Technical Report, Engineering Department, Cambridge University, Cambridge, UK, 1994.
6. J. Kober, J. A. Bagnell, and J. Peters, "Reinforcement learning in robotics: A survey," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1238-1274, 2013. [DOI:10.1177/0278364913495721]
7. D. Vengerov, "A reinforcement learning approach to dynamic resource allocation," Engineering Applications of Artificial Intelligence, vol. 20, no. 3, pp. 383-390, 2007. [DOI:10.1016/j.engappai.2006.06.019]
8. A. G. Barto and S. Mahadevan, "Recent Advances in Hierarchical Reinforcement Learning," Discrete Event Dynamic Systems, vol. 13, no. 4, pp. 341-379, 2003. [DOI:10.1023/A:1025696116075]
9. R. S. Sutton, "Learning to Predict by the Methods of Temporal Differences," Machine Learning, vol. 3, no. 1, pp. 9-44, 1988. [DOI:10.1007/BF00115009]
10. Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015. [DOI:10.1038/nature14539]
11. Y. Bengio, A. Courville, and P. Vincent, "Representation Learning: A Review and New Perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013. [DOI:10.1109/TPAMI.2013.50]
12. V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, "Playing Atari with Deep Reinforcement Learning," in Proceedings of the NIPS Deep Learning Workshop, 2013.
13. V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015. [DOI:10.1038/nature14236]
14. C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3, pp. 279-292, 1992. [DOI:10.1023/A:1022676722315]
15. X. Wang, "Deep Reinforcement Learning: Case Study with Standard RL Testing Domains," Master's thesis, Technische Universiteit Eindhoven, Eindhoven, 2016.
16. J. Peng and R. J. Williams, "Incremental multi-step Q-learning," Machine Learning, vol. 22, no. 1-3, pp. 283-290, 1996. [DOI:10.1007/BF00114731]
17. R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy Gradient Methods for Reinforcement Learning with Function Approximation," in Advances in Neural Information Processing Systems (NIPS) 12, pp. 1057-1063, 2000.
18. J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Transactions on Automatic Control, vol. 42, no. 5, pp. 674-690, 1997. [DOI:10.1109/9.580874]
19. L.-J. Lin, "Reinforcement Learning for Robots Using Neural Networks," Technical Report, DTIC Document, 1993.
20. H. van Hasselt, A. Guez, and D. Silver, "Deep Reinforcement Learning with Double Q-Learning," in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp. 2094-2100, 2016.
21. T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized Experience Replay," in Proceedings of the International Conference on Learning Representations (ICLR), 2016.
22. Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, and N. de Freitas, "Dueling Network Architectures for Deep Reinforcement Learning," in Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016.
23. L. C. Baird, "Advantage Updating," Technical Report, DTIC Document, 1993. [DOI:10.21236/ADA280862]
24. V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, "Asynchronous Methods for Deep Reinforcement Learning," in Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016.
25. M. Hessel, J. Modayil, H. van Hasselt, T. Schaul, G. Ostrovski, W. Dabney, D. Horgan, B. Piot, M. Azar, and D. Silver, "Rainbow: Combining Improvements in Deep Reinforcement Learning," in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2018. [DOI:10.1609/aaai.v33i01.33013796]
26. A. Braylan, M. Hollenbeck, E. Meyerson, and R. Miikkulainen, "Frame Skip Is a Powerful Parameter for Learning to Play Atari," in Proceedings of the AAAI Workshop, 2015.
27. A. S. Lakshminarayanan, S. Sharma, and B. Ravindran, "Dynamic Frame Skip Deep Q Network," in Proceedings of the Workshops at the International Joint Conference on Artificial Intelligence (IJCAI), New York, USA, 2016.
28. M. Riedmiller, "Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method," in Proceedings of the European Conference on Machine Learning (ECML), pp. 317-328, 2005. [DOI:10.1007/11564096_32]
29. S. Lange and M. Riedmiller, "Deep auto-encoder neural networks in reinforcement learning," in Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), pp. 1-8, 2010. [DOI:10.1109/IJCNN.2010.5596468]
30. https://github.com/deepmind/dqn, accessed August 2018.
31. G. Hinton, "Lecture 6a: Overview of Mini-Batch Gradient Descent," http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf, accessed August 2018.
32. V. Kovalev, A. Kalinovsky, and S. Kovalev, "Deep Learning with Theano, Torch, Caffe, TensorFlow, and Deeplearning4J: Which One Is the Best in Speed and Accuracy?," in Proceedings of the 13th International Conference on Pattern Recognition and Information Processing, 2016.
33. M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling, "The Arcade Learning Environment: An Evaluation Platform for General Agents," in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2015.