In this paper, an online learning algorithm based on approximate dynamic programming is proposed to approximately solve nonlinear continuous-time differential graphical games with infinite-horizon cost functions and known dynamics. In the proposed algorithm, each agent employs a critic neural network (NN) to approximate its optimal value function and control policy, and uses the proposed weight tuning laws to learn the optimal weights of its critic NN online. The critic NN weight tuning laws contain a stabilizer switch that guarantees closed-loop stability and convergence of the control policies to the Nash equilibrium. The algorithm does not require an initial set of stabilizing control policies. Furthermore, Lyapunov theory is employed to show uniform ultimate boundedness of the closed-loop system. Finally, a simulation example is presented to illustrate the efficiency of the proposed algorithm.
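
The critic-only idea can be illustrated on a much smaller problem. The following is a minimal sketch and not the tuning law of this paper: it assumes a single agent (so the graph coupling between agents is dropped), a hypothetical scalar linear plant x' = a*x + b*u with running cost q*x^2 + r*u^2, and a one-weight critic V(x) ~ w*x^2. The weight is tuned by a normalized gradient step on the Hamiltonian residual, plus a stabilizer term that is switched on only while the Lyapunov candidate x^2/2 is not decreasing, in the style of single-network ADP schemes. All constants, the gains ALPHA and ALPHA_S, the periodic state resets used to keep the data informative, and the function names critic_step and train are assumptions made for illustration.

```python
import math

# Hypothetical scalar plant x' = A*x + B*u with running cost Q*x^2 + R*u^2.
A, B, Q, R = 1.0, 1.0, 1.0, 1.0   # A > 0, so the open loop is unstable
ALPHA = 5.0       # critic learning rate (assumed value)
ALPHA_S = 10.0    # stabilizer gain (assumed value)
DT = 1e-3         # forward-Euler step size


def critic_step(x, w):
    """Advance the plant state x and the critic weight w by one Euler step."""
    # Policy read off the critic: u = -(1/2) * (1/R) * B * dV/dx with V = w*x^2.
    u = -(B / R) * w * x
    x_dot = A * x + B * u
    # Derivative of the residual with respect to w (the regressor).
    sigma = 2.0 * x * x_dot
    # Hamiltonian residual: running cost plus dV/dx * x_dot.
    residual = Q * x * x + R * u * u + w * sigma
    # Normalized gradient descent on residual^2 / 2.
    w_dot = -ALPHA * sigma * residual / (1.0 + sigma * sigma) ** 2
    # Stabilizer switch: active only while x^2/2 is not decreasing.
    if x * x_dot >= 0.0:
        w_dot += ALPHA_S * (B * B / R) * x * x
    return x + DT * x_dot, w + DT * w_dot


def train(episodes=20, steps=5000, x0=1.0, w0=0.0):
    """Run several episodes, resetting the state so the data stay informative."""
    w, peak = w0, abs(x0)
    for _ in range(episodes):
        x = x0
        for _ in range(steps):
            x, w = critic_step(x, w)
            peak = max(peak, abs(x))
    return w, peak


w_final, peak_x = train()
# Stabilizing root of the scalar Riccati equation 2*A*p - (B^2/R)*p^2 + Q = 0.
w_star = (A * R + math.sqrt(A * A * R * R + Q * R * B * B)) / (B * B)
print(f"learned weight {w_final:.4f}, Riccati reference {w_star:.4f}, peak |x| {peak_x:.3f}")
```

The critic starts from w = 0, for which the policy u = 0 leaves the unstable plant uncontrolled, so no initial stabilizing policy is supplied. For these constants the scalar Riccati equation has the stabilizing root 1 + sqrt(2), which gives a reference value against which w_final can be compared.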

Type of Article: Research paper | Subject: Special

Received: 2016/06/16 | Accepted: 2017/12/10 | Published: 2018/10/3
