Volume 12, Issue 3 (Journal of Control, V.12, N.3 Fall 2018)                   JoC 2018, 12(3): 13-28




1- Electrical Engineering Department, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran
2- Electrical Engineering Department, Ferdowsi University of Mashhad, Mashhad, Iran
Abstract:

This paper introduces an online, distributed optimal algorithm for synchronizing multi-agent systems with unknown dynamics, based on approximate dynamic programming and neural networks. Each agent employs an actor-critic structure to learn its distributed optimal policy, and a neural network approximator identifies each agent's unknown dynamics. Identification relies on the experience replay technique, in which recorded data and current data are used together to adapt the approximator weights. The algorithm learns the solution of the coupled Hamilton-Jacobi equations online, without knowledge of the dynamics. While the weights of the identifiers and the actor-critic approximators are being tuned, Lyapunov theory is used to guarantee that the closed-loop signals remain bounded. Simulation results demonstrate the effectiveness of the proposed algorithm.
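The experience replay idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a scalar state, a linear-in-weights approximator with hypothetical basis functions `features`, noise-free measurements of the state derivative, and a plain gradient step. All names and parameter values are illustrative assumptions. The point it shows is that each weight update uses the current sample together with a fixed set of recorded samples.

```python
import random


def features(x):
    """Hypothetical basis functions phi(x) for a scalar state (an assumption)."""
    return [x, x ** 3]


def predict(weights, x):
    """Linear-in-weights approximation W^T phi(x) of the unknown dynamics."""
    return sum(w * p for w, p in zip(weights, features(x)))


def replay_update(weights, current, memory, lr=0.05):
    """One gradient step on the squared error over the current and recorded samples."""
    grad = [0.0] * len(weights)
    for x, target in [current] + memory:
        err = predict(weights, x) - target
        for i, p in enumerate(features(x)):
            grad[i] += err * p
    return [w - lr * g for w, g in zip(weights, grad)]


def identify(true_weights, steps=5000, memory_size=10, seed=0):
    """Identify the weights of a toy system from streamed and recorded data."""
    rng = random.Random(seed)
    weights = [0.0] * len(true_weights)
    memory = []  # recorded (state, derivative) pairs replayed at every step
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        target = predict(true_weights, x)  # stands in for a measured derivative
        weights = replay_update(weights, (x, target), memory)
        if len(memory) < memory_size:
            memory.append((x, target))
    return weights


if __name__ == "__main__":
    # Toy "unknown" dynamics x_dot = -1.0 * x + 0.5 * x**3 (illustrative values).
    print(identify([-1.0, 0.5]))
```

In this sketch the recorded samples keep contributing to the gradient at every step, so the estimate keeps improving even when the current sample alone carries little information about the weights.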

Type of Article: Research paper | Subject: Special
Received: 2017/07/2 | Accepted: 2018/04/21 | Published: 2019/04/28


Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.