Online Optimal Synchronization of Nonlinear Multi-agent Systems under Unknown Dynamics

Tatari, Farzaneh; Naghibi-Sistani, Mohammad Bagher

doi:10.29252/joc.12.3.13

Journal of Control, Volume 12, Issue 3 (Autumn 1397 / 2018), pages 13-28

DOR: 20.1001.1.20088345.1397.12.3.6.4

Tatari F, Naghibi-S. M. Online Optimal Synchronization of Nonlinear Multi-agent Systems under Unknown Dynamics. JoC 2018; 12 (3) :13-28
URL: http://joc.kntu.ac.ir/article-1-497-fa.html


Farzaneh Tatari*¹, Mohammad Bagher Naghibi-Sistani²

1- Department of Electrical Engineering, Faculty of Electrical and Computer Engineering, Semnan University, Semnan
2- Department of Electrical Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad

Abstract:

This paper presents an online adaptive distributed optimal algorithm that synchronizes the nonlinear agents of a multi-agent system with unknown dynamics to a leader agent, based on approximate dynamic programming techniques and neural network identifiers. The proposed algorithm learns the solution of the coupled Hamilton-Jacobi (CHJ) equations online under unknown dynamics. Each agent uses an actor-critic structure to learn its local optimal policy, and the unknown dynamics of each agent are approximated by a neural network approximator. The unknown dynamics are identified with an experience replay rule, in which recorded data are used together with instantaneous data to adapt the weights of the neural network that identifies each agent's dynamics. While the weights of the dynamics approximators and of the actor-critic networks are adapted simultaneously, boundedness of all closed-loop signals is guaranteed by Lyapunov theory. Finally, simulation results demonstrate the validity of the proposed algorithm.

Keywords: approximate dynamic programming, actor-critic approximators, multi-agent systems, distributed optimal control, synchronization.
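The experience replay idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' update law: it assumes a single agent with no graph coupling, a linear-in-parameters approximator with two hand-picked features (`basis`), a hypothetical ground-truth weight matrix `W_TRUE`, and a directly measurable state derivative (practical schemes usually avoid that last assumption with filtering or a state-estimation error). The point it shows is that a small stack of recorded samples keeps the weight update informative after the trajectory has decayed and the instantaneous data have stopped being exciting.

```python
import numpy as np

# Hypothetical ground truth for one agent: x_dot = W_TRUE.T @ basis(x)
# (rows = features, columns = state components). Not taken from the paper.
W_TRUE = np.array([[0.0, -2.0],
                   [1.0, -0.5]])


def basis(x):
    """Illustrative regressor; a real identifier would use a richer network."""
    return np.array([np.tanh(x[0]), x[1]])


def identify(w_true, use_replay=True, steps=4000, dt=0.01, gain=2.0,
             memory=12, record_every=50):
    """Estimate w_true from data gathered along one decaying trajectory."""
    w_hat = np.zeros_like(w_true)
    stack = []                      # recorded (regressor, derivative) pairs
    x = np.array([2.0, 0.0])
    for k in range(steps):
        phi = basis(x)
        x_dot = w_true.T @ phi      # assumed measurable in this sketch
        # Fill the memory stack early, while the state is still moving.
        if use_replay and len(stack) < memory and k % record_every == 0:
            stack.append((phi, x_dot))
        # Normalized gradient step on the instantaneous sample plus every
        # recorded sample (the stack stays empty when use_replay is False).
        update = np.zeros_like(w_hat)
        for p, xd in [(phi, x_dot)] + stack:
            err = xd - w_hat.T @ p
            update += np.outer(p, err) / (1.0 + p @ p)
        w_hat = w_hat + dt * gain * update
        x = x + dt * x_dot          # forward-Euler step of the true system
    return w_hat


if __name__ == "__main__":
    for flag in (True, False):
        err = np.linalg.norm(W_TRUE - identify(W_TRUE, use_replay=flag))
        print(f"use_replay={flag}: weight error = {err:.2e}")
```

Each term is normalized by `1 + p @ p`, so the summed update matrix has eigenvalues below `memory + 1` and the explicit step `dt * gain` stays well inside the stable range for the defaults chosen here. In a full scheme along the lines of the abstract, the identified model would feed the actor-critic updates of each agent rather than being used on its own.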


Type of study: Research | Subject: Special
Received: 1396/4/11 | Accepted: 1397/2/1 | Published: 1398/2/8 (Solar Hijri dates)



Reuse: This article may be redistributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.
