Suboptimal Solution of Nonlinear Differential Graphical Games Using Single-Network Approximate Dynamic Programming

Mazouchi, Majid; Naghibi Sistani, Mohammad Bagher; Hosseini Sani, Seyed Kamal

doi:10.29252/joc.12.2.13

Journal of Control, Volume 12, Issue 2 (Summer 1397), pages 13-25

DOR: 20.1001.1.20088345.1397.12.2.1.7


Mazouchi M, Naghibi Sistani M B, Hosseini Sani S K. Suboptimal Solution of Nonlinear Graphical Games Using Single Network Approximate Dynamic Programming. JoC 2018; 12 (2): 13-25
URL: http://joc.kntu.ac.ir/article-1-382-fa.html


Majid Mazouchi¹, Mohammad Bagher Naghibi Sistani*¹, Seyed Kamal Hosseini Sani¹

1- Ferdowsi University of Mashhad

Abstract:

This paper proposes an online learning algorithm, based on single-network approximate dynamic programming, for approximately solving nonlinear continuous-time differential graphical games with infinite-horizon cost functions and known dynamics. In differential graphical games, the agents aim to track the leader's state optimally, and each agent's error dynamics and performance index depend on the topology of the game's interaction graph. In the proposed algorithm, each agent uses only a single critic neural network to approximate its optimal value and optimal control policy, and it updates the weights of that critic network online using the proposed weight tuning laws. Local stabilizing switches are introduced into the neural network weight tuning laws; they guarantee stability of the closed-loop system and convergence to the Nash equilibrium policies, so an initial set of stabilizing control policies is no longer required. Lyapunov theory is used to prove stability of the closed-loop system. Finally, a simulation example demonstrates the effectiveness of the proposed algorithm.
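The abstract states that each agent's error dynamics depend on the topology of the interaction graph, but it does not give the formula. As a minimal sketch, assuming the local neighborhood tracking error commonly used in the differential graphical games literature, δ_i = Σ_j a_ij (x_i − x_j) + g_i (x_i − x_0), the following Python snippet computes it for a hypothetical three-agent graph. The function name `local_tracking_errors`, the adjacency matrix `A`, the pinning gains `g`, and the sample states are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def local_tracking_errors(A, g, x, x0):
    """Local neighborhood tracking error for every agent (assumed form).

    delta_i = sum_j a_ij * (x_i - x_j) + g_i * (x_i - x0)

    A  : (N, N) adjacency matrix, a_ij > 0 if agent i receives from agent j
    g  : (N,)   pinning gains, g_i > 0 if agent i observes the leader
    x  : (N, n) agent states, one row per agent
    x0 : (n,)   leader state
    """
    A = np.asarray(A, dtype=float)
    g = np.asarray(g, dtype=float)
    x = np.asarray(x, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    d = A.sum(axis=1)  # in-degree d_i = sum_j a_ij
    # sum_j a_ij (x_i - x_j) equals d_i * x_i - (A x)_i
    return d[:, None] * x - A @ x + g[:, None] * (x - x0)

# Hypothetical chain graph: leader -> agent 0 -> agent 1 -> agent 2
A = np.array([[0, 0, 0],
              [1, 0, 0],
              [0, 1, 0]])
g = np.array([1, 0, 0])           # only agent 0 is pinned to the leader
x = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [4.0, 1.0]])
x0 = np.array([0.0, 0.0])

delta = local_tracking_errors(A, g, x, x0)
print(delta)
```

In this example only agent 0 observes the leader directly, so the errors of agents 1 and 2 are differences to a neighbor's state rather than to the leader's state. All errors vanish exactly when every agent's state equals the leader's state.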

Keywords: approximate dynamic programming, neural networks, optimal control, reinforcement learning


Type of study: Research | Subject: Special
Received: 1395/3/27 | Accepted: 1396/9/19 | Published: 1397/7/11 (Solar Hijri dates)



Rights and permissions
	This article may be redistributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.
