Journal of Control

Language: fa
Title: A Novel Supervised Fuzzy Reinforcement Learning for Robot Navigation
Subject: Special
Article type: Research paper

Abstract: Applying supervised learning to mobile robot navigation faces serious challenges, such as inconsistent and noisy data, the difficulty of gathering training data, and high error in the training data. Reinforcement learning (RL) needs no training data, learns from only a scalar evaluation of performance, and explores extensively, which has encouraged researchers to use it for the robot navigation problem. However, RL algorithms are time-consuming and have a high failure rate during the training phase. This paper proposes a novel idea for exploiting the advantages of both supervised and reinforcement learning.
A zero-order Takagi-Sugeno (T-S) fuzzy controller with several candidate actions for each rule serves as the robot controller and generates the robot's control commands. The aim of training is to find the appropriate action for each rule. This structure is compatible with Fuzzy Sarsa Learning (FSL), which is used as a continuous RL algorithm. The proposed hybrid method has two stages. In the first stage, a supervisor moves the robot through the environment and the training data are gathered; as a hard tuning step, these data are then used to initialize the value of each candidate action in the fuzzy rules. In the second stage, FSL fine-tunes the parameters of the conclusion (consequent) parts of the fuzzy controller online. Simulation results in the KiKS simulator for the Khepera robot show that the proposed approach significantly improves the learning time, the number of failures, and the quality of the robot's motion.

Keywords: Robot navigation, Supervised learning, Reinforcement learning, Fuzzy controller
Pages: 1-10
URL: http://joc.kntu.ac.ir/browse.php?a_code=A-10-91-2&slc_lang=fa&sid=1

Authors:
Fateme Fathinezhad (fateme.fathinezhad@stu.yazduni.ac.ir), Yazd University
Vali Derhami (vderhami@yazduni.ac.ir), Yazd University
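
Illustrative note on the controller structure described in the abstract: the sketch below shows one way a zero-order T-S controller with candidate actions per rule, a supervised initialization of the action values, and a Sarsa-style update of those values could fit together. It is not the authors' implementation. The abstract does not specify the initialization rule, so `init_from_supervisor` (credit the candidate nearest to the supervisor's command, weighted by firing strength) is an assumption, as are all function names, the candidate steering values, and the learning parameters `alpha` and `gamma`.

```python
# Minimal sketch under stated assumptions; every name here is hypothetical.

def greedy_choice(values):
    """Pick, for each rule, the index of its highest-valued candidate action."""
    return [max(range(len(row)), key=row.__getitem__) for row in values]

def controller_output(mu, actions, chosen):
    """Zero-order T-S output: chosen rule actions weighted by the
    (already normalized) firing strengths mu."""
    return sum(m * actions[i][j] for i, (m, j) in enumerate(zip(mu, chosen)))

def q_value(mu, values, chosen):
    """Value of the composed action: weighted sum of chosen action values."""
    return sum(m * values[i][j] for i, (m, j) in enumerate(zip(mu, chosen)))

def init_from_supervisor(samples, actions):
    """Assumed hard tuning step: for each (mu, command) sample, credit the
    candidate nearest to the supervisor's command in every rule,
    in proportion to that rule's firing strength."""
    values = [[0.0] * len(row) for row in actions]
    for mu, command in samples:
        for i, m in enumerate(mu):
            nearest = min(range(len(actions[i])),
                          key=lambda k: abs(actions[i][k] - command))
            values[i][nearest] += m
    return values

def sarsa_update(values, mu, chosen, reward, next_mu, next_chosen,
                 alpha=0.1, gamma=0.9):
    """Sarsa-style temporal-difference update of the chosen action values,
    scaled by each rule's firing strength. Returns the TD error."""
    delta = (reward + gamma * q_value(next_mu, values, next_chosen)
             - q_value(mu, values, chosen))
    for i, (m, j) in enumerate(zip(mu, chosen)):
        values[i][j] += alpha * delta * m
    return delta

# Two rules, three candidate steering commands each (made-up values).
actions = [[-30.0, 0.0, 30.0], [-30.0, 0.0, 30.0]]
# Supervisor samples: (normalized firing strengths, supervisor command).
samples = [([0.8, 0.2], 25.0), ([0.3, 0.7], -5.0)]

values = init_from_supervisor(samples, actions)
chosen = greedy_choice(values)
steering = controller_output([0.5, 0.5], actions, chosen)
```

In this sketch the supervised pass only biases the initial action values; the online update is still free to revise them, which matches the coarse-then-fine division of labor the abstract describes.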