[1]张鹏鹏,魏长赟,张恺睿,等.旋翼无人机在移动平台降落的控制参数自学习调节方法[J].智能系统学报,2022,17(5):931-940.[doi:10.11992/tis.202107040]
ZHANG Pengpeng,WEI Changyun,ZHANG Kairui,et al.Self-learning approach to control parameter adjustment for quadcopter landing on a moving platform[J].CAAI Transactions on Intelligent Systems,2022,17(5):931-940.[doi:10.11992/tis.202107040]
《智能系统学报》(CAAI Transactions on Intelligent Systems) [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 17
Issue: 2022, No. 5
Pages: 931-940
Section: Academic Papers (Machine Learning)
Publication date: 2022-09-05
Title: Self-learning approach to control parameter adjustment for quadcopter landing on a moving platform
Authors: ZHANG Pengpeng (张鹏鹏), WEI Changyun (魏长赟), ZHANG Kairui (张恺睿), OUYANG Yongping (欧阳勇平)
Affiliation: College of Mechanical and Electrical Engineering, Hohai University, Changzhou 213022, China
Keywords: autonomous landing; reinforcement learning; path planning; COACH framework; deterministic policy gradient; air-ground cooperation; UAV; optimal control
CLC number: TP273+.2
DOI: 10.11992/tis.202107040
Document code: 2022-05-20
Abstract: Unmanned aerial vehicles (UAVs) can operate over complex terrain, but limited battery capacity and other constraints prevent them from executing tasks for long periods. Cooperation between a UAV and other unmanned systems (unmanned ground vehicles, unmanned surface vehicles, etc.) can effectively extend the UAV's working time and ensure that scheduled tasks are completed; once the UAV has finished its task, landing it quickly and stably on a moving platform is both necessary and challenging. To address this landing problem, this paper proposes a deep reinforcement learning proportional-integral-derivative (PID) method based on the COACH (corrective advice communicated by humans) framework, which provides an optimal path for the UAV to land on a moving platform. First, the reinforcement learning model is trained with the corrective-feedback framework in a simulated environment; the trained model then outputs the control parameters in both simulated and real environments, and these parameters are finally used to compute the UAV's position control commands. Simulation results and real-flight experiments show that the proposed method outperforms traditional control methods and can reliably complete the landing task on a moving platform.
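The pipeline described above, in which a trained agent proposes PID parameters that a position controller then turns into commands for tracking the moving platform, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the trained agent is replaced by a hand-tuned placeholder gain schedule, and all names (GainPolicy, PIDController, track_platform) and numbers (gain ranges, time step, platform speed) are illustrative assumptions.

```python
import numpy as np


class GainPolicy:
    """Placeholder for the trained agent that maps the tracking error to PID gains."""

    def __init__(self, kp_range=(0.5, 2.0), ki_range=(0.0, 0.1), kd_range=(0.0, 0.3)):
        self.kp_range = kp_range
        self.ki_range = ki_range
        self.kd_range = kd_range

    def act(self, error):
        # In the paper the gains come from a learned policy (deterministic policy
        # gradient refined with COACH-style corrective advice); here they are just
        # interpolated from the error magnitude as a stand-in.
        scale = float(np.clip(np.linalg.norm(error), 0.0, 1.0))

        def _interp(lo, hi):
            return lo + scale * (hi - lo)

        return _interp(*self.kp_range), _interp(*self.ki_range), _interp(*self.kd_range)


class PIDController:
    """Planar PID controller that turns a position error into a velocity command."""

    def __init__(self, dt=0.05):
        self.dt = dt
        self.integral = np.zeros(2)
        self.prev_error = None

    def step(self, error, kp, ki, kd):
        self.integral += error * self.dt
        derivative = np.zeros(2) if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error.copy()
        return kp * error + ki * self.integral + kd * derivative


def track_platform(steps=200, dt=0.05):
    """Toy closed loop: the UAV chases a platform that moves at constant velocity."""
    uav = np.array([0.0, 0.0])
    platform = np.array([3.0, 1.0])
    platform_vel = np.array([0.4, 0.0])

    policy, pid = GainPolicy(), PIDController(dt=dt)
    for _ in range(steps):
        error = platform - uav
        kp, ki, kd = policy.act(error)               # agent proposes PID parameters
        velocity_cmd = pid.step(error, kp, ki, kd)   # PID converts them into a command
        uav = uav + velocity_cmd * dt                # simplified UAV kinematics
        platform = platform + platform_vel * dt
    return float(np.linalg.norm(platform - uav))


if __name__ == "__main__":
    print(f"horizontal offset after 10 s: {track_platform():.3f} m")
```

In practice the velocity command would be sent to the flight controller's position or velocity interface, and descent would begin once the horizontal offset stays within a threshold; these details, like the rest of the sketch, are assumptions rather than the paper's procedure.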