[1]郭宪,方勇纯.仿生机器人运动步态控制:强化学习方法综述[J].智能系统学报,2020,15(1):152-159.[doi:10.11992/tis.201907052]
 GUO Xian,FANG Yongchun.Locomotion gait control for bionic robots: a review of reinforcement learning methods[J].CAAI Transactions on Intelligent Systems,2020,15(1):152-159.[doi:10.11992/tis.201907052]

仿生机器人运动步态控制:强化学习方法综述

《智能系统学报》[ISSN:1673-4785/CN:23-1538/TP]

Volume:
Vol. 15
Issue:
No. 1, 2020
Pages:
152-159
Column:
Artificial Intelligence Deans Forum
Publication date:
2020-01-01

文章信息/Info

Title:
Locomotion gait control for bionic robots: a review of reinforcement learning methods
作者:
郭宪 方勇纯
南开大学 人工智能学院, 天津 300350
Author(s):
GUO Xian FANG Yongchun
College of Artificial Intelligence, Nankai University, Tianjin 300350, China
关键词:
仿生机器人; 运动步态; 控制方法; 强化学习; 数据驱动; 多关节; 非线性; 欠驱动
Keywords:
bionic robot; locomotion gait; control method; reinforcement learning; data-driven; multi-joint; nonlinear; underactuated
CLC number:
TP18
DOI:
10.11992/tis.201907052
摘要:
仿生机器人是一类典型的多关节非线性欠驱动系统,其步态控制是一个非常具有挑战性的问题。对于该问题,传统的控制和规划方法需要针对具体的运动任务进行专门设计,需要耗费大量时间和精力,而且所设计出来的控制器往往没有通用性。基于数据驱动的强化学习方法能对不同的任务进行自主学习,且对不同的机器人和运动任务具有良好的通用性。因此,近年来这种基于强化学习的方法在仿生机器人运动步态控制方面获得了不少应用。针对这方面的研究,本文从问题形式化、策略表示方法和策略学习方法3个方面对现有的研究情况进行了分析和总结,总结了强化学习应用于仿生机器人步态控制中尚待解决的问题,并指出了后续的发展方向。
Abstract:
The bionic robot is a typical multi-joint, nonlinear, underactuated system, and its locomotion gait control is a very challenging problem. Traditional control and planning methods must be designed specifically for each locomotion task, which takes a great deal of time and effort, and the resulting controllers often lack generality. In contrast, data-driven reinforcement learning methods can autonomously learn controllers for different locomotion tasks and generalize well across different bionic robots and tasks. Therefore, in recent years, reinforcement learning-based methods have been widely used to construct locomotion gait controllers for bionic robots. This paper comprehensively analyzes the current research on reinforcement learning-based locomotion control of bionic robots from three aspects: problem formulation, policy representation, and policy learning. Finally, the problems that remain to be solved in this field are summarized, and possible future research directions are indicated.
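To make the problem formulation described above concrete, the following is a minimal, purely illustrative sketch, not taken from this paper or from any of the surveyed works: locomotion gait control is cast as a sequential decision problem in which a policy maps the robot state to joint commands, and the policy is improved with a vanilla policy-gradient (REINFORCE) update. The ToyGaitEnv dynamics, the linear Gaussian policy, and all constants are hypothetical placeholders; a practical setup would use a physics simulator or real hardware and one of the algorithms cited in the references (e.g., TRPO, PPO, or SAC).

```python
# Illustrative sketch only: gait control framed as a sequential decision problem
# and optimized with a vanilla policy-gradient (REINFORCE) update.
# "ToyGaitEnv" is a hypothetical stand-in for a bionic-robot simulator.
import numpy as np

class ToyGaitEnv:
    """Toy dynamics: state = [forward velocity, body height], action = joint targets."""
    def __init__(self, n_joints=4):
        self.n_joints = n_joints
        self.state = np.zeros(2)

    def reset(self):
        self.state = np.array([0.0, 1.0])
        return self.state.copy()

    def step(self, joint_targets):
        # Crude surrogate dynamics: the mean joint command drives the body forward.
        effort = float(np.mean(joint_targets))
        self.state[0] = 0.9 * self.state[0] + 0.1 * effort                        # velocity
        self.state[1] = np.clip(self.state[1] + 0.01 * (effort - 0.5), 0.5, 1.5)  # height
        reward = self.state[0] - 0.01 * float(np.sum(joint_targets ** 2))         # speed - energy
        return self.state.copy(), reward

def run_episode(env, W, sigma=0.1, horizon=100):
    """Roll out a linear Gaussian policy a ~ N(W s, sigma^2 I); collect log-prob gradients."""
    s = env.reset()
    grads, rewards = [], []
    for _ in range(horizon):
        mean = W @ s
        a = mean + sigma * np.random.randn(env.n_joints)
        grads.append(np.outer(a - mean, s) / sigma ** 2)   # d log pi(a|s) / d W
        s, r = env.step(a)
        rewards.append(r)
    return grads, rewards

env = ToyGaitEnv()
W = np.zeros((env.n_joints, 2))                            # policy parameters
for _ in range(200):                                       # REINFORCE with a return baseline
    grads, rewards = run_episode(env, W)
    returns = np.cumsum(rewards[::-1])[::-1]               # returns-to-go
    baseline = returns.mean()
    for g, G in zip(grads, returns):
        W += 1e-3 * g * (G - baseline)
print("learned policy weights:\n", W)
```

The reward trades forward speed against an energy penalty, which mirrors the kind of reward shaping commonly used in gait learning; more sample-efficient actor-critic methods simply replace the Monte Carlo return with a learned value estimate.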

参考文献/References:

[1] GEHRING C, COROS S, HUTTER M, et al. Practice makes perfect: an optimization-based approach to controlling agile motions for a quadruped robot[J]. IEEE robotics & automation magazine, 2016, 23(1): 34–43.
[2] APGAR T, CLARY P, GREEN K, et al. Fast online trajectory optimization for the bipedal robot Cassie[C]//Proceedings of Robotics: Science and Systems 2018. Pittsburgh, USA, 2018.
[3] RAIBERT M, BLANKESPOOR K, NELSON G, et al. BigDog, the rough-terrain quadruped robot[C]//Proceedings of the 17th World Congress of the International Federation of Automatic Control. Seoul, Korea, 2008: 10822–10825.
[4] Spotmini autonomous navigation[EB/OL].[2018-08-11]. https://ucrazy.ru/video/1526182828-spotmini-autonomous-navigation.html.
[5] PARK H W, PARK S, KIM S. Variable-speed quadrupedal bounding using impulse planning: Untethered high-speed 3D running of MIT Cheetah 2[C]//Proceedings of 2015 IEEE International Conference on Robotics and Automation. Seattle, USA, 2015: 5163–5170.
[6] HIROSE S, YAMADA H. Snake-like robots: machine design of biologically inspired robots[J]. IEEE robotics and automation magazine, 2009, 16(1): 88–98.
[7] HATTON R L, CHOSET H. Generating gaits for snake robots by annealed chain fitting and keyframe wave extraction[C]//Proceedings of 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems. St. Louis, USA, 2009: 840–845.
[8] TAKEMORI T, TANAKA M, MATSUNO F. Gait design for a snake robot by connecting curve segments and experimental demonstration[J]. IEEE transactions on robotics, 2018, 34(5): 1384–1391.
[9] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529–533.
[10] SILVER D, SCHRITTWIESER J, SIMONYAN K, et al. Mastering the game of Go without human knowledge[J]. Nature, 2017, 550(7676): 354–359.
[11] LEVINE S, KOLTUN V. Learning complex neural network policies with trajectory optimization[C]//Proceedings of the 31st International Conference on Machine Learning. Beijing, China, 2014: 829–837.
[12] SCHULMAN J, LEVINE S, MORITZ P, et al. Trust region policy optimization[C]//Proceedings of the 32nd International Conference on Machine Learning. Lille, France, 2015: 1889–1897.
[13] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[EB/OL]. (2017-08-28). https://arxiv.org/abs/1707.06347.
[14] PENG Xuebin, BERSETH G, YIN Kangkang, et al. DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning[J]. ACM transactions on graphics, 2017, 36(4): 1–13.
[15] ABDOLMALEKI A, SPRINGENBERG J T, TASSA Y, et al. Maximum a posteriori policy optimisation[EB/OL]. (2018-06-14). https://arxiv.org/abs/1806.06920.
[16] HAARNOJA T, ZHOU A, HARTIKAINEN K, et al. Soft actor-critic algorithms and applications[EB/OL]. (2019-01-29). https://arxiv.org/abs/1812.05905.
[17] FUJIMOTO S, VAN HOOF H, MEGER D. Addressing function approximation error in actor-critic methods[C]//Proceedings of the 35th International Conference on Machine Learning. Stockholmsmässan, Sweden, 2018: 1587–1596.
[18] HWANGBO J, LEE J, DOSOVITSKIY A, et al. Learning agile and dynamic motor skills for legged robots[J]. Science robotics, 2019, 4(26): 5872–5880.
[19] HAARNOJA T, HA S, ZHOU A, et al. Learning to walk via deep reinforcement learning[EB/OL]. (2019-06-19). https://arxiv.org/abs/1812.11103.
[20] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge: MIT Press, 1998.
[21] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. https://arxiv.org/abs/1509.02971.
[22] BOHEZ S, ABDOLMALEKI A, NEUNERT M, et al. Value constrained model-free continuous control[EB/OL]. (2019-02-12). https://arxiv.org/abs/1902.04623.
[23] ALTMAN E. Constrained Markov decision processes[M]. London: Chapman and Hall, 1999.
[24] DELCOMYN F. Neural basis of rhythmic behavior in animals[J]. Science, 1980, 210(4469): 492–498.
[25] MATSUOKA K. Sustained oscillations generated by mutually inhibiting neurons with adaptation[J]. Biological cybernetics, 1985, 52(6): 367–376.
[26] COHEN A H, HOLMES P J, RAND R H. The nature of the coupling between segmental oscillators of the lamprey spinal generator for locomotion: a mathematical model[J]. Journal of mathematical biology, 1982, 13(3): 345–369.
[27] BAY J S, HEMAMI H. Modeling of a neural pattern generator with coupled nonlinear oscillators[J]. IEEE transactions on biomedical engineering, 1987, BME-34(4): 297–306.
[28] ENDO G, MORIMOTO J, MATSUBARA T, et al. Learning CPG-based biped locomotion with a policy gradient method: application to a humanoid robot[J]. The international journal of robotics research, 2008, 27(2): 213–228.
[29] MATSUBARA T, MORIMOTO J, NAKANISHI J, et al. Learning CPG-based biped locomotion with a policy gradient method[C]//Proceedings of the 5th IEEE-RAS International Conference on Humanoid Robots. Tsukuba, Japan, 2005.
[30] DOYA K. Reinforcement learning in continuous time and space[J]. Neural computation, 2000, 12(1): 219–245.
[31] SARTORETTI G, PAIVINE W, SHI Yunfei, et al. Distributed learning of decentralized control policies for articulated mobile robots[J]. IEEE transactions on robotics, 2019, 35(5): 1109–1122.
[32] 方勇纯, 朱威, 郭宪. 基于路径积分强化学习方法的蛇形机器人目标导向运动[J]. 模式识别与人工智能, 2019, 32(1): 1–9.
FANG Yongchun, ZHU Wei, GUO Xian. Target-directed locomotion of a snake-like robot based on path integral reinforcement learning[J]. Pattern recognition and artificial intelligence, 2019, 32(1): 1–9.
[33] IJSPEERT A J, SCHAAL S. Learning attractor landscapes for learning motor primitives[M]//THRUN S, SAUL L K, SCHÖLKOPF B. Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2002: 1547–1554.
[34] SCHAAL S, PETERS J, NAKANISHI J, et al. Learning movement primitives[M]//DARIO P, CHATILA R. Robotics Research: The Eleventh International Symposium. Berlin, Germany: Springer, 2005.
[35] YU Wenhao, TURK G, LIU C K. Learning symmetric and low-energy locomotion[J]. ACM transactions on graphics, 2018, 37(4): 144–150.
[36] PENG Xuebin, BERSETH G, VAN DE PANNE M. Terrain-adaptive locomotion skills using deep reinforcement learning[J]. ACM transactions on graphics, 2016, 35(4): 81–88.
[37] BING Zhenshan, LEMKE C, JIANG Zhuangyi, et al. Energy-efficient slithering gait exploration for a snake-like robot based on reinforcement learning[EB/OL]. (2019-04-16). https://arxiv.org/abs/1904.07788v1.
[38] PENG Xuebin, VAN DE PANNE M. Learning locomotion skills using DeepRL: does the choice of action space matter?[C]//Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation. Los Angeles, USA, 2017: 12–20.
[39] VAN HASSELT H. Double Q-learning[C]//Proceedings of the 23rd International Conference on Neural Information Processing Systems. Red Hook, USA, 2010: 2613–2621.
[40] HA D, SCHMIDHUBER J. World Models[EB/OL]. (2018-05-09). https://arxiv.org/abs/1803.10122.
[41] EBERT F, FINN C, DASARI S, et al. Visual foresight: model-based deep reinforcement learning for vision-based robotic control[EB/OL]. (2018-12-03). https://arxiv.org/abs/1812.00568.
[42] FINN C, RAJESWARAN A, KAKADE S, et al. Online meta-learning[EB/OL]. (2019-07-03). https://arxiv.org/abs/1902.08438.
[43] MAHJOURIAN R, MIIKKULAINEN R, LAZIC N, et al. Hierarchical policy design for sample-efficient learning of robot table tennis through self-play[EB/OL]. (2019-02-17). https://arxiv.org/abs/1811.12927?context=cs.

相似文献/Similar References:

[1]刘羽婷,郭健,孙珊,等.新型仿生球形两栖子母机器人系统设计[J].智能系统学报,2019,14(03):582.[doi:10.11992/tis.201710025]
 LIU Yuting,GUO Jian,SUN Shan,et al.Novel bionic spherical amphibious mother-son robot system design[J].CAAI Transactions on Intelligent Systems,2019,14(3):582.[doi:10.11992/tis.201710025]

备注/Memo

Received: 2019-07-29.
Foundation items: National Natural Science Foundation of China (61603200); Youth Program of the Natural Science Foundation of Tianjin (19JCQNJC03200).
About the authors: GUO Xian, lecturer, Ph.D. His main research interests are bionic robot design and intelligent locomotion control. He has led one National Natural Science Foundation of China project and two provincial- or ministerial-level projects. FANG Yongchun, professor and doctoral supervisor, dean of the College of Artificial Intelligence at Nankai University, and Distinguished Professor of the Changjiang Scholars Program. His main research interests are robot visual control, control of underactuated crane systems, locomotion control of bionic robots, and micro/nano manipulation. He has led a National Key R&D Program project, a key project of the National Natural Science Foundation of China, a task of the 12th Five-Year National Key Technology R&D Program, and a major scientific instrument project of the National Natural Science Foundation of China. He is a recipient of the National Science Fund for Distinguished Young Scholars and has received the first prize of the Wu Wenjun Artificial Intelligence Natural Science Award, the Gold Award of the Tianjin Patent Award, the first prize of the Tianjin Natural Science Award, and the first prize for Higher Education Teaching Achievements, among other honors. He has published more than 100 academic papers.
Corresponding author: FANG Yongchun. E-mail: fangyc@nankai.edu.cn