GUO Xian,FANG Yongchun.Locomotion gait control for bionic robots: a review of reinforcement learning methods[J].CAAI Transactions on Intelligent Systems,2020,15(1):152-159.[doi:10.11992/tis.201907052]





Locomotion gait control for bionic robots: a review of reinforcement learning methods
郭宪 方勇纯
南开大学 人工智能学院, 天津 300350
GUO Xian FANG Yongchun
College of Artificial Intelligence, Nankai University, Tianjin 300350, China
bionic robot; locomotion gait; control method; reinforcement learning; data-driven; multi-joint; nonlinear; underactuated
The bionic robot is a typical multi-joint, nonlinear, underactuated system, for which locomotion gait control is highly challenging. Traditional control and planning methods must be carefully designed for each specific locomotion task, which takes considerable time and effort yet lacks generality. In contrast, data-driven reinforcement learning methods can autonomously learn controllers for different locomotion tasks, and they generalize well across different bionic robots and locomotion modes. Therefore, in recent years, reinforcement learning has been widely applied in the field of bionic robots to construct various locomotion gait controllers. In this paper, the current research status of reinforcement learning-based locomotion control for bionic robots is comprehensively analyzed from three aspects: problem formulation, policy representation, and policy learning. Finally, the open problems in the field are summarized, and possible future research directions are discussed.
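To make the data-driven idea concrete, the sketch below is a minimal toy example (not from the paper) of the three aspects named above: the problem is formulated as maximizing an episode return, the policy is represented by two gait parameters (amplitude `A` and frequency `w`, as in CPG-style parameterizations), and policy learning is done with a basic finite-difference search. The `episode_return` reward model and all parameter values are invented stand-ins for a real locomotion simulator.

```python
import math
import random

# Hypothetical stand-in for a locomotion simulator: the "policy" is a pair
# of gait parameters (amplitude A, frequency w), and the return trades
# forward speed against an energy cost. The functional form is invented
# for illustration; a real setup would roll out a full gait in simulation.
def episode_return(params):
    A, w = params
    speed = A * w * math.exp(-((A - 1.0) ** 2 + (w - 2.0) ** 2))
    energy = 0.05 * (A * w) ** 2
    return speed - energy

# Basic finite-difference policy search: perturb the parameters, estimate
# a directional derivative of the return from two rollouts, and ascend,
# keeping the best parameters seen so far.
def train(steps=200, sigma=0.1, lr=0.05, seed=0):
    rng = random.Random(seed)
    params = [0.5, 0.5]
    best, best_r = list(params), episode_return(params)
    for _ in range(steps):
        eps = [rng.gauss(0.0, 1.0) for _ in params]
        up = [p + sigma * e for p, e in zip(params, eps)]
        dn = [p - sigma * e for p, e in zip(params, eps)]
        slope = (episode_return(up) - episode_return(dn)) / (2.0 * sigma)
        params = [p + lr * slope * e for p, e in zip(params, eps)]
        r = episode_return(params)
        if r > best_r:
            best, best_r = list(params), r
    return best
```

The same loop structure (rollout, return estimate, parameter update) underlies the policy-gradient methods surveyed below, although practical algorithms such as PPO [13] or SAC [16] use neural-network policies and far more sample-efficient gradient estimators.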


[1] GEHRING C, COROS S, HUTTER M, et al. Practice makes perfect: an optimization-based approach to controlling agile motions for a quadruped robot[J]. IEEE robotics & automation magazine, 2016, 23(1): 34–43.
[2] APGAR T, CLARY P, GREEN K, et al. Fast online trajectory optimization for the bipedal robot Cassie[C]//Proceedings of Robotics: Science and Systems 2018. Pittsburgh, USA, 2018.
[3] RAIBERT M, BLANKESPOOR K, NELSON G, et al. BigDog, the rough-terrain quadruped robot[C]//Proceedings of the 17th World Congress of the International Federation of Automatic Control. Seoul, Korea, 2008: 10822–10825.
[4] Spotmini autonomous navigation[EB/OL].[2018-08-11]. https://ucrazy.ru/video/1526182828-spotmini-autonomous-navigation.html.
[5] PARK H W, PARK S, KIM S. Variable-speed quadrupedal bounding using impulse planning: Untethered high-speed 3D running of MIT Cheetah 2[C]//Proceedings of 2015 IEEE International Conference on Robotics and Automation. Seattle, USA, 2015: 5163–5170.
[6] HIROSE S, YAMADA H. Snake-like robots: machine design of biologically inspired robots[J]. IEEE robotics and automation magazine, 2009, 16(1): 88–98.
[7] HATTON R L, CHOSET H. Generating gaits for snake robots by annealed chain fitting and keyframe wave extraction[C]//Proceedings of 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems. St. Louis, USA, 2009: 840–845.
[8] TAKEMORI T, TANAKA M, MATSUNO F. Gait design for a snake robot by connecting curve segments and experimental demonstration[J]. IEEE transactions on robotics, 2018, 34(5): 1384–1391.
[9] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529–533.
[10] SILVER D, SCHRITTWIESER J, SIMONYAN K, et al. Mastering the game of Go without human knowledge[J]. Nature, 2017, 550(7676): 354–359.
[11] LEVINE S, KOLTUN V. Learning complex neural network policies with trajectory optimization[C]//Proceedings of the 31st International Conference on Machine Learning. Beijing, China, 2014: 829–837.
[12] SCHULMAN J, LEVINE S, MORITZ P, et al. Trust region policy optimization[C]//Proceedings of the 32nd International Conference on Machine Learning. Lille, France, 2015: 1889–1897.
[13] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[EB/OL]. (2017-08-28). https://arxiv.org/abs/1707.06347.
[14] PENG Xuebin, BERSETH G, YIN Kangkang, et al. DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning[J]. ACM transactions on graphics, 2017, 36(4): 1–13.
[15] ABDOLMALEKI A, SPRINGENBERG J T, TASSA Y, et al. Maximum a posteriori policy optimisation[EB/OL]. (2018-06-14). https://arxiv.org/abs/1806.06920.
[16] HAARNOJA T, ZHOU A, HARTIKAINEN K, et al. Soft actor-critic algorithms and applications[EB/OL]. (2019-01-29). https://arxiv.org/abs/1812.05905.
[17] FUJIMOTO S, VAN HOOF H, MEGER D. Addressing function approximation error in actor-critic methods[C]//Proceedings of the 35th International Conference on Machine Learning. Stockholmsmässan, Sweden, 2018: 1587–1596.
[18] HWANGBO J, LEE J, DOSOVITSKIY A, et al. Learning agile and dynamic motor skills for legged robots[J]. Science robotics, 2019, 4(26): 5872–5880.
[19] HAARNOJA T, HA S, ZHOU A, et al. Learning to walk via deep reinforcement learning[EB/OL]. (2019-06-19). https://arxiv.org/abs/1812.11103.
[20] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge: MIT Press, 1998.
[21] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. (2015-09-09). https://arxiv.org/abs/1509.02971.
[22] BOHEZ S, ABDOLMALEKI A, NEUNERT M, et al. Value constrained model-free continuous control[EB/OL]. (2019-02-12). https://arxiv.org/abs/1902.04623.
[23] ALTMAN E. Constrained Markov decision processes[M]. London: Chapman and Hall, 1999.
[24] DELCOMYN F. Neural basis of rhythmic behavior in animals[J]. Science, 1980, 210(4469): 492–498.
[25] MATSUOKA K. Sustained oscillations generated by mutually inhibiting neurons with adaptation[J]. Biological cybernetics, 1985, 52(6): 367–376.
[26] COHEN A H, HOLMES P J, RAND R H. The nature of the coupling between segmental oscillators of the lamprey spinal generator for locomotion: a mathematical model[J]. Journal of mathematical biology, 1982, 13(3): 345–369.
[27] BAY J S, HEMAMI H. Modeling of a neural pattern generator with coupled nonlinear oscillators[J]. IEEE transactions on biomedical engineering, 1987, BME-34(4): 297–306.
[28] ENDO G, MORIMOTO J, MATSUBARA T, et al. Learning CPG-based biped locomotion with a policy gradient method: application to a humanoid robot[J]. The international journal of robotics research, 2008, 27(2): 213–228.
[29] MATSUBARA T, MORIMOTO J, NAKANISHI J, et al. Learning CPG-based biped locomotion with a policy gradient method[C]//Proceedings of the 5th IEEE-RAS International Conference on Humanoid Robots. Tsukuba, Japan, 2005.
[30] DOYA K. Reinforcement learning in continuous time and space[J]. Neural computation, 2000, 12(1): 219–245.
[31] SARTORETTI G, PAIVINE W, SHI Yunfei, et al. Distributed learning of decentralized control policies for articulated mobile robots[J]. IEEE transactions on robotics, 2019, 35(5): 1109–1122.
[32] 方勇纯, 朱威, 郭宪. 基于路径积分强化学习方法的蛇形机器人目标导向运动[J]. 模式识别与人工智能, 2019, 32(1): 1–9
FANG Yongchun, ZHU Wei, GUO Xian. Target-directed locomotion of a snake-like robot based on path integral reinforcement learning[J]. Pattern recognition and artificial intelligence, 2019, 32(1): 1–9
[33] IJSPEERT A J, SCHAAL S. Learning attractor landscapes for learning motor primitives[M]//THRUN S, SAUL L K, SCHÖLKOPF B. Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2002: 1547–1554.
[34] SCHAAL S, PETERS J, NAKANISHI J, et al. Learning movement primitives[M]//DARIO P, CHATILA R. Robotics Research. The Eleventh International Symposium. Berlin, Germany: Springer, 2005.
[35] YU Wenhao, TURK G, LIU C K. Learning symmetric and low-energy locomotion[J]. ACM transactions on graphics, 2018, 37(4): 144–150.
[36] PENG Xuebin, BERSETH G, VAN DE PANNE M. Terrain-adaptive locomotion skills using deep reinforcement learning[J]. ACM transactions on graphics, 2016, 35(4): 81–88.
[37] BING Zhenshan, LEMKE C, JIANG Zhuangyi, et al. Energy-efficient slithering gait exploration for a snake-like robot based on reinforcement learning[EB/OL]. (2019-04-16). https://arxiv.org/abs/1904.07788v1.
[38] PENG Xuebin, VAN DE PANNE M. Learning locomotion skills using DeepRL: does the choice of action space matter?[C]//Proceedings of ACM SIGGRAPH/Eurographics Symposium on Computer Animation. Los Angeles, USA, 2017: 12–20.
[39] VAN HASSELT H. Double Q-learning[C]//Proceedings of the 23rd International Conference on Neural Information Processing Systems. Red Hook, USA, 2010: 2613–2621.
[40] HA D, SCHMIDHUBER J. World Models[EB/OL]. (2018-05-09). https://arxiv.org/abs/1803.10122.
[41] EBERT F, FINN C, DASARI S, et al. Visual foresight: model-based deep reinforcement learning for vision-based robotic control[EB/OL]. (2018-12-03). https://arxiv.org/abs/1812.00568.
[42] FINN C, RAJESWARAN A, KAKADE S, et al. Online meta-learning[EB/OL]. (2019-07-03). https://arxiv.org/abs/1902.08438.
[43] MAHJOURIAN R, MIIKKULAINEN R, LAZIC N, et al. Hierarchical policy design for sample-efficient learning of robot table tennis through self-play[EB/OL]. (2019-02-17). https://arxiv.org/abs/1811.12927?context=cs.



