[1] GUO Xian, FANG Yongchun. Locomotion gait control for bionic robots: a review of reinforcement learning methods[J]. CAAI Transactions on Intelligent Systems, 2020, 15(1): 152-159. [doi:10.11992/tis.201907052]

Locomotion gait control for bionic robots: a review of reinforcement learning methods

References:
[1] GEHRING C, COROS S, HUTTER M, et al. Practice makes perfect: an optimization-based approach to controlling agile motions for a quadruped robot[J]. IEEE robotics & automation magazine, 2016, 23(1): 34–43.
[2] APGAR T, CLARY P, GREEN K, et al. Fast online trajectory optimization for the bipedal robot Cassie[C]//Proceedings of Robotics: Science and Systems 2018. Pittsburgh, USA, 2018.
[3] RAIBERT M, BLANKESPOOR K, NELSON G, et al. BigDog, the rough-terrain quadruped robot[C]//Proceedings of the 17th World Congress of the International Federation of Automatic Control. Seoul, Korea, 2008: 10822–10825.
[4] Spotmini autonomous navigation[EB/OL].[2018-08-11]. https://ucrazy.ru/video/1526182828-spotmini-autonomous-navigation.html.
[5] PARK H W, PARK S, KIM S. Variable-speed quadrupedal bounding using impulse planning: Untethered high-speed 3D running of MIT Cheetah 2[C]//Proceedings of 2015 IEEE International Conference on Robotics and Automation. Seattle, USA, 2015: 5163–5170.
[6] HIROSE S, YAMADA H. Snake-like robots: machine design of biologically inspired robots[J]. IEEE robotics and automation magazine, 2009, 16(1): 88–98.
[7] HATTON R L, CHOSET H. Generating gaits for snake robots by annealed chain fitting and keyframe wave extraction[C]//Proceedings of 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems. St. Louis, USA, 2009: 840–845.
[8] TAKEMORI T, TANAKA M, MATSUNO F. Gait design for a snake robot by connecting curve segments and experimental demonstration[J]. IEEE transactions on robotics, 2018, 34(5): 1384–1391.
[9] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529–533.
[10] SILVER D, SCHRITTWIESER J, SIMONYAN K, et al. Mastering the game of Go without human knowledge[J]. Nature, 2017, 550(7676): 354–359.
[11] LEVINE S, KOLTUN V. Learning complex neural network policies with trajectory optimization[C]//Proceedings of the 31st International Conference on Machine Learning. Beijing, China, 2014: 829–837.
[12] SCHULMAN J, LEVINE S, MORITZ P, et al. Trust region policy optimization[C]//Proceedings of the 32nd International Conference on Machine Learning. Lille, France, 2015: 1889–1897.
[13] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[EB/OL]. (2017-08-28). https://arxiv.org/abs/1707.06347.
[14] PENG Xuebin, BERSETH G, YIN Kangkang, et al. DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning[J]. ACM transactions on graphics, 2017, 36(4): 1–13.
[15] ABDOLMALEKI A, SPRINGENBERG J T, TASSA Y, et al. Maximum a posteriori policy optimisation[EB/OL]. (2018-06-14). https://arxiv.org/abs/1806.06920.
[16] HAARNOJA T, ZHOU A, HARTIKAINEN K, et al. Soft actor-critic algorithms and applications[EB/OL]. (2019-01-29). https://arxiv.org/abs/1812.05905.
[17] FUJIMOTO S, VAN HOOF H, MEGER D. Addressing function approximation error in actor-critic methods[C]//Proceedings of the 35th International Conference on Machine Learning. Stockholmsmässan, Sweden, 2018: 1587–1596.
[18] HWANGBO J, LEE J, DOSOVITSKIY A, et al. Learning agile and dynamic motor skills for legged robots[J]. Science robotics, 2019, 4(26): 5872–5880.
[19] HAARNOJA T, HA S, ZHOU A, et al. Learning to walk via deep reinforcement learning[EB/OL]. (2019-06-19). https://arxiv.org/abs/1812.11103.
[20] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge: MIT Press, 1998.
[21] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[J]. Computer science, 2015, 8(6): A187.
[22] BOHEZ S, ABDOLMALEKI A, NEUNERT M, et al. Value constrained model-free continuous control[EB/OL]. (2019-02-12). https://arxiv.org/abs/1902.04623.
[23] ALTMAN E. Constrained Markov decision processes[M]. London: Chapman and Hall, 1999.
[24] DELCOMYN F. Neural basis of rhythmic behavior in animals[J]. Science, 1980, 210(4469): 492–498.
[25] MATSUOKA K. Sustained oscillations generated by mutually inhibiting neurons with adaptation[J]. Biological cybernetics, 1985, 52(6): 367–376.
[26] COHEN A H, HOLMES P J, RAND R H. The nature of the coupling between segmental oscillators of the lamprey spinal generator for locomotion: a mathematical model[J]. Journal of mathematical biology, 1982, 13(3): 345–369.
[27] BAY J S, HEMAMI H. Modeling of a neural pattern generator with coupled nonlinear oscillators[J]. IEEE transactions on biomedical engineering, 1987, BME-34(4): 297–306.
[28] ENDO G, MORIMOTO J, MATSUBARA T, et al. Learning CPG-based biped locomotion with a policy gradient method: application to a humanoid robot[J]. The international journal of robotics research, 2008, 27(2): 213–228.
[29] MATSUBARA T, MORIMOTO J, NAKANISHI J, et al. Learning CPG-based biped locomotion with a policy gradient method[C]//Proceedings of the 5th IEEE-RAS International Conference on Humanoid Robots. Tsukuba, Japan, 2005.
[30] DOYA K. Reinforcement learning in continuous time and space[J]. Neural computation, 2000, 12(1): 219–245.
[31] SARTORETTI G, PAIVINE W, SHI Yunfei, et al. Distributed learning of decentralized control policies for articulated mobile robots[J]. IEEE transactions on robotics, 2019, 35(5): 1109–1122.
[32] FANG Yongchun, ZHU Wei, GUO Xian. Target-directed locomotion of a snake-like robot based on path integral reinforcement learning[J]. Pattern recognition and artificial intelligence, 2019, 32(1): 1–9.
[33] IJSPEERT A J, SCHAAL S. Learning attractor landscapes for learning motor primitives[M]//THRUN S, SAUL L K, SCHOLKOPF B. Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2002: 1547–1554.
[34] SCHAAL S, PETERS J, NAKANISHI J, et al. Learning movement primitives[M]//DARIO P, CHATILA R. Robotics Research. The Eleventh International Symposium. Berlin, Germany: Springer, 2005.
[35] YU Wenhao, TURK G, LIU C K. Learning symmetric and low-energy locomotion[J]. ACM transactions on graphics, 2018, 37(4): 144–150.
[36] PENG Xuebin, BERSETH G, VAN DE PANNE M. Terrain-adaptive locomotion skills using deep reinforcement learning[J]. ACM transactions on graphics, 2016, 35(4): 81–88.
[37] BING Zhenshan, LEMKE C, JIANG Zhuangyi, et al. Energy-efficient slithering gait exploration for a snake-like robot based on reinforcement learning[EB/OL]. (2019-04-16). https://arxiv.org/abs/1904.07788v1.
[38] PENG Xuebin, VAN DE PANNE M. Learning locomotion skills using DeepRL: does the choice of action space matter?[C]//Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation. Los Angeles, USA, 2017: 12–20.
[39] VAN HASSELT H. Double q-learning[C]//Proceedings of the 23rd International Conference on Neural Information Processing Systems. Red Hook, USA, 2010: 2613–2621.
[40] HA D, SCHMIDHUBER J. World Models[EB/OL]. (2018-05-09). https://arxiv.org/abs/1803.10122.
[41] EBERT F, FINN C, DASARI S, et al. Visual foresight: model-based deep reinforcement learning for vision-based robotic control[EB/OL]. (2018-12-03). https://arxiv.org/abs/1812.00568.
[42] FINN C, RAJESWARAN A, KAKADE S, et al. Online meta-learning[EB/OL]. (2019-07-03). https://arxiv.org/abs/1902.08438.
[43] MAHJOURIAN R, MIIKKULAINEN R, LAZIC N, et al. Hierarchical policy design for sample-efficient learning of robot table tennis through self-play[EB/OL]. (2019-02-17). https://arxiv.org/abs/1811.12927?context=cs.
Similar articles:
[1] LIU Yuting, GUO Jian, SUN Shan, et al. Novel bionic spherical amphibious mother-son robot system design[J]. CAAI Transactions on Intelligent Systems, 2019, 14(3): 582. [doi:10.11992/tis.201710025]

Memo

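Received: 2019-07-29.
Funding: National Natural Science Foundation of China (61603200); Tianjin Natural Science Foundation, Youth Project (19JCQNJC03200).
About the authors: GUO Xian, lecturer, Ph.D. Main research interests: bionic robot design and intelligent motion control. Has led one National Natural Science Foundation of China project and two provincial- or ministerial-level projects. FANG Yongchun, professor, doctoral supervisor, dean of the College of Artificial Intelligence, Nankai University. Main research interests: robot visual control, control of underactuated crane systems, motion control of bionic robots, and micro/nano manipulation. Has led projects under the National Key R&D Program, key projects of the National Natural Science Foundation of China, a "Twelfth Five-Year Plan" national technology support program project, and a National Natural Science Foundation special project for scientific instruments, among others. Awards include the first prize of the Wu Wenjun Artificial Intelligence Natural Science Award, the Gold Award of the Tianjin Patent Award, the first prize of the Tianjin Natural Science Award, and the first prize of the Higher Education Teaching Achievement Award. Has published more than 100 academic papers.
Corresponding author: FANG Yongchun. E-mail: fangyc@nankai.edu.cn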
