
WANG Junhe, JIANG Yong. Dynamic assembly algorithm based on deep reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2023, 18(1): 2-11. [doi:10.11992/tis.202201006]


Dynamic assembly algorithm based on deep reinforcement learning


Journal: CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]; Volume: 18; Issue: 2023, No. 1; Pages: 2-11; Section: Academic Papers (Machine Learning); Published: 2023-01-05

Title:: Dynamic assembly algorithm based on deep reinforcement learning

Author(s):: WANG Junhe^1,2,3, JIANG Yong^1,2; 1. State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China;
2. Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China;
3. University of Chinese Academy of Sciences, Beijing 100049, China

Keywords:: flexible cable model; dynamic noise; dynamic assembly; deep reinforcement learning; long short-term memory; sequence discount factor; temporal difference (λ); pre-training

CLC number:: TP242.6

DOI:: 10.11992/tis.202201006

Abstract:: To cope with the complex, dynamic noise disturbances present in dynamic assembly environments, a dynamic assembly algorithm based on deep reinforcement learning is proposed. The contact forces measured over a period of time are taken as the state, and motion features are extracted from them with a long short-term memory (LSTM) network. A sequence discount factor is defined, and the reward at the current time step is obtained by weighting the sub-rewards of the preceding time steps. The action output by the model is a Cartesian-space displacement, and inverse kinematics is used to move the robot to the desired position. In addition, an improved neural network parameter update method based on the temporal difference algorithm with eligibility traces (TD(λ)) is proposed to shorten model training time. In the experiments, the model is first pre-trained in a simple circular hole-shaft (peg-in-hole) environment and then trained further in the real scene. The experiments show that the proposed algorithm adapts well to the flexible, dynamic environment of a dynamic assembly task.
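The abstract describes the reward weighting and the parameter update only in words. The following is a minimal sketch under stated assumptions, not the authors' method: `sequence_discounted_reward` assumes a geometric weighting r_t = Σ_k γ_s^k · r̃_(t-k) over a window of sub-rewards, with the current sub-reward weighted 1 (the paper's exact weighting, window length, and sub-reward definitions are not given on this page), and `td_lambda_episode` is the textbook semi-gradient TD(λ) with accumulating eligibility traces for a linear value function, not the improved update the paper proposes. Both function names and all parameters are hypothetical.

```python
from typing import Sequence

import numpy as np


def sequence_discounted_reward(sub_rewards: Sequence[float], gamma_s: float) -> float:
    """Hypothetical sequence-discounted reward.

    sub_rewards[-1] is the current step's sub-reward, sub_rewards[-2] the previous one, etc.
    Assumed form: r_t = sum_k gamma_s**k * sub_rewards[-1 - k].
    """
    if not 0.0 <= gamma_s <= 1.0:
        raise ValueError("gamma_s must lie in [0, 1]")
    # Most recent sub-reward gets weight 1, older ones decay geometrically.
    return float(sum(gamma_s ** k * r for k, r in enumerate(reversed(sub_rewards))))


def td_lambda_episode(features, rewards, w, alpha=0.1, gamma=0.99, lam=0.9):
    """Textbook semi-gradient TD(lambda) with accumulating traces, v(s) = w @ phi(s).

    features: shape (T, d); features[t] is phi(s_t) for the states visited.
    rewards:  shape (T,); rewards[t] is received on leaving s_t.
    The episode terminates after the last transition, so the final bootstrap value is 0.
    """
    w = np.array(w, dtype=float)
    z = np.zeros_like(w)  # eligibility trace vector
    T = len(rewards)
    for t in range(T):
        phi = features[t]
        v = w @ phi
        v_next = w @ features[t + 1] if t + 1 < T else 0.0
        delta = rewards[t] + gamma * v_next - v  # one-step TD error
        z = gamma * lam * z + phi  # decay old credit, add gradient of v(s_t)
        w = w + alpha * delta * z  # every recently visited state shares the update
    return w
```

Setting `lam=0.0` reduces the trace to the current feature vector, i.e. one-step TD(0), so only the state just before a reward is credited; with `lam=1.0` earlier states in the episode share that credit in the same update, which is how eligibility traces propagate reward information faster than one-step TD.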


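Memo

Received: 2022-01-04.
Foundation item: National Natural Science Foundation of China (52075531).
About the authors: WANG Junhe, master's student; main research interests are reinforcement learning and intelligent robots. JIANG Yong, research professor; main research interests are intelligent robot control, robot teleoperation for complex environments, embedded control systems and their applications, multi-sensor fusion and system health management, human-robot collaborative control theory and methods, and design and integration of control systems for special-purpose robots. He has led or participated in more than 20 completed projects, including key projects of the National 863 Program, Youth and General Programs of the National Natural Science Foundation of China, major projects of the Knowledge Innovation Program of the Chinese Academy of Sciences, Natural Science Foundation of Liaoning Province projects, State Key Laboratory of Robotics projects, and key projects of State Grid and China Southern Power Grid. He has applied for 3 national invention patents and 4 utility model patents, registered 2 software copyrights, co-authored 2 monographs, and published more than 20 academic papers.
Corresponding author: JIANG Yong. E-mail: jiangyong@sia.cn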
