[1] WANG Junhe (王竣禾), JIANG Yong (姜勇). Dynamic assembly algorithm based on deep reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2023, 18(1): 2-11. [doi:10.11992/tis.202201006]
CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN 1673-4785/CN 23-1538/TP] Volume:
18
Issue:
No. 1, 2023
Pages:
2-11
Column:
Academic Papers: Machine Learning
Publication date:
2023-01-05
- Title:
-
Dynamic assembly algorithm based on deep reinforcement learning
- Author(s):
-
WANG Junhe (王竣禾)1,2,3, JIANG Yong (姜勇)1,2
-
1. State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China;
2. Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China;
3. University of Chinese Academy of Sciences, Beijing 100049, China
-
- Keywords:
-
flexible cable model; dynamic noise; dynamic assembly; deep reinforcement learning; long short-term memory; sequential discount factor; temporal difference (λ) algorithm with eligibility traces; pre-training
- CLC number:
-
TP242.6
- DOI:
-
10.11992/tis.202201006
- Abstract:
-
To handle the complex, dynamic noise disturbances found in dynamic assembly environments, a dynamic assembly algorithm based on deep reinforcement learning is proposed. The contact forces over a period of time are taken as the state, and a long short-term memory (LSTM) network extracts motion features from them. A sequence discount factor is defined, and the reward at the current time step is obtained by weighting the sub-rewards of previous time steps. The action output by the model is a displacement in Cartesian space, and inverse kinematics is used to move the robot to the desired position. In addition, an improved neural network parameter update method based on the temporal difference (λ) algorithm with eligibility traces is proposed, which shortens model training time. In the experiments, the model is first pre-trained in a simple environment with a round hole and shaft, and then trained further in the real scene. The experiments show that the proposed method adapts well to the flexible, dynamic environment of dynamic assembly tasks.
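The reward weighting and the temporal difference (λ) update mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the exact formulas. The geometric weighting `rho ** age`, the linear value function, and the names `sequence_weighted_reward`, `td_lambda_step`, `rho`, `alpha`, `gamma`, and `lam` are all assumptions made for illustration. The update shown is the textbook TD(λ) rule with accumulating eligibility traces, not the improved variant the paper proposes.

```python
from typing import List, Sequence


def sequence_weighted_reward(sub_rewards: Sequence[float], rho: float) -> float:
    """Combine the sub-rewards of a window of time steps into one reward.

    sub_rewards is ordered oldest to newest. ASSUMPTION: the weight of a
    sub-reward decays geometrically with its age (rho ** age); the source
    only states that earlier sub-rewards are weighted.
    """
    newest = len(sub_rewards) - 1
    return sum(rho ** (newest - i) * r for i, r in enumerate(sub_rewards))


def td_lambda_step(
    weights: List[float],
    trace: List[float],
    features: Sequence[float],
    next_features: Sequence[float],
    reward: float,
    alpha: float = 0.1,
    gamma: float = 0.9,
    lam: float = 0.8,
) -> float:
    """One textbook TD(lambda) update for a linear value function, in place.

    Returns the TD error. This is the standard accumulating-trace rule,
    NOT the improved update proposed in the paper.
    """
    value = sum(w * x for w, x in zip(weights, features))
    next_value = sum(w * x for w, x in zip(weights, next_features))
    td_error = reward + gamma * next_value - value
    for i, x in enumerate(features):
        # Decay the old trace, then add the gradient of the linear value.
        trace[i] = gamma * lam * trace[i] + x
        weights[i] += alpha * td_error * trace[i]
    return td_error


if __name__ == "__main__":
    # Three hypothetical sub-rewards, oldest first:
    # 0.25 * 1.0 + 0.5 * 2.0 + 1.0 * 3.0 = 4.25
    print(sequence_weighted_reward([1.0, 2.0, 3.0], rho=0.5))

    w, e = [0.0, 0.0], [0.0, 0.0]
    delta = td_lambda_step(w, e, features=[1.0, 0.0],
                           next_features=[0.0, 1.0], reward=1.0)
    print(delta, w, e)
```

In a deep reinforcement learning setting the linear value function would be replaced by a network, with one trace entry per network parameter and the feature vector replaced by the gradient of the value estimate.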
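Memo
Received: 2022-01-04.
Funding: National Natural Science Foundation of China (52075531).
About the authors: WANG Junhe, master's student; main research interests: reinforcement learning and intelligent robots. JIANG Yong, research professor; main research interests: intelligent robot control, robot teleoperation in complex environments, embedded control systems and applications, multi-sensor fusion and system health management, human-robot collaborative control theory and methods, and the design and integration of control systems for special-purpose robots. He has led or participated in more than 20 projects, including key projects of the National 863 Program, Youth and General projects of the National Natural Science Foundation of China, major projects of the Knowledge Innovation Program of the Chinese Academy of Sciences, projects of the Natural Science Foundation of Liaoning Province, projects of the State Key Laboratory of Robotics, and key projects of State Grid and China Southern Power Grid. He has applied for 3 national invention patents and 4 utility model patents, and registered 2 software copyrights. He has contributed to 2 monographs and published more than 20 academic papers.
Corresponding author: JIANG Yong. E-mail: jiangyong@sia.cn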