TAO Xinyu, WANG Yan, JI Zhicheng. Energy-saving process route discovery method based on deep reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2023, 18(1): 23-35. [doi:10.11992/tis.202112030]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 18
Issue: 2023, No. 1
Pages: 23-35
Section: Academic Papers: Machine Learning
Publication date: 2023-01-05
- Title: Energy-saving process route discovery method based on deep reinforcement learning
- Author(s):
TAO Xinyu (陶鑫钰)1,2, WANG Yan (王艳)1,2, JI Zhicheng (纪志成)1,2
1. Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi 214122, China;
2. School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China
- Keywords: deep reinforcement learning; deep Q network; dynamic machining environment; process planning; Markov decision process; agent decision making; double Q network; heuristic algorithm
- CLC number: TP273
- DOI: 10.11992/tis.202112030
- Abstract:
Traditional rules for formulating process routes assume a fixed machining environment, so they cannot respond quickly to dynamic changes in that environment when energy-saving process routes must be formulated. This paper therefore proposes an energy-saving process route discovery method based on the deep Q network (DQN). Based on the Markov decision process, the state vector, action space, and reward function are defined to build an energy-saving process route model, and the energy-saving process route planning problem under a dynamically changing machining environment is transformed into a decision-making problem for a DQN agent, which is solved by exploiting the reusability and extensibility of the agent's decision-making experience. To improve the convergence speed and solution quality of DQN, an exploration mechanism based on the S function and a weighted experience pool are proposed, and a double Q network is adopted. Simulation results show that, compared with the algorithm before improvement, the improved algorithm discovers energy-saving process routes faster and with better quality in the dynamic machining environment. Compared with the genetic algorithm, the simulated annealing algorithm, and the particle swarm algorithm, the improved algorithm not only discovers energy-saving process routes the fastest but also obtains solutions of equal or even higher precision.
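The abstract names three changes to plain DQN: an exploration mechanism based on the S function, a weighted experience pool, and a double Q network. The Python sketch below shows one common way each piece could be realized; it is an illustration under stated assumptions, not the authors' implementation. Assumptions: the "S function" is modeled as a logistic (S-shaped) decay of the epsilon-greedy exploration rate with hypothetical parameters `eps_max`, `eps_min`, `midpoint`, and `steepness`; the pool weights are supplied by the caller (for example the absolute TD error), since the abstract does not give the weighting rule; and small lookup tables (`q_online`, `q_target`, both hypothetical) stand in for the online and target networks so the example runs without a deep-learning library.

```python
import math
import random
from collections import deque


def s_epsilon(step, eps_max=1.0, eps_min=0.05, midpoint=500.0, steepness=0.01):
    """Exploration rate that decays along an S-shaped (logistic) curve.

    Assumption: the paper's "S function" exploration is modeled as a logistic
    curve; all parameters here are hypothetical.
    """
    z = steepness * (step - midpoint)
    # Numerically stable logistic 1 / (1 + e^z); avoids math.exp overflow for large z.
    if z >= 0:
        frac = math.exp(-z) / (1.0 + math.exp(-z))
    else:
        frac = 1.0 / (1.0 + math.exp(z))
    return eps_min + (eps_max - eps_min) * frac


def argmax(values):
    """Index of the largest entry (the first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def select_action(q_values, step, rng=random):
    """Epsilon-greedy action choice driven by the S-shaped epsilon schedule."""
    if rng.random() < s_epsilon(step):
        return rng.randrange(len(q_values))  # explore: uniform random action
    return argmax(q_values)                  # exploit: greedy action


class WeightedReplayBuffer:
    """Experience pool that samples transitions in proportion to a weight.

    Assumption: the caller supplies the weight (e.g. |TD error|); the paper's
    exact weighting rule is not stated in the abstract.
    """

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, transition, weight):
        self.items.append((transition, max(weight, 1e-6)))  # keep every weight positive

    def sample(self, k, rng=random):
        transitions = [t for t, _ in self.items]
        weights = [w for _, w in self.items]
        # Weighted sampling with replacement.
        return rng.choices(transitions, weights=weights, k=k)


def double_q_target(reward, next_state, done, q_online, q_target, gamma=0.9):
    """Double-Q target: the online table selects the action, the target table scores it."""
    if done:
        return reward
    best = argmax(q_online[next_state])
    return reward + gamma * q_target[next_state][best]


# Toy usage with hypothetical two-action tables keyed by a state name.
q_online = {"s1": [1.0, 2.0]}
q_target = {"s1": [5.0, 3.0]}
# The online table picks action 1, so the target is -1.0 + 0.9 * 3.0 (about 1.7), not -1.0 + 0.9 * 5.0.
y = double_q_target(-1.0, "s1", False, q_online, q_target)
```

Separating action selection (online table) from action evaluation (target table) is what distinguishes the double Q target from the standard DQN target, which takes the maximum of the target values directly and tends to overestimate action values. In a real agent, the tables would be replaced by neural networks and the target network would be synchronized with the online network periodically.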
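Memo
Received: 2021-12-14.
Foundation item: National Key Research and Development Program of China (2018YFB1701903).
About the authors: TAO Xinyu, master's student, whose main research interest is the application of deep reinforcement learning to process routes. WANG Yan, professor and doctoral supervisor, technical leader in the integrated application of industrial Internet of Things technology, whose main research interest is collaborative optimization of energy consumption networks in discrete manufacturing based on big-data knowledge automation; has undertaken 2 National Natural Science Foundation of China projects, 1 China Postdoctoral Science Special Foundation project, 1 Natural Science Foundation of Jiangsu Province project, and 1 Ministry of Education Humanities and Social Sciences Planning Fund project, and has published nearly 100 academic papers. JI Zhicheng, professor and doctoral supervisor, whose main research interest is the integration and optimization of the manufacturing Internet of Things; has applied for and been granted more than 40 invention patents, registered more than 100 software copyrights, published more than 200 academic papers, and published 1 academic monograph.
Corresponding author: WANG Yan. E-mail: wangyan88@jiangnan.edu.cn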