<-Previous Article Next Article->

[1]TAO Xinyu,WANG Yan,JI Zhicheng.Energy-saving process route discovery method based on deep reinforcement learning[J].CAAI Transactions on Intelligent Systems,2023,18(1):23-35.[doi:10.11992/tis.202112030]

Copy

Energy-saving process route discovery method based on deep reinforcement learning

PDF Download HTML

CAAI Transactions on Intelligent Systems[ISSN 1673-4785/CN 23-1538/TP] Volume: 18 Number of periods: 2023 1 Page number: 23-35 Column: 学术论文—机器学习 Public date: 2023-01-05

Title:: Energy-saving process route discovery method based on deep reinforcement learning

Author(s):: TAO Xinyu¹; 2; WANG Yan¹; 2; JI Zhicheng¹; 2; 1. China Key Laboratory of Advanced Process Control for Light Industry Ministry of Education, Jiangnan University, Wuxi 214122, China;
2. School of the Internet of Things Engineering, Jiangnan University, Wuxi 214122, China

Keywords:: deep reinforcement learning; deep Q network; dynamic machining environment; process planning; Markov decision process; agent decision making; double Q network; heuristic algorithm

CLC:: TP273

DOI:: 10.11992/tis.202112030

Abstract:: Due to the traditional process route formulation rules based on the fixed processing environment, it is unable to quickly respond to the dynamic changes of the processing environment to formulate energy-saving process routes. Therefore, an energy-saving process route discovery method based on deep Q network (DQN) is proposed in this paper. Based on the Markov decision process, we define the state vector, action space, and reward function, establish an energy-saving process route model, and transform the energy-saving process route planning problem with dynamic changes in the processing environment into a DQN agent decision-making problem, which uses the reusable and extensible decision-making experience to solve the problem. At the same time, an exploration mechanism based on the S function, a weighted experience pool, and a double-Q network are used to improve the convergence speed and solution quality of DQN. The simulation results show that compared with that before improvement, the improved algorithm can find energy-saving process routes faster and better in the dynamic processing environment; and compared with genetic algorithm, simulated annealing algorithm, as well as particle swarm algorithm, the improved algorithm can not only discover energy-saving process routes at the fastest speed, but also obtain the same or even higher precision solutions.

References:: [1] HALIM A H, ISMAIL I. Combinatorial optimization: comparison of heuristic algorithms in travelling salesman problem[J]. Archives of computational methods in engineering, 2019, 26(2): 367–380.
[2] REZOUG A, BADER-EL-DEN M, BOUGHACI D. Guided genetic algorithm for the multidimensional knapsack problem[J]. Memetic computing, 2018, 10(1): 29–42.
[3] KIEFFER E, DANOY G, BRUST M R, et al. Tackling large-scale and combinatorial bi-level problems with a genetic programming hyper-heuristic[J]. IEEE transactions on evolutionary computation, 2020, 24(1): 44–56.
[4] 陈科胜, 鲜思东, 郭鹏. 求解旅行商问题的自适应升温模拟退火算法[J]. 控制理论与应用, 2021, 38(2): 245–254
CHEN Kesheng, XIAN Sidong, GUO Peng. Adaptive heating simulation annealing algorithm for solving the traveling salesman problem[J]. Control theory & applications, 2021, 38(2): 245–254
[5] 何庆, 吴意乐, 徐同伟. 改进遗传模拟退火算法在TSP优化中的应用[J]. 控制与决策, 2018, 33(2): 219–225
HE Qing, WU YIle, XU Tongwei. Improve the application of genetic simulation annealing algorithm in TSP optimization[J]. Control and decision, 2018, 33(2): 219–225
[6] JOY J, RAJEEV S, ABRAHAM E C. Particle swarm optimization for multi resource constrained project scheduling problem with varying resource levels[J]. Materials today:proceedings, 2021, 47: 5125–5129.
[7] PETROVI? M, VUKOVI? N, MITI? M, et al. Integration of process planning and scheduling using chaotic particle swarm optimization algorithm[J]. Expert systems with applications, 2016, 64: 569–588.
[8] VAFADAR A, HAYWARD K, TOLOUEI-RAD M. Drilling reconfigurable machine tool selection and process parameters optimization as a function of product demand[J]. Journal of manufacturing systems, 2017, 45: 58–69.
[9] WU Xiuli, LI Jing. Two layered approaches integrating harmony search with genetic algorithm for the integrated process planning and scheduling problem[J]. Computers & industrial engineering, 2021, 155: 107194.
[10] MA G H, ZHANG Y F, NEE A Y C. A simulated annealing-based optimization algorithm for process planning[J]. International journal of production research, 2000, 38(12): 2671–2687.
[11] 施伟, 冯旸赫, 程光权, 等. 基于深度强化学习的多机协同空战方法研究[J]. 自动化学报, 2021, 47(7): 1610–1623
SHI Wei, FENG Yanghe, CHENG Guangquan, et al. Research on multi-aircraft collaborative air combat method based on deep reinforcement learning[J]. Acta automatica sinica, 2021, 47(7): 1610–1623
[12] ZHOU Wenhong, LIU Zhihong, LI Jie, et al. Multi-target tracking for unmanned aerial vehicle swarms using deep reinforcement learning[J]. Neurocomputing, 2021, 466: 285–297.
[13] 王云鹏, 郭戈. 基于深度强化学习的有轨电车信号优先控制[J]. 自动化学报, 2019, 45(12): 2366–2377
WANG Yunpeng, GUO Ge. Tram signal priority control based on deep reinforcement learning[J]. Acta automatica sinica, 2019, 45(12): 2366–2377
[14] GUO Ge, WANG Yunpeng. An integrated MPC and deep reinforcement learning approach to trams-priority active signal control[J]. Control engineering practice, 2021, 110(5): 104758.
[15] PENG Bile, KESKIN M F, Kulcsár B, et al. Connected autonomous vehicles for improving mixed traffic efficiency in unsignalized intersections with deep reinforcement learning[J]. Communications in transportation research, 2021, 1: 100017.
[16] 吴晓光, 刘绍维, 杨磊, 等. 基于深度强化学习的双足机器人斜坡步态控制方法[J]. 自动化学报, 2021, 47(8): 1976–1987
WU Xiaoguang LIU Shaowei, YANG Lei, et al. A slope gait control method for bipedal robots based on deep reinforcement learning[J]. Acta automatica sinica, 2021, 47(8): 1976–1987
[17] JIANG Rong, WANG Zhipeng, HE Bin, et al. A data-efficient goal-directed deep reinforcement learning method for robot visuomotor skill[J]. Neurocomputing, 2021, 462: 389–401.
[18] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of go with deep neural networks and tree search[J]. Nature, 2016, 529: 484–489.
[19] VINYALS O, BABUSCHKIN I, CZARNECKI W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575: 350–354.
[20] BERNER C, et al. Dota 2 with large scale deep reinforcement learning[EB/OL]. (2019?10?1)[2021?12?14].https://arxiv.org/abs/1912.06680.
[21] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing atari with deep reinforcement learning[EB/OL]. (2013?12?19) [2021?12?14]. https://arxiv.org/abs/1312.5602.
[22] LUO Shu, ZHANG Linxuan, FAN Yushun. Dynamic multi-objective scheduling for flexible job shop by deep reinforcement learning[J]. Computers & industrial engineering, 2021, 159: 107489.
[23] SCHAUL T, QUAN J, ANTONOGLOU I, et al. Prioritized experience replay[EB/OL]. (2016?2?25) [2021?12?14]. https://arxiv.org/abs/1511.05952.
[24] VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[EB/OL]. (2015?12?8)[2021?12?14]. https://arxiv.org/abs/1509.06461.
[25] LIU Xiaojun, YI Hong, NI Zhonghua. Application of ant colony optimization algorithm in process planning optimization[J]. Journal of intelligent manufacturing, 2013, 24(1): 1–13.

Similar References:

Memo

Last Update: 1900-01-01

Energy-saving process route discovery method based on deep reinforcement learning PDF DownloadHTML

Memo

Energy-saving process route discovery method based on deep reinforcement learning

PDF Download HTML