WANG Guo-lei, ZHONG Shi-sheng, LIN Lin. Bilevel Q-learning algorithm for dynamic multi-machine scheduling problems[J]. CAAI Transactions on Intelligent Systems, 2009, 4(3): 239-244.

Bilevel Q-learning algorithm for dynamic multi-machine scheduling problems

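Journal:: CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]; Volume: 4; Issue: No. 3, 2009; Pages: 239-244; Section: Academic Papers (Intelligent Systems); Publication date: 2009-06-25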

Title:: Bilevel Qlearning algorithm for dynamic multimachinescheduling problems

Article ID:: 1673-4785(2009)03-0239-06

Author(s):: WANG Guo-lei, ZHONG Shi-sheng, LIN Lin; School of Mechanical Engineering, Harbin Institute of Technology, Harbin 150001, China

Keywords:: dynamic multi-machine scheduling; Q-learning; action set; state space division; reward function

CLC number:: TP273

Document code:: A

Abstract:: Q-learning is very effective for dynamic single-machine scheduling, but in dynamic multi-machine scheduling it often performs poorly because it lacks a global view. To address this, a two-layer (bilevel) Q-learning algorithm is proposed. The bottom-level Q-learning focuses on local objectives: it learns a single-machine scheduling policy that minimizes machine idle time and the mean flow time of jobs. The top-level Q-learning focuses on global objectives: it learns a dispatching policy that assigns jobs to suitable machines so as to balance machine loads and minimize the overall tardiness of all jobs. The action sets, the state space division methods, and the reward functions of both layers are presented. Simulation experiments on dynamic multi-machine scheduling problems show that the proposed two-layer Q-learning algorithm improves scheduling results in dynamic multi-machine environments.

Memo

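Received: 2008-10-03.
Funding: National High Technology Research and Development Program of China ("863" Program) (2008AA04Z401).
Corresponding author: WANG Guo-lei. E-mail: Wanggl_hit@163.com.
About the authors: WANG Guo-lei, male, born in 1982, is a Ph.D. candidate. His main research interests include production planning and shop-floor scheduling. He has published more than 10 academic papers.
ZHONG Shi-sheng, male, born in 1964, is a professor and doctoral supervisor. He is Vice President of Harbin Institute of Technology at Weihai, a council member of the Mechanical Design Branch of the Chinese Mechanical Engineering Society, an executive council member of the Extenics Committee of the Chinese Association for Artificial Intelligence, a member of the Applied Graphics Committee of the China Engineering Graphics Society, a member of the National Technical Committee for Standardization of Industrial Automation Systems and Integration, and a member of the Technical Committee for Standardization of Information Technology Application of the Commission of Science, Technology and Industry for National Defense (COSTIND). His main research interests include digital design and manufacturing, artificial intelligence theory and applications, and CNC equipment development. He was deputy chief designer of the "HEC-CIMS II Project", a major application demonstration project of the national 863/CIMS program. He has led 2 National Natural Science Foundation of China projects and 2 national 863 Program projects; participated in 1 national 863 Program project and 1 National Natural Science Foundation of China project; undertaken 1 EU science and technology program project (jointly carried out by the UK, China, and Spain); and carried out a number of provincial/ministerial-level science and technology projects and industry-sponsored projects. He has received one second prize and two third prizes of provincial/ministerial-level Science and Technology Progress Awards, holds 1 patent and 3 nationally registered software copyrights, and was named an Advanced Individual in CIMS Application Demonstration of Heilongjiang Province. He has published more than 140 academic papers and 1 monograph.
LIN Lin, female, born in 1973, is an associate professor and master's supervisor. Her main research interests include intelligent design and product data management. She has published more than 20 academic papers.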

Last Update: 2009-08-31
