<-上一篇/Previous Article 下一篇/Next Article->

[1]严家政,专祥涛.基于强化学习的参数自整定及优化算法[J].智能系统学报,2022,17(2):341-347.[doi:10.11992/tis.202012038]
　YAN Jiazheng,ZHUAN Xiangtao.Parameter self-tuning and optimization algorithm based on reinforcement learning[J].CAAI Transactions on Intelligent Systems,2022,17(2):341-347.[doi:10.11992/tis.202012038]

点击复制

基于强化学习的参数自整定及优化算法

PDF下载 HTML

《智能系统学报》[ISSN 1673-4785/CN 23-1538/TP] 卷: 17 期数: 2022年第2期页码: 341-347 栏目: 学术论文—智能系统出版日期: 2022-03-05

Title:: Parameter self-tuning and optimization algorithm based on reinforcement learning

作者:: 严家政¹, 专祥涛^1,2; 1. 武汉大学电气与自动化学院，湖北武汉 430072;
2. 武汉大学深圳研究院，广东深圳 518057

Author(s):: YAN Jiazheng¹, ZHUAN Xiangtao^1,2; 1. School of Electrical Engineering and Automation, Wuhan University, Wuhan 430072, China;
2. Shenzhen Research Institute, Wuhan University, Shenzhen 518057, China

关键词:: 强化学习; 整定; 优化; 学习算法; 时滞; 控制器; 液位控制; 动态响应

Keywords:: reinforcement learning; tuning; optimization; learning algorithm; time delay; controller; level control; dynamic response

分类号:: TP273

DOI:: 10.11992/tis.202012038

摘要:: 传统PID控制算法在非线性时滞系统的应用中，存在参数整定及性能优化过程繁琐、控制效果不理想的问题。针对该问题，提出了一种基于强化学习的控制器参数自整定及优化算法。该算法引入系统动态性能指标计算奖励函数，通过学习周期性阶跃响应的经验数据，无需辨识被控对象模型的具体数据，即可实现控制器参数的在线自整定及优化。以水箱液位控制系统为实验对象，对不同类型的PID控制器使用该算法进行参数整定及优化的对比实验。实验结果表明，相比于传统的参数整定方法，所提出的算法能省去繁琐的人工调参过程，有效优化控制器参数，减少被控量的超调量，提升控制器动态响应性能。

Abstract:: To achieve better control performance in the nonlinear time-delay system, the traditional Proportional-Integral-Derivative (PID) control algorithm requires tuning and optimization, which complicates the controller design. First, we propose a new self-tuning and optimization algorithm for controller parameters based on reinforcement learning. Then, a reward function based on the system dynamic performance index is introduced by this algorithm. This function can learn the empirical data of periodic step response and realize the online optimization of controller parameters without identifying the model data of the controlled object. Finally, the algorithm is tested through experiments on a water tank level control system with different types of PID controllers. Experimental results show that, in contrast to the traditional parameter tuning method, the manual process is eliminated by the proposed algorithm, effectively optimizing the controller parameters, reducing the overshoot of the controlled quantity, and improving the dynamic response performance of the controller.

参考文献/References:: [1] 赵新华, 王璞, 陈晓红. 投球机器人模糊PID控制[J]. 智能系统学报, 2015, 10(3): 399–406
ZHAO Xinhua, WANG Pu, CHEN Xiaohong. Fuzzy PID control of pitching robots[J]. CAAI transactions on intelligent systems, 2015, 10(3): 399–406
[2] YANG Bo, YU Tao, SHU Hongchun, et al. Perturbation observer based fractional-order PID control of photovoltaics inverters for solar energy harvesting via Yin-Yang-Pair optimization[J]. Energy conversion and management, 2018, 171: 170–187.
[3] JAISWAL S, CHILUKA S K, SEEPANA M M, et al. Design of fractional order PID controller using genetic algorithm optimization technique for nonlinear system[J]. Chemical product and process modeling, 2020, 15(2): 20190072.
[4] 陈增强, 黄朝阳, 孙明玮, 等. 基于大变异遗传算法进行参数优化整定的负荷频率自抗扰控制[J]. 智能系统学报, 2020, 15(1): 41–49
CHEN Zengqiang, HUANG Zhaoyang, SUN Mingwei, et al. Active disturbance rejection control of load frequency based on big probability variation’s genetic algorithm for parameter optimization[J]. CAAI transactions on intelligent systems, 2020, 15(1): 41–49
[5] WEI Wei, CHEN Nan, ZHANG Zhiyuan, et al. U-model-based active disturbance rejection control for the dissolved oxygen in a wastewater treatment process[J]. Mathematical problems in engineering, 2020: 3507910.
[6] 胡越, 罗东阳, 花奎, 等. 关于深度学习的综述与讨论[J]. 智能系统学报, 2019, 14(1): 1–19
HU Yue, LUO Dongyang, HUA Kui, et al. Review and discussion on deep learning[J]. CAAI transactions on intelligent systems, 2019, 14(1): 1–19
[7] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484–489.
[8] 李超, 张智, 夏桂华, 等. 基于强化学习的学习变阻抗控制[J]. 哈尔滨工程大学学报, 2019, 40(2): 304–311
LI Chao, ZHANG Zhi, XIA Guihua, et al. Learning variable impedance control based on reinforcement learning[J]. Journal of Harbin Engineering University, 2019, 40(2): 304–311
[9] 王念滨, 何鸣, 王红滨, 等. 适用于水下目标识别的快速降维卷积模型[J]. 哈尔滨工程大学学报, 2019, 40(7): 1327–1333
WANG Nianbin, HE Ming, WANG Hongbin, et al. Fast dimensional-reduction convolution model for underwater target recognition[J]. Journal of Harbin Engineering University, 2019, 40(7): 1327–1333
[10] 黄立威, 江碧涛, 吕守业, 等. 基于深度学习的推荐系统研究综述[J]. 计算机学报, 2018, 41(7): 1619–1647
HUANG Liwei, JIANG Bitao, LYU Shouye, et al. A review of recommendation systems based on deep learning[J]. Chinese journal of computers, 2018, 41(7): 1619–1647
[11] GHEISARNEJAD M, KHOOBAN M H. An intelligent non-integer PID controller-based deep reinforcement learning: implementation and experimental results[J]. IEEE transactions on industrial electronics, 2021, 68(4): 3609–3618.
[12] BUSONIU L, DE BRUIN T, TOLI? D, et al. Reinforcement learning for control: performance, stability, and deep approximators[J]. Annual reviews in control, 2018, 46: 8–28.
[13] 袁兆麟, 何润姿, 姚超, 等. 基于强化学习的浓密机底流浓度在线控制算法[J]. 自动化学报, 2021, 47(7): 1558–1571
YUAN Zhaolin, HE Runzi, YAO Chao, et al. Online reinforcement learning control algorithm for concentration of thickener underflow[J]. Acta automatica sinica, 2021, 47(7): 1558–1571
[14] NIAN R, LIU J, HUANG B. A review on reinforcement learning: introduction and applications in industrial process control[J]. Computers and chemical engineering, 2020: 106886.
[15] PANG B, JIANG Z P, MAREELS I. Reinforcement learning for adaptive optimal control of continuous-time linear periodic systems[J]. Automatica, 2020, 118: 109035.
[16] 殷昌盛, 杨若鹏, 朱巍, 等. 多智能体分层强化学习综述[J]. 智能系统学报, 2020, 15(4): 646–655
YIN Changsheng, YANG Ruopeng, ZHU Wei, et al. A survey on multi-agent hierarchical reinforcement learning[J]. CAAI transactions on intelligent systems, 2020, 15(4): 646–655
[17] 高瑞娟, 吴梅. 基于改进强化学习的PID参数整定原理及应用[J]. 现代电子技术, 2014, 37(4): 1–4
GAO Ruijuan, WU Mei. Principle and application of PID parameter tuning based on improved reinforcement learning[J]. Modern electronics technique, 2014, 37(4): 1–4
[18] ALDEMIR A, HAPO?LU H. Comparison of PID tuning methods for wireless temperature control[J]. Journal of polytechnic, 2016, 19(1): 9–19.
[19] 蔡聪仁, 向凤红. 基于遗传算法优化PID的板球系统位置控制[J]. 电子测量技术, 2019, 42(23): 97–101
CAI Congren, XIANG Fenghong. Position control of cricket system based on genetic algorithm optimized PID[J]. Electronic measurement technology, 2019, 42(23): 97–101
[20] 么洪飞, 王宏健, 王莹, 等. 基于遗传算法DDBN参数学习的UUV威胁评估[J]. 哈尔滨工程大学学报, 2018, 39(12): 1972–1978
YAO Hongfei, WANG Hongjian, WANG Ying, et al. UUV threat assessment based on genetic algorithm DDBN parameter learning[J]. Journal of Harbin Engineering University, 2018, 39(12): 1972–1978
[21] 胡勤丰, 陈威振, 邱攀峰, 等. 适用于连续加减速的永磁同步电机模糊增益自调整PI控制研究[J]. 中国电机工程学报, 2017, 37(3): 907–914
HU Qinfeng, CHEN Weizhen, QIU Panfeng, et al. Research on fuzzy self-tuning gain PI control for accelerating and decelerating based on permanent magnet synchronous motor[J]. Proceedings of the CSEE, 2017, 37(3): 907–914
[22] 叶政. PID控制器参数整定方法研究及其应用[D]. 北京: 北京邮电大学, 2016.
YE Zheng. Research on PID controller parameter tuning method and its application [D]. Beijing: Beijing University of Posts and Telecommunications, 2016.
[23] 刘志林, 李国胜, 张军. 有横摇约束的欠驱动船舶航迹跟踪预测控制[J]. 哈尔滨工程大学学报, 2019, 40(2): 312–317
LIU Zhilin, LI Guosheng, ZHANG Jun. Predictive control of underactuated ship track tracking with roll constraint[J]. Journal of Harbin Engineering University, 2019, 40(2): 312–317
[24] 朱芮, 吴迪, 陈继峰, 等. 电机系统模型预测控制研究综述[J]. 电机与控制应用, 2019, 46(8): 1–10,30
ZHU Rui, WU Di, CHEN Jifeng, et al. A review of model predictive control for motor systems[J]. Electric machines and control application, 2019, 46(8): 1–10,30
[25] PU Z, WANG Y, CHANG N, et al. A deep reinforcement learning framework for optimizing fuel economy of hybrid electric vehicles[C]//2018 23rd Asia and South Pacific Design Automation Conference. Jeju Island, Korea, 2018.
[26] 张法帅, 李宝安, 阮子涛. 基于深度强化学习的无人艇航行控制[J]. 计测技术, 2018, 38(A01): 5
ZHANG Fashuai, LI Baoan, RUAN Zitao. Navigation control of unmanned vehicle based on deep reinforcement learning[J]. Metrology and measurement technology, 2018, 38(A01): 5
[27] 唐振韬, 邵坤, 赵冬斌, 等. 深度强化学习进展: 从AlphaGo到AlphaGo Zero[J]. 控制理论与应用, 2017, 34(12): 18
TANG Zhentao, SHAO Kun, ZHAO Dongbin, et al. Progress in deep reinforcement learning: from AlphaGo to AlphaGo Zero[J]. Control theory and applications, 2017, 34(12): 18

相似文献/References:: [1]连传强,徐昕,吴军,等.面向资源分配问题的Q-CF多智能体强化学习[J].智能系统学报,2011,6(2):95.
　LIAN Chuanqiang,XU Xin,WU Jun,et al.Q-CF multiAgent reinforcement learningfor resource allocation problems[J].CAAI Transactions on Intelligent Systems,2011,6():95.
[2]梁爽,曹其新,王雯珊,等.基于强化学习的多定位组件自动选择方法[J].智能系统学报,2016,11(2):149.[doi:10.11992/tis.201510031]
　LIANG Shuang,CAO Qixin,WANG Wenshan,et al.An automatic switching method for multiple location components based on reinforcement learning[J].CAAI Transactions on Intelligent Systems,2016,11():149.[doi:10.11992/tis.201510031]
[3]张文旭,马磊,王晓东.基于事件驱动的多智能体强化学习研究[J].智能系统学报,2017,12(1):82.[doi:10.11992/tis.201604008]
　ZHANG Wenxu,MA Lei,WANG Xiaodong.Reinforcement learning for event-triggered multi-agent systems[J].CAAI Transactions on Intelligent Systems,2017,12():82.[doi:10.11992/tis.201604008]
[4]周文吉,俞扬.分层强化学习综述[J].智能系统学报,2017,12(5):590.[doi:10.11992/tis.201706031]
　ZHOU Wenji,YU Yang.Summarize of hierarchical reinforcement learning[J].CAAI Transactions on Intelligent Systems,2017,12():590.[doi:10.11992/tis.201706031]
[5]张文旭,马磊,贺荟霖,等.强化学习的地-空异构多智能体协作覆盖研究[J].智能系统学报,2018,13(2):202.[doi:10.11992/tis.201609017]
　ZHANG Wenxu,MA Lei,HE Huilin,et al.Air-ground heterogeneous coordination for multi-agent coverage based on reinforced learning[J].CAAI Transactions on Intelligent Systems,2018,13():202.[doi:10.11992/tis.201609017]
[6]徐鹏,谢广明,文家燕,等.事件驱动的强化学习多智能体编队控制[J].智能系统学报,2019,14(1):93.[doi:10.11992/tis.201807010]
　XU Peng,XIE Guangming,WEN Jiayan,et al.Event-triggered reinforcement learning formation control for multi-agent[J].CAAI Transactions on Intelligent Systems,2019,14():93.[doi:10.11992/tis.201807010]
[7]郭宪,方勇纯.仿生机器人运动步态控制：强化学习方法综述[J].智能系统学报,2020,15(1):152.[doi:10.11992/tis.201907052]
　GUO Xian,FANG Yongchun.Locomotion gait control for bionic robots: a review of reinforcement learning methods[J].CAAI Transactions on Intelligent Systems,2020,15():152.[doi:10.11992/tis.201907052]
[8]申翔翔,侯新文,尹传环.深度强化学习中状态注意力机制的研究[J].智能系统学报,2020,15(2):317.[doi:10.11992/tis.201809033]
　SHEN Xiangxiang,HOU Xinwen,YIN Chuanhuan.State attention in deep reinforcement learning[J].CAAI Transactions on Intelligent Systems,2020,15():317.[doi:10.11992/tis.201809033]
[9]殷昌盛,杨若鹏,朱巍,等.多智能体分层强化学习综述[J].智能系统学报,2020,15(4):646.[doi:10.11992/tis.201909027]
　YIN Changsheng,YANG Ruopeng,ZHU Wei,et al.A survey on multi-agent hierarchical reinforcement learning[J].CAAI Transactions on Intelligent Systems,2020,15():646.[doi:10.11992/tis.201909027]
[10]莫宏伟,田朋.基于注意力融合的图像描述生成方法[J].智能系统学报,2020,15(4):740.[doi:10.11992/tis.201910039]
　MO Hongwei,TIAN Peng.An image caption generation method based on attention fusion[J].CAAI Transactions on Intelligent Systems,2020,15():740.[doi:10.11992/tis.201910039]

备注/Memo

收稿日期:2020-12-23。
基金项目:深圳市知识创新计划项目(JCYJ20170818144449801)
作者简介:严家政，硕士研究生，主要研究方向为深度强化学习、最优控制;专祥涛，教授，博士生导师，IEEE会员，湖北省自动化学会常务理事，主要研究方向为载体运动过程建模与控制、新能源系统规划与运行、资源优化分配、智能控制与数据分析。发表学术论文30余篇
通讯作者:专祥涛.E-mail:xtzhuan@whu.edu.cn

更新日期/Last Update: 1900-01-01

基于强化学习的参数自整定及优化算法 PDF下载HTML

备注/Memo

基于强化学习的参数自整定及优化算法

PDF下载 HTML