ZHOU Xianwei, WANG Yuxiang, LUO Shixin, et al. Offline reinforcement learning with adaptive quantile[J]. CAAI Transactions on Intelligent Systems, 2025, 20(5): 1093-1102. [doi:10.11992/tis.202410016]

Offline reinforcement learning with adaptive quantile

References:
[1] SINGLA A, RAFFERTY A N, RADANOVIC G, et al. Reinforcement learning for education: opportunities and challenges[EB/OL]. (2021-07-15)[2024-10-12]. https://arxiv.org/abs/2107.08828v1.
[2] LIU Siqi, SEE K C, NGIAM K Y, et al. Reinforcement learning for clinical decision support in critical care: comprehensive review[J]. Journal of medical Internet research, 2020, 22(7): e18477.
[3] VINYALS O, BABUSCHKIN I, CZARNECKI W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575(7782): 350-354.
[4] LI Xiali, WANG Zhaoqi, LIU Bo, et al. Survey of Mahjong game AI construction methods[J]. CAAI transactions on intelligent systems, 2023, 18(6): 1143-1155.
[5] ZHU Shaokai, MENG Qinghao, JIN Sheng, et al. Indoor visual local path planning based on deep reinforcement learning[J]. CAAI transactions on intelligent systems, 2022, 17(5): 908-918.
[6] ZHAO Yuxin, DU Denghui, CHENG Xiaohui, et al. Path planning for mobile ocean observation network based on reinforcement learning[J]. CAAI transactions on intelligent systems, 2022, 17(1): 192-200.
[7] ZHANG Xiaoming, GAO Shijie, YAO Changyu, et al. Reinforcement learning and its application in robot task planning: a survey[J]. Pattern recognition and artificial intelligence, 2023, 36(10): 902-917.
[8] GUO Xian, FANG Yongchun. Locomotion gait control for bionic robots: a review of reinforcement learning methods[J]. CAAI transactions on intelligent systems, 2020, 15(1): 152-159.
[9] WU Lan, LIU Quan, HUANG Zhigang, et al. A review of research on offline reinforcement learning[J]. Chinese journal of computers, 2025, 48(1): 156-187.
[10] LEVINE S, KUMAR A, TUCKER G, et al. Offline reinforcement learning: tutorial, review, and perspectives on open problems[EB/OL]. (2020-05-04)[2024-10-12]. https://arxiv.org/pdf/2005.01643.
[11] CHEN Siqi, GENG Jie, WANG Yunfei, et al. Survey of research on offline reinforcement learning[J]. Radio communications technology, 2024, 50(5): 831-842.
[12] FUJIMOTO S, MEGER D, PRECUP D. Off-policy deep reinforcement learning without exploration[C]//International Conference on Machine Learning. Long Beach: PMLR, 2019: 2052-2062.
[13] WU Yifan, TUCKER G, NACHUM O. Behavior regularized offline reinforcement learning[EB/OL]. (2019-11-26)[2024-10-12]. https://arxiv.org/abs/1911.11361v1.
[14] KUMAR A, FU J, SOH M, et al. Stabilizing off-policy Q-learning via bootstrapping error reduction[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2019: 11784-11794.
[15] WANG Z, NOVIKOV A, ZOLNA K, et al. Critic regularized regression[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2020: 7768-7778.
[16] FUJIMOTO S, VAN HOOF H, MEGER D. Addressing function approximation error in actor-critic methods[C]// International Conference on Machine Learning. Stockholm: PMLR, 2018: 1587-1596.
[17] FU J, KUMAR A, NACHUM O, et al. D4RL: datasets for deep data-driven reinforcement learning[EB/OL]. (2021-02-06)[2024-10-12]. https://arxiv.org/abs/2004.07219v4.
[18] FIGUEIREDO PRUDENCIO R, MAXIMO M R O A, COLOMBINI E L. A survey on offline reinforcement learning: taxonomy, review, and open problems[J]. IEEE transactions on neural networks and learning systems, 2024, 35(8): 10237-10257.
[19] FUJIMOTO S, GU S S. A minimalist approach to offline reinforcement learning[C]//Proceedings of the 35th International Conference on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2021: 20132-20145.
[20] PENG Zhiyong, HAN Changlin, LIU Yadong, et al. Weighted policy constraints for offline reinforcement learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Washington DC: AAAI, 2023: 9435-9443.
[21] CHEN Xinyue, ZHOU Zijian, WANG Zheng, et al. BAIL: best-action imitation learning for batch deep reinforcement learning[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2020: 18353-18363.
[22] SIEGEL N Y, SPRINGENBERG J T, BERKENKAMP F, et al. Keep doing what worked: behavioral modelling priors for offline reinforcement learning[C]//International Conference on Learning Representations. [S. l. ]: OpenReview.net, 2020: 1-21.
[23] ABDOLMALEKI A, SPRINGENBERG J T, TASSA Y, et al. Maximum a posteriori policy optimisation[C]//International Conference on Learning Representations. Vancouver: OpenReview.net, 2018: 1-23.
[24] BRANDFONBRENER D, WHITNEY W F, RANGANATH R, et al. Quantile filtered imitation learning[EB/OL]. (2021-12-02)[2024-10-12]. https://arxiv.org/abs/2112.00950v1.
[25] KOENKER R, HALLOCK K F. Quantile regression[J]. Journal of economic perspectives, 2001, 15(4): 143-156.
[26] AGARWAL R, SCHUURMANS D, NOROUZI M. An optimistic perspective on offline reinforcement learning[C]//International Conference on Machine Learning. [S. l. ]: PMLR, 2020: 104-114.
[27] KOSTRIKOV I, NAIR A, LEVINE S. Offline reinforcement learning with implicit Q-learning[EB/OL]. (2021-10-12)[2024-10-12]. https://arxiv.org/abs/2110.06169v1.
[28] KUMAR A, ZHOU A, TUCKER G, et al. Conservative Q-learning for offline reinforcement learning[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2020: 1179-1191.
[29] EMMONS S, EYSENBACH B, KOSTRIKOV I, et al. RvS: what is essential for offline RL via supervised learning?[EB/OL]. (2022-05-11)[2024-10-12]. https://arxiv.org/abs/2112.10751v2.
[30] CHEN Lili, LU K, RAJESWARAN A, et al. Decision transformer: reinforcement learning via sequence modeling[C]//Proceedings of the 35th International Conference on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2021: 15084-15097.

Memo

Received: 2024-10-12.
Foundation items: Major Special Project of Applied Science and Technology Research and Development of Guangdong Province (2016B020244003); Guangdong Provincial Enterprise Science and Technology Commissioner Program (GDKTP2020014000); Guangdong Basic and Applied Basic Research Foundation (2020B1515120089, 2020A1515110783).
About the authors: ZHOU Xianwei, lecturer, Ph.D., whose main research interests are reinforcement learning, robotics, and multi-sensor information fusion, E-mail: 20871147@qq.com; WANG Yuxiang, master's student, whose main research interests are deep reinforcement learning and offline reinforcement learning, E-mail: 2023024285@m.scnu.edu.cn; YU Songsen, professor and postdoctoral fellow, whose main research interests are intelligent perception and information processing; has led one General Program of the National Natural Science Foundation of China, two general projects under the Spark Program of the Ministry of Science and Technology, and one key project of the Guangdong Basic and Applied Basic Research Foundation; has participated in formulating Guangdong provincial local standards for the high-end new electronic information industry; holds 53 granted invention patents; and has published more than 40 academic papers, E-mail: yss8109@163.com.
Corresponding author: YU Songsen. E-mail: yss8109@163.com.
