WU Xiru, SHEN Keyang. Improved proximal policy optimization algorithm for epidemic prevention robots based on artificial potential fields[J]. CAAI Transactions on Intelligent Systems, 2025, 20(3): 689-698. [doi:10.11992/tis.202407026]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 20
Issue: 2025, No. 3
Pages: 689-698
Section: Academic Papers (Intelligent Systems)
Publication date: 2025-05-05
Title: Improved proximal policy optimization algorithm for epidemic prevention robots based on artificial potential fields
Author(s): WU Xiru, SHEN Keyang
Affiliation: College of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin 541004, China
Keywords: PPO algorithm; artificial potential field; path planning; epidemic prevention robot; deep reinforcement learning; dynamic environment; safety; reward function
CLC number: TP183; TP391.41
DOI: 10.11992/tis.202407026
Abstract: This paper presents an improved proximal policy optimization (PPO) path planning algorithm based on artificial potential fields (APF) to address the poor path planning and obstacle avoidance performance, and the low learning efficiency, of epidemic prevention robots in complex medical environments. The potential fields of obstacles and target nodes are constructed with the APF method, which defines the action space and safe motion range of the robot and resolves its low obstacle avoidance efficiency during operation. To tackle the sparse reward problem of the traditional PPO algorithm, an APF factor is incorporated into the PPO reward function, making reward feedback denser during training. The PPO network model is further improved by adding hidden layers and a previous actor network, enhancing the robot's flexibility and its learning and perception capability. Finally, comparative experiments in static and dynamic simulation environments show that the proposed algorithm reaches the reward peak faster, reduces redundant path segments, and effectively completes obstacle avoidance and path planning decisions.
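The abstract's central mechanism, folding an APF factor into the PPO reward to densify feedback, can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not the paper's implementation: the attractive and repulsive potentials are the standard APF forms, and the gains K_ATT and K_REP, the influence radius RHO_0, and the shaping weight lam are hypothetical placeholders; the paper's exact reward composition may differ.

import numpy as np

# All constants below are illustrative placeholders, not values from the paper.
K_ATT = 1.0    # attractive-potential gain
K_REP = 100.0  # repulsive-potential gain
RHO_0 = 2.0    # obstacle influence radius

def attractive_potential(pos, goal):
    # Standard quadratic attractive potential pulling toward the goal.
    return 0.5 * K_ATT * np.sum((pos - goal) ** 2)

def repulsive_potential(pos, obstacles):
    # Standard APF repulsive potential; zero outside the influence radius.
    u = 0.0
    for obs in obstacles:
        rho = np.linalg.norm(pos - obs)
        if 1e-6 < rho <= RHO_0:
            u += 0.5 * K_REP * (1.0 / rho - 1.0 / RHO_0) ** 2
    return u

def apf_potential(pos, goal, obstacles):
    # Total potential: low near the goal, high near obstacles.
    return attractive_potential(pos, goal) + repulsive_potential(pos, obstacles)

def shaped_reward(env_reward, pos, next_pos, goal, obstacles, lam=0.1):
    # Dense reward: the sparse environment reward plus the potential drop
    # between consecutive positions. Moving down the potential field
    # (toward the goal, away from obstacles) earns positive feedback
    # at every step instead of only at the terminal state.
    phi_now = apf_potential(pos, goal, obstacles)
    phi_next = apf_potential(next_pos, goal, obstacles)
    return env_reward + lam * (phi_now - phi_next)

# Toy check: a step toward the goal yields a positive shaped reward.
goal = np.array([5.0, 5.0])
obstacles = [np.array([2.5, 2.5])]
r = shaped_reward(0.0, np.array([0.0, 0.0]), np.array([0.5, 0.5]), goal, obstacles)
print(f"shaped reward: {r:.3f}")  # positive, since the step reduces the potential

Rewarding the per-step potential drop phi_now - phi_next, rather than the raw potential value, gives the agent dense feedback for moving toward the goal and away from obstacles while leaving the environment's sparse terminal rewards intact.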
Memo
Received: 2024-07-24.
Funding: National Natural Science Foundation of China (62263005); Key Program of the Natural Science Foundation of Guangxi (2020GXNSFDA238029); Key Project of the Open Fund of the Guangxi Colleges and Universities Key Laboratory of Artificial Intelligence and Information Processing (2022GXZDSY004); Graduate Education Innovation Program of Guilin University of Electronic Technology (2024YCXS119, 2024YCXS131).
About the authors: WU Xiru, professor and doctoral supervisor. Main research interests: deep learning, complex networks, and path planning. Has led one National Natural Science Foundation of China project, one Guangxi Natural Science Foundation project, and one key project of the Open Fund of the Guangxi Colleges and Universities Key Laboratory of Artificial Intelligence and Information Processing; holds six granted invention patents, has published more than 50 academic papers, and has authored one monograph. E-mail: xiruwu520@163.com. SHEN Keyang, master's student. Main research interest: path planning. E-mail: 1341391239@qq.com.
Corresponding author: WU Xiru. E-mail: xiruwu520@163.com.