WU Xiru, SHEN Keyang. Improved proximal policy optimization algorithm for epidemic prevention robots based on artificial potential fields[J]. CAAI Transactions on Intelligent Systems, 2025, 20(3): 689-698. [doi:10.11992/tis.202407026]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 20
Issue: 2025, No. 3
Pages: 689-698
Section: Academic Papers (Intelligent Systems)
Publication date: 2025-05-05
Title: Improved proximal policy optimization algorithm for epidemic prevention robots based on artificial potential fields
Author(s): WU Xiru, SHEN Keyang
Affiliation: College of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin 541004, China
Keywords: PPO algorithm; artificial potential field; path planning; epidemic prevention robot; deep reinforcement learning; dynamic environment; safety; reward function
CLC number: TP183; TP391.41
DOI: 10.11992/tis.202407026
Abstract: This paper presents an improved proximal policy optimization (PPO) path planning algorithm based on artificial potential fields (APF) to address the poor path planning and obstacle avoidance performance, and the low learning efficiency, of epidemic prevention robots in complex medical environments. The potential fields of obstacles and target nodes are constructed with the APF method, which defines the action space and safe motion range of the robot and resolves its low obstacle avoidance efficiency during operation. To tackle the sparse reward problem of the traditional PPO algorithm, an APF factor is incorporated into the PPO reward function, making reward feedback denser during training. The PPO network model is further improved by adding hidden layers and a previous actor network, enhancing the robot's flexibility and its learning and perception capability. Finally, comparative experiments in static and dynamic simulation environments show that the proposed algorithm reaches the reward peak faster, reduces redundant path segments, and effectively completes obstacle avoidance and path planning decisions.
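The abstract's central mechanism, folding an APF factor into the PPO reward to densify feedback, can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not the paper's implementation: the attractive and repulsive potentials are the standard APF forms, and the gains K_ATT and K_REP, the influence radius RHO_0, and the shaping weight lam are hypothetical placeholders; the paper's exact reward composition may differ.

import numpy as np

# All constants below are illustrative placeholders, not values from the paper.
K_ATT = 1.0    # attractive-potential gain
K_REP = 100.0  # repulsive-potential gain
RHO_0 = 2.0    # obstacle influence radius

def attractive_potential(pos, goal):
    # Standard quadratic attractive potential pulling toward the goal.
    return 0.5 * K_ATT * np.sum((pos - goal) ** 2)

def repulsive_potential(pos, obstacles):
    # Standard APF repulsive potential; zero outside the influence radius.
    u = 0.0
    for obs in obstacles:
        rho = np.linalg.norm(pos - obs)
        if 1e-6 < rho <= RHO_0:
            u += 0.5 * K_REP * (1.0 / rho - 1.0 / RHO_0) ** 2
    return u

def apf_potential(pos, goal, obstacles):
    # Total potential: low near the goal, high near obstacles.
    return attractive_potential(pos, goal) + repulsive_potential(pos, obstacles)

def shaped_reward(env_reward, pos, next_pos, goal, obstacles, lam=0.1):
    # Dense reward: the sparse environment reward plus the potential drop
    # between consecutive positions. Moving down the potential field
    # (toward the goal, away from obstacles) earns positive feedback
    # at every step instead of only at the terminal state.
    phi_now = apf_potential(pos, goal, obstacles)
    phi_next = apf_potential(next_pos, goal, obstacles)
    return env_reward + lam * (phi_now - phi_next)

# Toy check: a step toward the goal yields a positive shaped reward.
goal = np.array([5.0, 5.0])
obstacles = [np.array([2.5, 2.5])]
r = shaped_reward(0.0, np.array([0.0, 0.0]), np.array([0.5, 0.5]), goal, obstacles)
print(f"shaped reward: {r:.3f}")  # positive, since the step reduces the potential

Rewarding the per-step potential drop phi_now - phi_next, rather than the raw potential value, gives the agent dense feedback for moving toward the goal and away from obstacles while leaving the environment's sparse terminal rewards intact.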
Memo
Received: 2024-07-24.
Funding: National Natural Science Foundation of China (62263005); Key Program of the Natural Science Foundation of Guangxi (2020GXNSFDA238029); Key Project of the Open Fund of the Guangxi Colleges and Universities Key Laboratory of Artificial Intelligence and Information Processing (2022GXZDSY004); Graduate Education Innovation Program of Guilin University of Electronic Technology (2024YCXS119, 2024YCXS131).
About the authors: WU Xiru, professor and doctoral supervisor. Main research interests: deep learning, complex networks, and path planning. Has led one National Natural Science Foundation of China project, one Guangxi Natural Science Foundation project, and one key project of the Open Fund of the Guangxi Colleges and Universities Key Laboratory of Artificial Intelligence and Information Processing; holds six granted invention patents, has published more than 50 academic papers, and has authored one monograph. E-mail: xiruwu520@163.com. SHEN Keyang, master's student. Main research interest: path planning. E-mail: 1341391239@qq.com.
Corresponding author: WU Xiru. E-mail: xiruwu520@163.com.