[1] WU Xiru, SHEN Keyang. Improved proximal policy optimization algorithm for epidemic prevention robots based on artificial potential fields[J]. CAAI Transactions on Intelligent Systems, 2025, 20(3): 689-698. [doi:10.11992/tis.202407026]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
- Volume: 20
- Issue: 2025, No. 3
- Pages: 689-698
- Column: Academic Papers - Intelligent Systems
- Publication date: 2025-05-05
- Title: Improved proximal policy optimization algorithm for epidemic prevention robots based on artificial potential fields
- Author(s): WU Xiru; SHEN Keyang
- Affiliation: College of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin 541004, China
- Keywords: PPO algorithm; artificial potential field; path planning; epidemic prevention robot; deep reinforcement learning; dynamic environment; safety; reward function
- CLC: TP183; TP391.41
- DOI: 10.11992/tis.202407026
- Abstract: This paper presents an improved proximal policy optimization (PPO) path planning algorithm based on artificial potential fields (APFs) to address the poor path planning and obstacle avoidance performance and the low learning efficiency of epidemic prevention robots in complex medical environments. The potential fields of obstacles and target nodes are constructed using the APF method to define the action space and safe motion range of the robot, resolving its low obstacle avoidance efficiency during operation. To tackle the sparse reward problem of the traditional PPO algorithm, APF factors are incorporated into its reward function, improving the feedback efficiency of the reward mechanism during training. The network model of the PPO algorithm is further improved by adding hidden layers and a previous actor network, enhancing the flexibility and learning perception capability of the robot. Finally, comparative experiments in static and dynamic simulation environments demonstrate that the proposed algorithm reaches its reward peak faster, reduces redundant path segments, and effectively completes obstacle avoidance and path planning decisions.
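The abstract describes folding APF terms into the PPO reward to mitigate sparse feedback. The snippet below is a minimal sketch of that idea under stated assumptions: the gains (K_ATT, K_REP), influence radius (RHO_0), sparse reward constants, and all function names are illustrative choices, not values or interfaces taken from the paper.

```python
import numpy as np

# Hypothetical APF-shaped reward: the sparse terminal reward of a PPO agent is
# augmented with the decrease in total potential (attractive + repulsive) between
# consecutive positions. All constants below are assumed for illustration.
K_ATT = 1.0   # attractive gain toward the goal (assumed)
K_REP = 0.5   # repulsive gain away from obstacles (assumed)
RHO_0 = 1.5   # obstacle influence radius in meters (assumed)

def attractive_potential(pos, goal):
    """U_att = 0.5 * k_att * ||pos - goal||^2"""
    return 0.5 * K_ATT * np.sum((pos - goal) ** 2)

def repulsive_potential(pos, obstacles):
    """Sum of U_rep = 0.5 * k_rep * (1/rho - 1/rho_0)^2 over obstacles within rho_0."""
    u = 0.0
    for obs in obstacles:
        rho = np.linalg.norm(pos - obs)
        if 1e-6 < rho < RHO_0:
            u += 0.5 * K_REP * (1.0 / rho - 1.0 / RHO_0) ** 2
    return u

def shaped_reward(prev_pos, pos, goal, obstacles, reached_goal, collided):
    """Sparse reward plus an APF term that is positive when potential decreases."""
    sparse = 100.0 if reached_goal else (-100.0 if collided else -0.1)
    u_prev = attractive_potential(prev_pos, goal) + repulsive_potential(prev_pos, obstacles)
    u_now = attractive_potential(pos, goal) + repulsive_potential(pos, obstacles)
    return sparse + (u_prev - u_now)

# Example step: the robot moves slightly toward the goal, away from obstacles.
goal = np.array([5.0, 5.0])
obstacles = [np.array([2.0, 2.0]), np.array([3.5, 4.0])]
r = shaped_reward(np.array([1.0, 1.0]), np.array([1.3, 1.2]), goal, obstacles,
                  reached_goal=False, collided=False)
print(f"shaped reward: {r:.3f}")
```

Because the shaping term is a difference of potentials between consecutive states, it gives the agent dense step-by-step feedback while leaving the sparse goal/collision signal intact; the network-architecture changes (extra hidden layers, previous actor network) mentioned in the abstract are not reflected in this sketch.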