[1] OUYANG Yongping, WEI Changyun, CAI Boliang. Collision avoidance approach for distributed heterogeneous multirobot systems in dynamic environments[J]. CAAI Transactions on Intelligent Systems, 2022, 17(4): 752-763. [doi:10.11992/tis.202106044]
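- Journal: CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
- Volume: 17
- Issue: 2022, No. 4
- Pages: 752-763
- Section: Research Papers: Intelligent Systems
- Publication date: 2022-07-05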
- Title: Collision avoidance approach for distributed heterogeneous multirobot systems in dynamic environments
- Author(s): OUYANG Yongping (1), WEI Changyun (1), CAI Boliang (1, 2)
- Affiliations:
1. College of Mechanical and Electrical Engineering, Hohai University, Changzhou 213022, China;
2. School of Engineering, Cardiff University, Cardiff CF10 3AT, UK
- Keywords: heterogeneous multirobot systems; deep reinforcement learning; unstructured environment; multifeature policy gradients; dynamic collision avoidance; self-learning; distributed control; control policy
- CLC number: TP273+.2
- DOI: 10.11992/tis.202106044
- Abstract:
Multirobot systems are increasingly used in cooperative search and rescue, intelligent warehouses, intelligent transportation, and other fields. At present, path planning and collision avoidance, both among multiple robots and between robots and a dynamic environment, still rely on accurate environment maps, which makes coordination and cooperation of multirobot systems in unstructured environments challenging. To address this problem, this paper presents a navigation and collision avoidance approach for distributed heterogeneous multirobot systems that is based on deep reinforcement learning and does not require accurate maps. A multifeature policy gradients algorithm is proposed, and social norms for human-robot collaborative environments are integrated, so that each distributed robot can learn the optimal navigation and collision avoidance policy through trial-and-error interaction with the environment. The optimal policy is trained in the Gazebo simulation environment and then transferred to several heterogeneous real robots by decoding their control signals, where it is tested in real environments. The experimental results show that the proposed multifeature policy gradients algorithm can obtain the optimal navigation and collision avoidance policy through self-learning, providing a technical reference for applying distributed heterogeneous multirobot systems in dynamic environments.