LU Shengyang, ZHAO Huailin, LIU Huaping. Multi-agent reinforcement learning for scene graph-driven target search[J]. CAAI Transactions on Intelligent Systems, 2023, 18(1): 207–215. [doi:10.11992/tis.202111034]

Multi-agent reinforcement learning for scene graph-driven target search

References:
[1] ANDERSON P, WU Qi, TENEY D, et al. Vision-and-language navigation: interpreting visually-grounded navigation instructions in real environments[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 3674–3683.
[2] DAS A, DATTA S, GKIOXARI G, et al. Embodied question answering[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 1–10.
[3] THOMASON J, MURRAY M, CAKMAK M, et al. Vision-and-dialog navigation[C]//Proceedings of the Conference on Robot Learning. Cambridge MA: JMLR, 2020, 100: 394–406.
[4] STURM J, ENGELHARD N, ENDRES F, et al. A benchmark for the evaluation of RGB-D SLAM systems[C]//2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. Algarve: IEEE, 2012: 573–580.
[5] MIROWSKI P, PASCANU R, VIOLA F, et al. Learning to navigate in complex environments[EB/OL]. (2016-11-11) [2021-11-17]. https://arxiv.org/abs/1611.03673.
[6] BABAEIZADEH M, FROSIO I, TYREE S, et al. GA3C: GPU-based A3C for deep reinforcement learning[C]//30th Conference on Neural Information Processing Systems. Barcelona, 2016: 1–6.
[7] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[EB/OL]. (2017-07-20) [2021-11-17]. https://arxiv.org/abs/1707.06347.
[8] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. (2015-09-09) [2021-11-17]. https://arxiv.org/abs/1509.02971.
[9] ANSCHEL O, BARAM N, SHIMKIN N. Averaged-DQN: variance reduction and stabilization for deep reinforcement learning[C]//International Conference on Machine Learning. Cambridge MA: JMLR, 2017: 176–185.
[10] WU Yi, WU Yuxin, TAMAR A, et al. Bayesian relational memory for semantic visual navigation[C]//2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 2769–2779.
[11] WORTSMAN M, EHSANI K, RASTEGARI M, et al. Learning to learn how to learn: self-adaptive visual navigation using meta-learning[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 6743–6752.
[12] HUANG Xiaohui, YANG Kaiming, LING Jiahao. Order dispatch by multi-agent reinforcement learning based on shared attention[J/OL]. Journal of computer applications, 2022: 1–7. (2022-07-26). https://kns.cnki.net/kcms/detail/51.1307.TP.20220726.1030.002.html.
[13] DU Heming, YU Xin, ZHENG Liang. Learning object relation graph and tentative policy for visual navigation[M]//Computer Vision – ECCV 2020. Cham: Springer International Publishing, 2020: 19–34.
[14] CHEN Boyuan, SONG Shuran, LIPSON H, et al. Visual hide and seek[EB/OL]. (2019-10-15) [2021-11-17]. https://arxiv.org/abs/1910.07882.
[15] JADERBERG M, CZARNECKI W M, DUNNING I, et al. Human-level performance in 3D multiplayer games with population-based deep reinforcement learning[J]. Science, 2019, 364(6443): 859–865.
[16] ZHANG Wenxu, MA Lei, HE Huilin, et al. Air-ground heterogeneous coordination for multi-agent coverage based on reinforced learning[J]. CAAI transactions on intelligent systems, 2018, 13(2): 202–207.
[17] LIAN Chuanqiang, XU Xin, WU Jun, et al. Q-CF multi-agent reinforcement learning for resource allocation problems[J]. CAAI transactions on intelligent systems, 2011, 6(2): 95–100.
[18] HAN Zhaorong, QIAN Yuhua, LIU Guoqing. Multi-agent communication coupled with self-attention and reinforcement learning[J/OL]. Journal of Chinese mini-micro computer systems: 1–8. (2022-05-13) [2022-07-31]. DOI: 10.20009/j.cnki.21-1106/TP.2021-0802.
[19] FANG Weiwei, WANG Yunpeng, ZHANG Hao, et al. Optimized communication resource allocation in vehicular networks based on multi-agent deep reinforcement learning[J]. Journal of Beijing Jiaotong university, 2022, 46(2): 64–72.
[20] KIM D, MOON S, HOSTALLERO D, et al. Learning to schedule communication in multi-agent reinforcement learning[EB/OL]. (2019-02-05) [2022-07-31]. https://arxiv.org/abs/1902.01554.
[21] DAS A, GERVET T, ROMOFF J, et al. TarMAC: targeted multi-agent communication[C]//International Conference on Machine Learning. Cambridge MA: JMLR, 2019: 1538–1546.
[22] DING Ziluo, HUANG Tiejun, LU Zongqing. Learning individually inferred communication for multi-agent cooperation[EB/OL]. (2020-06-11) [2022-07-31]. https://arxiv.org/abs/2006.06455.
[23] FOERSTER J, FARQUHAR G, AFOURAS T, et al. Counterfactual multi-agent policy gradients[C]//Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. New Orleans: AAAI Press, 2018, 32(1): 2974–2982.
[24] CHEN Xinyuan, XIE Shengyi, CHEN Qingqiang, et al. Knowledge-based inference on convolutional feature extraction and path semantics[J]. CAAI transactions on intelligent systems, 2021, 16(4): 729–738.
[25] YANG Wei, WANG Xiaolong, FARHADI A, et al. Visual semantic navigation using scene priors[EB/OL]. (2018-10-15) [2022-07-31]. https://arxiv.org/abs/1810.06543.
[26] YAN Chao, XIANG Xiaojia, XU Xin, et al. A survey on the scalability and transferability of multi-agent deep reinforcement learning[J/OL]. Control and decision, 2022: 1–20. (2022-06-14). https://kns.cnki.net/kcms/detail/21.1124.TP.20220613.1041.023.html.
[27] QIU Yiding, PAL A, CHRISTENSEN H I. Learning hierarchical relationships for object-goal navigation[EB/OL]. (2020-03-15) [2022-07-31]. https://arxiv.org/abs/2003.06749.
[28] CHAPLOT D S, GANDHI D P, GUPTA A, et al. Object goal navigation using goal-oriented semantic exploration[J]. Advances in neural information processing systems, 2020, 33.
[29] HE Kaiming, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2980–2988.
[30] KOLVE E, MOTTAGHI R, HAN W, et al. AI2-THOR: an interactive 3D environment for visual AI[EB/OL]. (2017-12-14) [2021-11-17]. https://arxiv.org/abs/1712.05474.
[31] KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. (2016-09-09) [2021-11-17]. https://arxiv.org/abs/1609.02907.
[32] GORDON D, KEMBHAVI A, RASTEGARI M, et al. IQA: visual question answering in interactive environments[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4089–4098.
[33] YU Chao, VELU A, VINITSKY E, et al. The surprising effectiveness of PPO in cooperative, multi-agent games[EB/OL]. (2021-03-02) [2021-11-17]. https://arxiv.org/abs/2103.01955.
Similar Articles:
[1] TAN Shubin, LIU Jianchang. Research on multi-agent based control system for continuous rolling process[J]. CAAI Transactions on Intelligent Systems, 2008, 3(2): 150.
[2] LIAN Chuanqiang, XU Xin, WU Jun, et al. Q-CF multi-agent reinforcement learning for resource allocation problems[J]. CAAI Transactions on Intelligent Systems, 2011, 6(2): 95.
[3] LEI Ming, ZHOU Chao, ZHOU Shaolei, et al. Decentralized formation control of multiple mobile agents considering time-varying delay[J]. CAAI Transactions on Intelligent Systems, 2012, 7(6): 536.
[4] GUO Wenqiang, GAO Xiaoguang, HOU Yongyan, et al. Environment recognition of intelligent agricultural vehicles based on MSBN and multi-agent coordinative inference[J]. CAAI Transactions on Intelligent Systems, 2013, 8(5): 453. [doi:10.3969/j.issn.1673-4785.201210057]
[5] LIANG Shuang, CAO Qixin, WANG Wenshan, et al. An automatic switching method for multiple location components based on reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2016, 11(2): 149. [doi:10.11992/tis.201510031]
[6] ZHOU Wenji, YU Yang. Summarize of hierarchical reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2017, 12(5): 590. [doi:10.11992/tis.201706031]
[7] ZHANG Wenxu, MA Lei, HE Huilin, et al. Air-ground heterogeneous coordination for multi-agent coverage based on reinforced learning[J]. CAAI Transactions on Intelligent Systems, 2018, 13(2): 202. [doi:10.11992/tis.201609017]
[8] GUO Xian, FANG Yongchun. Locomotion gait control for bionic robots: a review of reinforcement learning methods[J]. CAAI Transactions on Intelligent Systems, 2020, 15(1): 152. [doi:10.11992/tis.201907052]
[9] SHEN Xiangxiang, HOU Xinwen, YIN Chuanhuan. State attention in deep reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2020, 15(2): 317. [doi:10.11992/tis.201809033]
[10] MO Hongwei, TIAN Peng. An image caption generation method based on attention fusion[J]. CAAI Transactions on Intelligent Systems, 2020, 15(4): 740. [doi:10.11992/tis.201910039]
[11] ZHANG Wenxu, MA Lei, WANG Xiaodong. Reinforcement learning for event-triggered multi-agent systems[J]. CAAI Transactions on Intelligent Systems, 2017, 12(1): 82. [doi:10.11992/tis.201604008]
[12] XU Peng, XIE Guangming, WEN Jiayan, et al. Event-triggered reinforcement learning formation control for multi-agent[J]. CAAI Transactions on Intelligent Systems, 2019, 14(1): 93. [doi:10.11992/tis.201807010]
[13] YIN Changsheng, YANG Ruopeng, ZHU Wei, et al. A survey on multi-agent hierarchical reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2020, 15(4): 646. [doi:10.11992/tis.201909027]

Memo

Received: 2021-11-17.
Foundation: National Natural Science Foundation of China (U1613212).
Author profiles: LU Shengyang, master's student; his main research interests are multi-agent systems and multi-agent reinforcement learning. ZHAO Huailin, professor, Ph.D.; his main research interests are robotics, multi-agent systems, and artificial intelligence. LIU Huaping, associate professor, doctoral supervisor, Ph.D., council member of the Chinese Association for Artificial Intelligence (CAAI) and secretary-general of the CAAI Technical Committee on Cognitive Systems and Information Processing; his main research interests are robot perception, learning and control, and multimodal information fusion. He received the second prize of the Wu Wenjun Artificial Intelligence Science and Technology Progress Award and has led two key projects of the National Natural Science Foundation of China.
Corresponding author: LIU Huaping. E-mail: hpliu@tsinghua.edu.cn.
