LU Shengyang, ZHAO Huailin, LIU Huaping. Multi-agent reinforcement learning for scene graph-driven target search[J]. CAAI Transactions on Intelligent Systems, 2023, 18(1): 207–215. [doi:10.11992/tis.202111034]

Multi-agent reinforcement learning for scene graph-driven target search

References:
[1] ANDERSON P, WU Qi, TENEY D, et al. Vision-and-language navigation: interpreting visually-grounded navigation instructions in real environments[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 3674–3683.
[2] DAS A, DATTA S, GKIOXARI G, et al. Embodied question answering[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 1–10.
[3] THOMASON J, MURRAY M, CAKMAK M, et al. Vision-and-dialog navigation[C]//Proceedings of the Conference on Robot Learning. Cambridge MA: JMLR, 2020, 100: 394–406.
[4] STURM J, ENGELHARD N, ENDRES F, et al. A benchmark for the evaluation of RGB-D SLAM systems[C]//2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. Algarve: IEEE, 2012: 573–580.
[5] MIROWSKI P, PASCANU R, VIOLA F, et al. Learning to navigate in complex environments[EB/OL]. (2016-11-11) [2021-11-17]. https://arxiv.org/abs/1611.03673.
[6] BABAEIZADEH M, FROSIO I, TYREE S, et al. GA3C: GPU-based A3C for deep reinforcement learning[C]//30th Conference on Neural Information Processing Systems. Barcelona, 2016: 1–6.
[7] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[EB/OL]. (2017-07-20) [2021-11-17]. https://arxiv.org/abs/1707.06347.
[8] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. (2015-09-09) [2021-11-17]. https://arxiv.org/abs/1509.02971.
[9] ANSCHEL O, BARAM N, SHIMKIN N. Averaged-DQN: variance reduction and stabilization for deep reinforcement learning[C]//International Conference on Machine Learning. Cambridge MA: JMLR, 2017: 176–185.
[10] WU Yi, WU Yuxin, TAMAR A, et al. Bayesian relational memory for semantic visual navigation[C]//2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 2769–2779.
[11] WORTSMAN M, EHSANI K, RASTEGARI M, et al. Learning to learn how to learn: self-adaptive visual navigation using meta-learning[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 6743–6752.
[12] HUANG Xiaohui, YANG Kaiming, LING Jiahao. Order dispatch by multi-agent reinforcement learning based on shared attention[J/OL]. Journal of computer applications, 2022: 1–7. (2022-07-26). https://kns.cnki.net/kcms/detail/51.1307.TP.20220726.1030.002.html.
[13] DU Heming, YU Xin, ZHENG Liang. Learning object relation graph and tentative policy for visual navigation[M]//Computer Vision – ECCV 2020. Cham: Springer International Publishing, 2020: 19–34.
[14] CHEN Boyuan, SONG Shuran, LIPSON H, et al. Visual hide and seek[EB/OL]. (2019-10-15) [2021-11-17]. https://arxiv.org/abs/1910.07882.
[15] JADERBERG M, CZARNECKI W M, DUNNING I, et al. Human-level performance in 3D multiplayer games with population-based deep reinforcement learning[J]. Science, 2019, 364(6443): 859–865.
[16] ZHANG Wenxu, MA Lei, HE Huilin, et al. Air-ground heterogeneous coordination for multi-agent coverage based on reinforced learning[J]. CAAI transactions on intelligent systems, 2018, 13(2): 202–207.
[17] LIAN Chuanqiang, XU Xin, WU Jun, et al. Q-CF multi-agent reinforcement learning for resource allocation problems[J]. CAAI transactions on intelligent systems, 2011, 6(2): 95–100.
[18] HAN Zhaorong, QIAN Yuhua, LIU Guoqing. Multi-agent communication coupled with self-attention and reinforcement learning[J/OL]. Journal of Chinese mini-micro computer systems: 1–8. (2022-05-13) [2022-07-31]. DOI: 10.20009/j.cnki.21-1106/TP.2021-0802.
[19] FANG Weiwei, WANG Yunpeng, ZHANG Hao, et al. Optimized communication resource allocation in vehicular networks based on multi-agent deep reinforcement learning[J]. Journal of Beijing Jiaotong university, 2022, 46(2): 64–72.
[20] KIM D, MOON S, HOSTALLERO D, et al. Learning to schedule communication in multi-agent reinforcement learning[EB/OL]. (2019-02-05) [2022-07-31]. https://arxiv.org/abs/1902.01554.
[21] DAS A, GERVET T, ROMOFF J, et al. TarMAC: targeted multi-agent communication[C]//International Conference on Machine Learning. Cambridge MA: JMLR, 2019: 1538–1546.
[22] DING Ziluo, HUANG Tiejun, LU Zongqing. Learning individually inferred communication for multi-agent cooperation[EB/OL]. (2020-06-11) [2022-07-31]. https://arxiv.org/abs/2006.06455.
[23] FOERSTER J, FARQUHAR G, AFOURAS T, et al. Counterfactual multi-agent policy gradients[C]//Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. New Orleans: AAAI Press, 2018, 32(1): 2974–2982.
[24] CHEN Xinyuan, XIE Shengyi, CHEN Qingqiang, et al. Knowledge-based inference on convolutional feature extraction and path semantics[J]. CAAI transactions on intelligent systems, 2021, 16(4): 729–738.
[25] YANG Wei, WANG Xiaolong, FARHADI A, et al. Visual semantic navigation using scene priors[EB/OL]. (2018-10-15) [2022-07-31]. https://arxiv.org/abs/1810.06543.
[26] YAN Chao, XIANG Xiaojia, XU Xin, et al. A survey on the scalability and transferability of multi-agent deep reinforcement learning[J/OL]. Control and decision, 2022: 1–20. (2022-06-14). https://kns.cnki.net/kcms/detail/21.1124.TP.20220613.1041.023.html.
[27] QIU Yiding, PAL A, CHRISTENSEN H I. Learning hierarchical relationships for object-goal navigation[EB/OL]. (2020-03-15) [2022-07-31]. https://arxiv.org/abs/2003.06749.
[28] CHAPLOT D S, GANDHI D P, GUPTA A, et al. Object goal navigation using goal-oriented semantic exploration[J]. Advances in neural information processing systems, 2020, 33.
[29] HE Kaiming, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2980–2988.
[30] KOLVE E, MOTTAGHI R, HAN W, et al. AI2-THOR: an interactive 3D environment for visual AI[EB/OL]. (2017-12-14) [2021-11-17]. https://arxiv.org/abs/1712.05474.
[31] KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. (2016-09-09) [2021-11-17]. https://arxiv.org/abs/1609.02907.
[32] GORDON D, KEMBHAVI A, RASTEGARI M, et al. IQA: visual question answering in interactive environments[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4089–4098.
[33] YU Chao, VELU A, VINITSKY E, et al. The surprising effectiveness of PPO in cooperative, multi-agent games[EB/OL]. (2021-03-02) [2021-11-17]. https://arxiv.org/abs/2103.01955.
Similar Articles:
[1] TAN Shubin, LIU Jianchang. Research on multi-agent based control system for continuous rolling process[J]. CAAI Transactions on Intelligent Systems, 2008, 3(2): 150.
[2] LIAN Chuanqiang, XU Xin, WU Jun, et al. Q-CF multi-agent reinforcement learning for resource allocation problems[J]. CAAI Transactions on Intelligent Systems, 2011, 6(2): 95.
[3] LEI Ming, ZHOU Chao, ZHOU Shaolei, et al. Decentralized formation control of multiple mobile agents considering time-varying delay[J]. CAAI Transactions on Intelligent Systems, 2012, 7(6): 536.
[4] GUO Wenqiang, GAO Xiaoguang, HOU Yongyan, et al. Environment recognition of intelligent agricultural vehicles based on MSBN and multi-agent coordinative inference[J]. CAAI Transactions on Intelligent Systems, 2013, 8(5): 453. [doi:10.3969/j.issn.1673-4785.201210057]
[5] LIANG Shuang, CAO Qixin, WANG Wenshan, et al. An automatic switching method for multiple location components based on reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2016, 11(2): 149. [doi:10.11992/tis.201510031]
[6] ZHOU Wenji, YU Yang. Summarize of hierarchical reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2017, 12(5): 590. [doi:10.11992/tis.201706031]
[7] ZHANG Wenxu, MA Lei, HE Huilin, et al. Air-ground heterogeneous coordination for multi-agent coverage based on reinforced learning[J]. CAAI Transactions on Intelligent Systems, 2018, 13(2): 202. [doi:10.11992/tis.201609017]
[8] GUO Xian, FANG Yongchun. Locomotion gait control for bionic robots: a review of reinforcement learning methods[J]. CAAI Transactions on Intelligent Systems, 2020, 15(1): 152. [doi:10.11992/tis.201907052]
[9] SHEN Xiangxiang, HOU Xinwen, YIN Chuanhuan. State attention in deep reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2020, 15(2): 317. [doi:10.11992/tis.201809033]
[10] MO Hongwei, TIAN Peng. An image caption generation method based on attention fusion[J]. CAAI Transactions on Intelligent Systems, 2020, 15(4): 740. [doi:10.11992/tis.201910039]
[11] ZHANG Wenxu, MA Lei, WANG Xiaodong. Reinforcement learning for event-triggered multi-agent systems[J]. CAAI Transactions on Intelligent Systems, 2017, 12(1): 82. [doi:10.11992/tis.201604008]
[12] XU Peng, XIE Guangming, WEN Jiayan, et al. Event-triggered reinforcement learning formation control for multi-agent[J]. CAAI Transactions on Intelligent Systems, 2019, 14(1): 93. [doi:10.11992/tis.201807010]
[13] YIN Changsheng, YANG Ruopeng, ZHU Wei, et al. A survey on multi-agent hierarchical reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2020, 15(4): 646. [doi:10.11992/tis.201909027]

Memo

Received: 2021-11-17.
Foundation: National Natural Science Foundation of China (U1613212).
Author profiles: LU Shengyang, master's student; his main research interests are multi-agent systems and multi-agent reinforcement learning. ZHAO Huailin, professor, Ph.D.; his main research interests are robotics, multi-agent systems, and artificial intelligence. LIU Huaping, associate professor, doctoral supervisor, Ph.D., council member of the Chinese Association for Artificial Intelligence (CAAI) and secretary-general of the CAAI Technical Committee on Cognitive Systems and Information Processing; his main research interests are robot perception, learning and control, and multimodal information fusion. He received the second prize of the Wu Wenjun Artificial Intelligence Science and Technology Progress Award and has led two key projects of the National Natural Science Foundation of China.
Corresponding author: LIU Huaping. E-mail: hpliu@tsinghua.edu.cn.
