[1] WANG Yefei, GE Quanbo, LIU Huaping, et al. A perceptual manipulation system for audio-visual fusion of robots[J]. CAAI Transactions on Intelligent Systems, 2023, 18(2): 381–389. [doi: 10.11992/tis.202111036]

A perceptual manipulation system for audio-visual fusion of robots

References:
[1] HE Kaiming, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2980–2988.
[2] REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08)[2021-01-01]. https://arxiv.org/abs/1804.02767.
[3] 周俊佐, 朱宗奎, 何正球, 等. 面向人机对话意图分类的混合神经网络模型[J]. 软件学报, 2019, 30(11): 3313–3325.
ZHOU Junzuo, ZHU Zongkui, HE Zhengqiu, et al. Hybrid neural network models for human-machine dialogue intention classification[J]. Journal of software, 2019, 30(11): 3313–3325.
[4] CHOWDHARY K R. Natural language processing[J]. Fundamentals of artificial intelligence, 2020, 17(6): 603–649.
[5] MUREZ Z, VAN AS T, BARTOLOZZI J, et al. Atlas: end-to-end 3D scene reconstruction from posed images[M]//Computer Vision-ECCV 2020. Cham: Springer International Publishing, 2020: 414–431.
[6] HODAŇ T, BARÁTH D, MATAS J. EPOS: estimating 6D pose of objects with symmetries[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11700–11709.
[7] CORONA E, PUMAROLA A, ALENYÀ G, et al. GanHand: predicting human grasp affordances in multi-object scenes[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 5030–5040.
[8] QURESHI A H, SIMEONOV A, BENCY M J, et al. Motion planning networks[C]//2019 International Conference on Robotics and Automation. Montreal: IEEE, 2019: 2118–2124.
[9] QI Yuankai, WU Qi, ANDERSON P, et al. REVERIE: remote embodied visual referring expression in real indoor environments[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 9979–9988.
[10] GAO Chen, CHEN Jinyu, LIU Si, et al. Room-and-object aware knowledge reasoning for remote embodied referring expression[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 3063–3072.
[11] ZHANG Hanbo, LU Yunfan, YU Cunjun, et al. INVIGORATE: interactive visual grounding and grasping in clutter[EB/OL]. (2021-08-25)[2021-10-10]. https://arxiv.org/abs/2108.11092.
[12] MI Jinpeng, LYU Jianzhi, TANG Song, et al. Interactive natural language grounding via referring expression comprehension and scene graph parsing[J]. Frontiers in neurorobotics, 2020, 14: 43.
[13] ONDRAS J, CELIKTUTAN O, BREMNER P, et al. Audio-driven robot upper-body motion synthesis[J]. IEEE transactions on cybernetics, 2021, 51(11): 5445–5454.
[14] LATHUILIÈRE S, MASSÉ B, MESEJO P, et al. Neural network based reinforcement learning for audio-visual gaze control in human-robot interaction[J]. Pattern recognition letters, 2019, 118: 61–71.
[15] HÖNEMANN A, BENNETT C, WAGNER P, et al. Audio-visual synthesized attitudes presented by the German speaking robot SMiRAE[C]//The 15th International Conference on Auditory-Visual Speech Processing. Melbourne: ISCA, 2019: 10–11.
[16] YAMAGUCHI A, ATKESON C G. Recent progress in tactile sensing and sensors for robotic manipulation: can we turn tactile sensing into vision?[J]. Advanced robotics, 2019, 33(14): 661–673.
[17] 朱文霖, 刘华平, 王博文, 等. 基于视-触跨模态感知的智能导盲系统[J]. 智能系统学报, 2020, 15(1): 33–40.
ZHU Wenlin, LIU Huaping, WANG Bowen, et al. An intelligent blind guidance system based on visual-touch cross-modal perception[J]. CAAI transactions on intelligent systems, 2020, 15(1): 33–40.
[18] LALONDE J F, VANDAPEL N, HUBER D F, et al. Natural terrain classification using three-dimensional ladar data for ground robot mobility[J]. Journal of field robotics, 2006, 23(10): 839–861.
[19] 张新钰, 邹镇洪, 李志伟, 等. 面向自动驾驶目标检测的深度多模态融合技术[J]. 智能系统学报, 2020, 15(4): 758–771.
ZHANG Xinyu, ZOU Zhenhong, LI Zhiwei, et al. Deep multi-modal fusion in object detection for autonomous driving[J]. CAAI transactions on intelligent systems, 2020, 15(4): 758–771.
[20] LIU Hongyi, FANG Tongtong, ZHOU Tianyu, et al. Deep learning-based multimodal control interface for human-robot collaboration[J]. Procedia CIRP, 2018, 72: 3–8.
[21] YOO Y, LEE C Y, ZHANG B T. Multimodal anomaly detection based on deep auto-encoder for object slip perception of mobile manipulation robots[C]//2021 IEEE International Conference on Robotics and Automation. Xi’an: IEEE, 2021: 11443–11449.
[22] GAN Chuang, ZHANG Yiwei, WU Jiajun, et al. Look, listen, and act: towards audio-visual embodied navigation[C]//2020 IEEE International Conference on Robotics and Automation. Paris: IEEE, 2020: 9701–9707.
[23] JIN Shaowei, LIU Huaping, WANG Bowen, et al. Open-environment robotic acoustic perception for object recognition[J]. Frontiers in neurorobotics, 2019, 13: 96.
[24] JONETZKO Y, FIEDLER N, EPPE M, et al. Multimodal object analysis with auditory and tactile sensing using recurrent neural networks[M]//Communications in Computer and Information Science. Singapore: Springer Singapore, 2021: 253–265.
[25] 靳少卫, 刘华平, 王博文, 等. 开放环境下未知材质的识别技术[J]. 智能系统学报, 2020, 15(5): 1020–1027.
JIN Shaowei, LIU Huaping, WANG Bowen, et al. Recognition of unknown materials in an open environment[J]. CAAI transactions on intelligent systems, 2020, 15(5): 1020–1027.

Copyright © CAAI Transactions on Intelligent Systems