[1]张伟男,刘挺.具身智能的研究与应用[J].智能系统学报,2025,20(1):255-262.[doi:10.11992/tis.202406044]
 ZHANG Weinan,LIU Ting.Research and application of embodied intelligence[J].CAAI Transactions on Intelligent Systems,2025,20(1):255-262.[doi:10.11992/tis.202406044]

Research and Application of Embodied Intelligence

参考文献/References:
[1] CAMPBELL M, HOANE A J, HSU F H. Deep blue[J]. Artificial intelligence, 2002, 134(1/2): 57-83.
[2] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[3] OpenAI. Introducing ChatGPT[EB/OL]. (2022-11-30)[2024-06-13]. https://openai.com/index/chatgpt.
[4] OpenAI. Video generation models as world simulators[EB/OL]. (2024-02-15)[2024-06-13]. https://openai.com/index/video-generation-models-as-world-simulators.
[5] 唐佩佩, 叶浩生. 作为主体的身体: 从无身认知到具身认知[J]. 心理研究, 2012, 5(3): 3-8.
TANG Peipei, YE Haosheng. Body as the subject: from the disembodied cognition to embodied cognition[J]. Psychological research, 2012, 5(3): 3-8.
[6] 叶浩生. 认知与身体: 理论心理学的视角[J]. 心理学报, 2013, 45(4): 481-488.
YE Haosheng. Cognition and body: a perspective from theoretical psychology[J]. Acta psychologica sinica, 2013, 45(4): 481-488.
[7] 卢策吾, 王鹤. 具身智能(embodied artificial intelligence)[EB/OL]. (2023-07-22)[2024-06-13]. https://www.ccf.org.cn/Media_list/gzwyh/jsjsysdwyh/2023-07-22/794317.shtml.
LU Cewu, WANG He. Embodied AI (embodied artificial intelligence)[EB/OL]. (2023-07-22)[2024-06-13]. https://www.ccf.org.cn/Media_list/gzwyh/jsjsysdwyh/2023-07-22/794317.shtml.
[8] ZHONG Licheng, YANG Lixin, LI Kailin, et al. Color-NeuS: reconstructing neural implicit surfaces with color[EB/OL]. (2023-12-19)[2024-06-13]. https://arxiv.org/abs/2308.06962v2.
[9] LI Kailin, YANG Lixin, ZHEN Haoyu, et al. Chord: category-level hand-held object reconstruction via shape deformation[C]//2023 IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 9410-9420.
[10] XU Wenqiang, YU Zhenjun, XUE Han, et al. Visual-tactile sensing for in-hand object reconstruction[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 8803-8812.
[11] LIU Liu, XUE Han, XU Wenqiang, et al. Toward real-world category-level articulation pose estimation[J]. IEEE transactions on image processing, 2022, 31: 1072-1083.
[12] XUE Han, XU Wenqiang, ZHANG Jieyi, et al. GarmentTracking: category-level garment pose tracking[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 21233-21242.
[13] RAMAKRISHNAN S K, JAYARAMAN D, GRAUMAN K. An exploration of embodied visual exploration[J]. International journal of computer vision, 2021, 129(5): 1616-1649.
[14] LYU Jun, YU Qiaojun, SHAO Lin, et al. SAGCI-system: towards sample-efficient, generalizable, compositional, and incremental robot learning[C]//2022 International Conference on Robotics and Automation. Philadelphia: IEEE, 2022: 98-105.
[15] LI Yonglu, LIU Xinpeng, WU Xiaoqian, et al. HAKE: a knowledge engine foundation for human activity understanding[J]. IEEE transactions on pattern analysis and machine intelligence, 2023, 45(7): 8494-8506.
[16] MEES O, HERMANN L, ROSETE-BEAS E, et al. CALVIN: a benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks[J]. IEEE robotics and automation letters, 2022, 7(3): 7327-7334.
[17] MENDEZ-MENDEZ J, KAELBLING L P, LOZANO-PÉREZ T. Embodied lifelong learning for task and motion planning[C]//Conference on Robot Learning. New York: PMLR, 2023: 2134-2150.
[18] JIANG Yunfan, GUPTA A, ZHANG Zichen, et al. VIMA: robot manipulation with multimodal prompts[C]//Proceedings of the 40th International Conference on Machine Learning. New York: PMLR, 2023: 14975-15022.
[19] AHN M, BROHAN A, BROWN N, et al. Do as I can, not as I say: grounding language in robotic affordances[EB/OL]. (2022-04-04)[2024-06-13]. https://arxiv.org/abs/2204.01691v2.
[20] DAMEN D, DOUGHTY H, FARINELLA G M, et al. Scaling egocentric vision: the EPIC-KITCHENS dataset[C]//Computer Vision – ECCV 2018. Cham: Springer International Publishing, 2018: 753-771.
[21] DRIESS D, XIA Fei, SAJJADI M S M, et al. PaLM-E: an embodied multimodal language model[C]//International Conference on Machine Learning. New York: PMLR, 2023: 8469-8488.
[22] MA Y J, LIANG W, WANG Guanzhi, et al. Eureka: human-level reward design via coding large language models[EB/OL]. (2023-10-19)[2024-06-13]. https://arxiv.org/abs/2310.12931v2.
[23] BROHAN A, BROWN N, CARBAJAL J, et al. RT-2: vision-language-action models transfer web knowledge to robotic control[C]//Proceedings of the 7th Conference on Robot Learning. New York: PMLR, 2023: 2165-2183.
[24] JAMES S, WOHLHART P, KALAKRISHNAN M, et al. Sim-to-real via sim-to-sim: data-efficient robotic grasping via randomized-to-canonical adaptation networks[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 12619-12629.
[25] ZAGAL J C, RUIZ-DEL-SOLAR J. Combining simulation and reality in evolutionary robotics[J]. Journal of intelligent and robotic systems, 2007, 50(1): 19-39.
[26] KADIAN A, TRUONG J, GOKASLAN A, et al. Sim2Real predictivity: does evaluation in simulation predict real-world performance?[J]. IEEE robotics and automation letters, 2020, 5(4): 6670-6677.
[27] FANG Haoshu, WANG Chenxi, FANG Hongjie, et al. AnyGrasp: robust and efficient grasp perception in spatial and temporal domains[J]. IEEE transactions on robotics, 2023, 39(5): 3929-3945.
[28] ONGGO B S S, HILL J. Data identification and data collection methods in simulation: a case study at ORH Ltd[J]. Journal of simulation, 2014, 8(3): 195-205.
[29] DEITKE M, HAN W, HERRASTI A, et al. RoboTHOR: an open simulation-to-real embodied AI platform[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 3164-3174.
[30] KROEMER O, NIEKUM S, KONIDARIS G. A review of robot learning for manipulation: challenges, representations, and algorithms[J]. Journal of machine learning research, 2021, 22(1): 1395-1476.
[31] FU Zipeng, ZHAO T Z, FINN C. Mobile ALOHA: learning bimanual mobile manipulation with low-cost whole-body teleoperation[EB/OL]. (2024-01-04)[2024-06-13]. https://arxiv.org/abs/2401.02117v1.
[32] ZHANG Gu, FANG Haoshu, FANG Hongjie, et al. Flexible handover with real-time robust dynamic grasp trajectory generation[C]//2023 IEEE/RSJ International Conference on Intelligent Robots and Systems. Detroit: IEEE, 2023: 3192-3199.
[33] LI Shoujie, YU Haixin, DING Wenbo, et al. Visual-tactile fusion for transparent object grasping in complex backgrounds[J]. IEEE transactions on robotics, 2023, 39(5): 3838-3856.
[34] SHEN Bokui, XIA Fei, LI Chengshu, et al. IGibson 1.0: a simulation environment for interactive tasks in large realistic scenes[C]//2021 IEEE/RSJ International Conference on Intelligent Robots and Systems. Prague: IEEE, 2021: 7520-7527.
[35] JING Mingxuan, MA Xiaojian, HUANG Wenbing, et al. Task transfer by preference-based cost learning[C]//Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence. New York: ACM, 2019: 2471-2478.
[36] KATO I, OHTERU S, KOBAYASHI H, et al. Information-power machine with senses and limbs[M]//On Theory and Practice of Robots and Manipulators. Vienna: Springer Vienna, 1974: 11-24.
[37] KUSUDA Y. The humanoid robot scene in Japan[J]. Industrial robot, 2002, 29(5): 412-419.
相似文献/Similar References:
[1]李德毅,张天雷,韩威,等.认知机器的结构和激活[J].智能系统学报,2024,19(6):1604.[doi:10.11992/tis.202409024]
 LI Deyi,ZHANG Tianlei,HAN Wei,et al.Structure and activation of cognitive machines[J].CAAI Transactions on Intelligent Systems,2024,19(6):1604.[doi:10.11992/tis.202409024]

备注/Memo

Received: 2024-06-26.
Foundation projects: National Key Research and Development Program of China (2022YFF0902100); National Natural Science Foundation of China (92470205); Natural Science Foundation of Heilongjiang Province (YQ2021F006).
About the authors: ZHANG Weinan, tenured professor and doctoral supervisor, is executive dean of the Faculty of Artificial Intelligence and deputy director of the Faculty of Computing at Harbin Institute of Technology, deputy director of the Heilongjiang Provincial Key Laboratory of Chinese Information Processing, a council member of the China Computer Federation (CCF) and chair of its Harbin chapter, head of the Social Robotics Group of the Social Media Processing Technical Committee of the Chinese Information Processing Society of China, and a senior area chair of the Dialogue and Interactive Systems track at ACL, a top-tier (CCF Class A) international conference in natural language processing. His main research interests are artificial intelligence, large models, embodied intelligence, and social robots. He received the First Prize of the Heilongjiang Province Science and Technology Progress Award in 2016 and again in 2024, the Second Prize of the Wu Wenjun Artificial Intelligence Science and Technology Progress Award in 2020, and the Heilongjiang Province Youth Science and Technology Award in 2022. He has led a Young Scientist project of the National Key Research and Development Program and a general project of the National Natural Science Foundation of China, and has participated in several national and provincial projects, including a major project of the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" program and a key project of the National Natural Science Foundation of China. E-mail: wnzhang@ir.hit.edu.cn. LIU Ting, tenured professor and doctoral supervisor, is a vice president of Harbin Institute of Technology, a national high-level talent, and deputy director of the Education, Science, Health and Sports Committee of the CPPCC Heilongjiang Provincial Committee. He serves as an expert on the "Intelligent Robot" expert panel of the High-Tech Department of the Ministry of Industry and Information Technology (MIIT), deputy head of the information services group of the MIIT Electronic Information Science and Technology Committee, and a member of the Ministry of Education's expert group on artificial intelligence science and technology innovation. He heads a national platform for industry-education integration and innovation in artificial intelligence, is a CCF Fellow and vice president of the Chinese Information Processing Society of China, and leads a Heilongjiang Province "Touyan" (leading talent) team in artificial intelligence. His main research interests are artificial intelligence, natural language processing, and embodied intelligence. He has led projects under the National Key Research and Development Program, the National Program on Key Basic Research of China, and key projects of the National Natural Science Foundation of China. He received the Second Prize of the National Science and Technology Progress Award (ranked 4th), two First Prizes of the provincial Science and Technology Progress Award (ranked 1st), and the Second Prize of the Wu Wenjun Artificial Intelligence Science and Technology Progress Award (ranked 2nd). He has published four textbooks and translated works as first author. E-mail: tliu@ir.hit.edu.cn.
Corresponding author: LIU Ting. E-mail: tliu@ir.hit.edu.cn.

更新日期/Last Update: 2025-01-05