[1]李沁,陈飞扬,彭晗,等.视觉感知人景互影响的人体动作预测方法[J].智能系统学报,2025,20(4):1010-1023.[doi:10.11992/tis.202411016]
 LI Qin,CHEN Feiyang,PENG Han,et al.Human motion prediction method with visual perception of human-scene mutual influence[J].CAAI Transactions on Intelligent Systems,2025,20(4):1010-1023.[doi:10.11992/tis.202411016]

视觉感知人景互影响的人体动作预测方法

参考文献/References:
[1] RENZ H, KRÄMER M, BERTRAM T. Comparing human motion forecasts in moving horizon trajectory planning of collaborative robots[C]//2023 IEEE International Conference on Robotics and Biomimetics. Koh Samui: IEEE, 2023: 1-6.
[2] LEE M L, LIU Wansong, BEHDAD S, et al. Robot-assisted disassembly sequence planning with real-time human motion prediction[J]. IEEE transactions on systems, man, and cybernetics: systems, 2023, 53(1): 438-450.
[3] ZHOU Xiaokang, LIANG Wei, WANG K I, et al. Deep-learning-enhanced human activity recognition for Internet of healthcare things[J]. IEEE internet of things journal, 2020, 7(7): 6429-6438.
[4] LI Qin, WANG Yong. Self-supervised pretraining based on noise-free motion reconstruction and semantic-aware contrastive learning for human motion prediction[J]. IEEE transactions on emerging topics in computational intelligence, 2024, 8(1): 738-751.
[5] URTASUN R, FLEET D J, LAWRENCE N D. Modeling human locomotion with topologically constrained latent variable models[C]//Workshop on Human Motion. Berlin: Springer Berlin Heidelberg, 2007: 104-118.
[6] LEHRMANN A M, GEHLER P V, NOWOZIN S. Efficient nonlinear Markov models for human motion[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 1314-1321.
[7] MARTINEZ J, BLACK M J, ROMERO J. On human motion prediction using recurrent neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 2891-2900.
[8] WOLTER M, YAO A. Complex gated recurrent neural networks[C]//32nd Conference on Neural Information Processing Systems. Montréal: Curran Associates Inc., 2018: 1-11.
[9] 桑海峰, 陈紫珍, 何大阔. 基于双向GRU和注意力机制模型的人体动作预测[J]. 计算机辅助设计与图形学学报, 2019, 31(7): 1166-1174.
SANG Haifeng, CHEN Zizhen, HE Dakuo. Human motion prediction based on bidirectional-GRU and attention mechanism model[J]. Journal of computer-aided design & computer graphics, 2019, 31(7): 1166-1174.
[10] 王辉, 丁铂栩, 宋佳豪, 等. 基于PointNet和长短时记忆网络的三维人体动作预测[J]. 计算机应用, 2022, 42(S2): 60-66.
WANG Hui, DING Boxu, SONG Jiahao, et al. 3D human action prediction via PointNet and long short-term memory network[J]. Journal of computer applications, 2022, 42(S2): 60-66.
[11] WANG Hongsong, DONG Jian, CHENG Bin, et al. PVRED: a position-velocity recurrent encoder-decoder for human motion prediction[J]. IEEE transactions on image processing, 2021, 30: 6096-6106.
[12] GUO Wen, DU Yuming, SHEN Xi, et al. Back to MLP: a simple baseline for human motion prediction[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2023: 4809-4819.
[13] 张瑞鹏. 基于门控循环单元网络的人体动作预测方法研究[D]. 南京: 南京理工大学, 2021.
ZHANG Ruipeng. Research on human motion prediction method based on gated recurrent unit network[D]. Nanjing: Nanjing University of Science and Technology, 2021.
[14] LIU Xiaoli, YIN Jianqin, LIU Jin, et al. TrajectoryCNN: a new spatio-temporal feature learning network for human motion prediction[J]. IEEE transactions on circuits and systems for video technology, 2021, 31(6): 2133-2146.
[15] TANG Jin, ZHANG Jin, YIN Jianqin. Temporal consistency two-stream CNN for human motion prediction[J]. Neurocomputing, 2022, 468: 245-256.
[16] 张晋, 唐进, 尹建芹. 面向人体动作预测的对称残差网络[J]. 机器人, 2022, 44(3): 291-298.
ZHANG Jin, TANG Jin, YIN Jianqin. Symmetric residual network for human motion prediction[J]. Robot, 2022, 44(3): 291-298.
[17] 贺朵. 基于图卷积网络深度学习的人体动作识别与预测[D]. 西安: 西安理工大学, 2023.
HE Duo. Human action recognition and prediction based on deep learning of graph convolutional networks[D]. Xi’an: Xi’an University of Technology, 2023.
[18] MAO Wei, LIU Miaomiao, SALZMANN M, et al. Learning trajectory dependencies for human motion prediction[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 9488-9496.
[19] LI Qin, WANG Yong, LYU Fanbing. Semantic correlation attention-based multiorder multiscale feature fusion network for human motion prediction[J]. IEEE transactions on cybernetics, 2024, 54(2): 825-838.
[20] WANG Xinshun, CUI Qiongjie, CHEN Chen, et al. GCNext: towards the unity of graph convolutions for human motion prediction[J]. Proceedings of the AAAI conference on artificial intelligence, 2024, 38(6): 5642-5650.
[21] 李沁. 基于三维骨架数据的人体动作预测及其应用研究[D]. 长沙: 中南大学, 2022.
LI Qin. Human motion prediction based on 3D skeleton data and its application[D]. Changsha: Central South University, 2022.
[22] 代金利, 曹江涛, 姬晓飞. 交互关系超图卷积模型的双人交互行为识别[J]. 智能系统学报, 2024, 19(2): 316-324.
DAI Jinli, CAO Jiangtao, JI Xiaofei. Two-person interaction recognition based on the interactive relationship hypergraph convolution network model[J]. CAAI transactions on intelligent systems, 2024, 19(2): 316-324.
[23] 胡佳慧. 基于时空特征融合与动作序列补全的人体动作预测算法研究[D]. 长春: 吉林大学, 2024.
HU Jiahui. Research on human motion prediction algorithms based on spatiotemporal features fusion and motion sequence completion[D]. Changchun: Jilin University, 2024.
[24] BERMAN M G, JONIDES J, KAPLAN S. The cognitive benefits of interacting with nature[J]. Psychological science, 2008, 19(12): 1207-1212.
[25] CORONA E, PUMAROLA A, ALENYÀ G, et al. Context-aware human motion prediction[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 6992-7001.
[26] HASSAN M, CEYLAN D, VILLEGAS R, et al. Stochastic scene-aware motion prediction[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 11374-11384.
[27] CAO Zhe, GAO Hang, MANGALAM K, et al. Long-term human motion prediction with scene context[C]//European Conference on Computer Vision. Cham: Springer International Publishing, 2020: 387-404.
[28] MAO Wei, HARTLEY R I, SALZMANN M. Contact-aware human motion forecasting[C]//Proceedings of the 36th International Conference on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2022: 7356-7367.
[29] SCOFANO L, SAMPIERI A, SCHIELE E, et al. Staged contact-aware global human motion forecasting[C]//The 34th British Machine Vision Conference. Aberdeen: BMVA Press, 2023: 589-594.
[30] XING Chaoyue, MAO Wei, LIU Miaomiao. Scene-aware human motion forecasting via mutual distance prediction[C]//Computer Vision-ECCV 2024. Cham: Springer, 2024: 128-144.
[31] GAO Xuehao, YANG Yang, WU Yang, et al. Multi-condition latent diffusion network for scene-aware neural human motion prediction[J]. IEEE transactions on image processing, 2024, 33: 3907-3920.
[32] LIU Zhenguang, LYU Kedi, WU Shuang, et al. Aggregated multi-GANs for controlled 3D human motion prediction[J]. Proceedings of the AAAI conference on artificial intelligence, 2021, 35(3): 2225-2232.
[33] ZHAO Mengyi, TANG Hao, XIE Pan, et al. Bidirectional Transformer GAN for long-term human motion prediction[J]. ACM transactions on multimedia computing, communications, and applications, 2023, 19(5): 1-19.
[34] BARQUERO G, ESCALERA S, PALMERO C. BeLFusion: latent diffusion for behavior-driven human motion prediction[C]//2023 IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 2317-2327.
[35] TIAN Sibo, ZHENG Minghui, LIANG Xiao. TransFusion: a practical and effective transformer-based diffusion model for 3D human motion prediction[J]. IEEE robotics and automation letters, 2024, 9(7): 6232-6239.
[36] WANG Qilong, WU Banggu, ZHU Pengfei, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11534-11542.
[37] HASSAN M, CHOUTAS V, TZIONAS D, et al. Resolving 3D human pose ambiguities with 3D scene constraints[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 2282-2292.
[38] BHATNAGAR B L, XIE Xianghui, PETROV I A, et al. BEHAVE: dataset and method for tracking human object interactions[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 15914-15925.
[39] YAN Sijie, XIONG Yuanjun, LIN Dahua. Spatial temporal graph convolutional networks for skeleton-based action recognition[J]. Proceedings of the AAAI conference on artificial intelligence, 2018, 32(1): 7444-7452.
[40] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[C]//International Conference on Learning Representations. Vienna: ICLR, 2021: 1-22.
[41] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
[42] SUN Quan, FANG Yuxin, WU L, et al. EVA-CLIP: improved training techniques for CLIP at scale[EB/OL]. (2023-03-27)[2024-11-15]. https://arxiv.org/abs/2303.15389.
[43] NARGUND A A, SRA M. SPOTR: spatio-temporal pose Transformers for human motion prediction[EB/OL]. (2023-03-11)[2024-11-15]. https://arxiv.org/abs/2303.06277.
相似文献/Similar articles:
[1]田国会,吉艳青,李晓磊.家庭智能空间下基于场景的人的行为理解[J].智能系统学报,2010,5(1):57.
 TIAN Guo-hui,JI Yan-qing,LI Xiao-lei.Human behaviors understanding based on scene knowledge in home intelligent space[J].CAAI Transactions on Intelligent Systems,2010,5(1):57.

备注/Memo

Received: 2024-11-15.
Foundation items: National Natural Science Foundation of China (62202161); Scientific Research Project of Hunan Provincial Department of Education (20A125, 22A0460, 23B0597, 24B0584); Major Project of Xiangjiang Laboratory (23XJ01007, 23XJ01009); Natural Science Foundation of Hunan Province (2025JJ60384).
About the authors: LI Qin, lecturer, Ph.D., whose main research interests are human-computer interaction and pattern recognition, E-mail: qinli@hutb.edu.cn; CHEN Feiyang, whose main research interests are computer vision and human-computer interaction, E-mail: 1689343195@qq.com; LIU Limei, professor, Ph.D., whose main research interests are artificial intelligence and intelligent decision-making, has led more than 10 projects at or above the provincial and ministerial level, including the National Key R&D Program of China and the National Social Science Fund of China, and has published more than 30 academic papers and 3 monographs and textbooks, E-mail: seagullm@163.com.
Corresponding author: LIU Limei. E-mail: seagullm@163.com
