[1] ZHONG Qiubo, ZHENG Caiming, PIAO Songhao. Research on skeleton-based action recognition with spatiotemporal fusion and human–robot interaction[J]. CAAI Transactions on Intelligent Systems, 2020, 15(3): 601-608. [doi:10.11992/tis.202006029]

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
15
Issue:
2020, No. 3
Pages:
601-608
Column:
Artificial Intelligence Deans' Forum
Publication date:
2020-05-05

Article Info

Title:
Research on skeleton-based action recognition with spatiotemporal fusion and human–robot interaction
Author(s):
ZHONG Qiubo 1,2; ZHENG Caiming 1; PIAO Songhao 3
1. Robotics Institute, Ningbo University of Technology, Ningbo 315211, China;
2. State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China;
3. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Keywords:
action recognition; temporal and spatial relationships; posture motion; spatiotemporal fusion; graph convolution network; temporal attention; adaptive feature enhancement; human–robot interaction
CLC number:
TP312
DOI:
10.11992/tis.202006029
Abstract:
The temporal dynamics of posture are crucial for sequence-based action recognition, and human actions can be represented by the corresponding motions of an articulated skeleton. Most existing skeleton-based methods extract spatial information and motion information from the skeleton structure separately and only then fuse them, so they do not efficiently represent human actions with complex spatiotemporal relationships. We propose a posture motion-based spatiotemporal fused graph convolutional network (PM-STFGCN) for skeleton-based action recognition. First, because the temporal domain contains a large amount of interfering information, we define a local posture motion-based temporal attention module (LPM-TAM), which suppresses interference in the temporal domain and learns a representation of motion posture. Second, we design a posture motion-based spatiotemporal fusion module (PM-STF), which fuses temporal motion features with spatial posture features and adaptively enhances the fused features. Experiments verify that the proposed method is effective and that its recognition performance is competitive with other methods. A human–robot interaction system built on action recognition is shown to outperform a speech interaction system in both real-time performance and accuracy.
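The abstract names a temporal attention step driven by posture motion and a fusion step that combines motion and posture features over the skeleton graph, but gives no formulas. The following NumPy sketch illustrates one possible reading of that idea and is not the authors' PM-STFGCN implementation. Every name in it (`normalized_adjacency`, `temporal_attention`, `fuse`, the weight matrices) is hypothetical, and the specific choices (frame differences as motion, softmax over mean joint displacement, a sigmoid gate for enhancement) are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def normalized_adjacency(edges, num_joints):
    """Symmetrically normalized skeleton adjacency with self-loops."""
    A = np.eye(num_joints)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def temporal_attention(x):
    """Per-frame weights from local posture motion (assumed form).

    x has shape (T, V, C): frames, joints, coordinate channels.
    Motion is the frame-to-frame joint displacement; the first frame
    is given zero motion.
    """
    motion = np.diff(x, axis=0, prepend=x[:1])               # (T, V, C)
    energy = np.linalg.norm(motion, axis=-1).mean(axis=-1)   # (T,)
    return softmax(energy, axis=0), motion


def fuse(x, A, W_pose, W_motion):
    """Fuse spatial posture and temporal motion features (assumed form)."""
    weights, motion = temporal_attention(x)
    # One graph-convolution step per stream: aggregate over neighbours,
    # then project channels C -> D.
    pose_feat = np.einsum("uv,tvc,cd->tud", A, x, W_pose)
    motion_feat = np.einsum("uv,tvc,cd->tud", A, motion, W_motion)
    # Motion-driven gate enhances posture features (residual gating).
    gate = 1.0 / (1.0 + np.exp(-motion_feat))
    fused = pose_feat * (1.0 + gate)
    # Re-weight each frame by its temporal attention weight.
    return fused * weights[:, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    skeleton = rng.normal(size=(8, 5, 3))          # 8 frames, 5 joints, 3D
    A = normalized_adjacency([(0, 1), (1, 2), (2, 3), (3, 4)], 5)
    out = fuse(skeleton, A, rng.normal(size=(3, 4)), rng.normal(size=(3, 4)))
    print(out.shape)
```

In this sketch, frames with larger joint displacement receive larger weights, so a completely static sequence gets uniform attention. A trained model would learn the projection matrices and likely the attention itself; here they are random placeholders.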



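Memo:
Received: 2020-06-17.
Funding: National Natural Science Foundation of China (61203360, 61502256); Zhejiang Provincial Natural Science Foundation (LQ12F03001).
About the authors:
ZHONG Qiubo, associate professor, Ph.D., executive vice dean of the Robotics Institute, Ningbo University of Technology. Main research interests: intelligent robot control, computer vision and image processing, and robot motion control. Has led or participated in more than 20 industry-sponsored and government-funded research projects and has published more than 20 academic papers.
ZHENG Caiming, master's student. Main research interests: intelligent robot control, computer vision, image processing, and robot motion control.
PIAO Songhao, professor, doctoral supervisor, executive council member of the Chinese Association for Artificial Intelligence, and director of the Robot Culture and Art Professional Committee. Main research interests: robot environment perception and navigation, robot motion planning, and multi-agent robot cooperation. Has led or participated in multiple projects, including projects of the National Natural Science Foundation of China, key projects of the national "863" Program, and the Ministry of Education "985" Program. Has published more than 60 academic papers.
Corresponding author: ZHONG Qiubo. E-mail: zhongqiubo@nbut.edu.cn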