JI Xiaofei, XIE Xuan, REN Yan. Human interaction recognition and prediction algorithm based on deep learning[J]. CAAI Transactions on Intelligent Systems, 2020, 15(3): 484-490. [doi:10.11992/tis.201812029]

Human interaction recognition and prediction algorithm based on deep learning

CAAI Transactions on Intelligent Systems (智能系统学报) [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
15
Issue:
3 (2020)
Pages:
484-490
Section:
Academic Papers (Intelligent Systems)
Publication date:
2020-09-05

Article Info

Title:
Human interaction recognition and prediction algorithm based on deep learning
Author(s):
JI Xiaofei (姬晓飞), XIE Xuan (谢旋), REN Yan (任艳)
School of Automation, Shenyang Aerospace University, Shenyang 110136, China
Keywords:
video analysis; action recognition; action prediction; deep learning; convolutional neural network; long short-term memory; UT-interaction dataset; SBU Kinect interaction dataset
CLC number:
TP391.4
DOI:
10.11992/tis.201812029
Abstract:
Human interaction recognition algorithms based on convolutional neural networks (CNNs) have a drawback: the deep features they extract cannot effectively represent the sequential characteristics of an interaction. To address this, this paper combines a long short-term memory (LSTM) network with a CNN model and proposes an integrated deep-learning method for both recognizing and predicting human interactions. In the training stage, the parameters of the CNN and LSTM models are trained. In the recognition and prediction stage, video frames of unknown action categories, covering different proportions of the full video length, are fed into the trained CNN to extract deep features; these features are then fed into the trained LSTM, which recognizes or predicts the interaction. Tests on the publicly available UT-interaction dataset show that the method achieves a correct recognition rate of 92.31% at a moderate computational cost, and can also make preliminary predictions of unknown actions.
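The abstract describes a two-stage pipeline: a trained CNN turns each frame into a deep-feature vector, and an LSTM reads the resulting sequence to recognize the interaction, while prediction feeds in only the first portion of a video. Below is a minimal pure-Python sketch of the LSTM stage, assuming the per-frame CNN features have already been extracted (here replaced by random vectors). The names `observe_prefix` and `TinyLSTMClassifier`, the layer sizes, and the random untrained weights are hypothetical; this is not the authors' implementation, and its predictions mean nothing until the weights are trained. The six class labels are the interaction categories of the UT-interaction dataset.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def observe_prefix(features: Sequence[Vector], ratio: float) -> List[Vector]:
    """Keep the first ceil(ratio * T) frames (at least one) to mimic a partially observed video."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    # round() guards against float noise such as 0.1 * 3 = 0.30000000000000004.
    n = max(1, math.ceil(round(ratio * len(features), 6)))
    return list(features[:n])


class TinyLSTMClassifier:
    """Hypothetical single-layer LSTM over per-frame features, followed by a softmax head.

    Weights are seeded random placeholders standing in for trained parameters.
    """

    def __init__(self, input_dim: int, hidden_dim: int, num_classes: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> List[Vector]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # One weight block per gate: input (i), forget (f), output (o), candidate (g).
        self.W = {k: mat(hidden_dim, input_dim) for k in "ifog"}
        self.U = {k: mat(hidden_dim, hidden_dim) for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}
        self.V = mat(num_classes, hidden_dim)  # classification head
        self.head_bias = [0.0] * num_classes

    def _gate(self, k: str, x: Vector, h: Vector) -> Vector:
        # Pre-activation W_k x + U_k h + b_k for gate k.
        return [sum(w * xi for w, xi in zip(self.W[k][r], x))
                + sum(u * hj for u, hj in zip(self.U[k][r], h))
                + self.b[k][r] for r in range(self.hidden_dim)]

    def forward(self, seq: Sequence[Vector]) -> Vector:
        """Run the LSTM over the sequence and return class probabilities from the last hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in seq:
            # All gates use the previous hidden state h.
            i = [sigmoid(v) for v in self._gate("i", x, h)]
            f = [sigmoid(v) for v in self._gate("f", x, h)]
            o = [sigmoid(v) for v in self._gate("o", x, h)]
            g = [math.tanh(v) for v in self._gate("g", x, h)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        logits = [sum(v * hj for v, hj in zip(row, h)) + bias
                  for row, bias in zip(self.V, self.head_bias)]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


CLASSES = ["hand shaking", "hugging", "kicking", "pointing", "punching", "pushing"]

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical stand-in for CNN deep features: 10 frames x 8 dimensions.
    video_features = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(10)]
    model = TinyLSTMClassifier(input_dim=8, hidden_dim=16, num_classes=len(CLASSES))
    for ratio in (0.3, 0.5, 1.0):
        probs = model.forward(observe_prefix(video_features, ratio))
        best = max(range(len(CLASSES)), key=probs.__getitem__)
        print(f"ratio={ratio:.1f}: predicted {CLASSES[best]} (p={probs[best]:.3f})")
```

Truncating the input to a prefix lets one recognizer serve both tasks: the same LSTM reads whatever portion of the video has been observed, and full recognition is simply the case `ratio = 1.0`.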



Memo:
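Received: 2018-12-26.
Foundation items: National Natural Science Foundation of China (61602321); Natural Science Foundation of Liaoning Province (201602557); Scientific Research Project (Serving Local Development) of the Education Department of Liaoning Province (L201708); Scientific Research Youth Project of the Education Department of Liaoning Province (L201745).
About the authors:
JI Xiaofei, associate professor, Ph.D. Main research interests: video analysis and processing, and pattern recognition theory. Has undertaken research projects funded by the National Natural Science Foundation of China, the Natural Science Foundation of Liaoning Province, and others; has published more than 40 academic papers and co-authored 2 English-language monographs.
XIE Xuan, master's student. Main research interests: biometric recognition and behavior analysis.
REN Yan, lecturer, Ph.D. Main research interests: knowledge discovery and representation based on axiomatic fuzzy sets, and image semantic feature extraction. Has undertaken research projects funded by the National Natural Science Foundation of China, the Aeronautical Science Foundation of China, and the Natural Science Foundation of Liaoning Province; has published 25 academic papers.
Corresponding author: JI Xiaofei. E-mail: jixiaofei7804@126.com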