LIU Dongjingdian,MENG Xuechun,ZHANG Zixin,et al.A behavioral recognition algorithm based on 2D spatiotemporal information extraction[J].CAAI Transactions on Intelligent Systems,2020,15(5):900-909.[doi:10.11992/tis.201906054]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 15
Issue: 2020, No. 5
Pages: 900-909
Section: Academic Papers (Machine Learning)
Publication date: 2020-09-05
- Title: A behavioral recognition algorithm based on 2D spatiotemporal information extraction
- Author(s): LIU Dongjingdian (刘董经典), MENG Xuechun (孟雪纯), ZHANG Zixin (张紫欣), YANG Xu (杨旭), NIU Qiang (牛强)
- Affiliation: College of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221008, Jiangsu, China
- Keywords: behavior recognition; video analysis; neural networks; deep learning; convolutional neural networks; classification; spatiotemporal feature extraction; DenseNet
- CLC number: TP391.41
- DOI: 10.11992/tis.201906054
- Document code: A
- Abstract:
Human behavior recognition based on computer vision is a current research hotspot, with wide application value in fields such as behavior detection and video surveillance. Traditional behavior recognition methods are computationally cumbersome and poorly suited to real-time use. Deep learning has greatly improved the accuracy of behavior recognition algorithms, yet their results still lag behind those achieved in image processing. We introduce a novel behavior recognition algorithm based on DenseNet. Using DenseNet as the network architecture, the algorithm learns spatiotemporal information through 2D convolutions: it selects frames that characterize the behavior in a video, organizes these frames into RGB space in spatiotemporal order, and feeds the result into the network for training. Extensive experiments on the UCF101 dataset show that the method reaches an accuracy of 94.46%.
Notes/Memo
Received: 2019-06-28.
Foundation item: National Natural Science Foundation of China (51674255)
About the authors: LIU Dongjingdian, Ph.D. candidate, whose main research interests are behavior recognition and computer vision; ZHANG Zixin, M.S. candidate, whose main research interests are behavior recognition, recommender systems, and smart healthcare; NIU Qiang, professor, whose main research interests are artificial intelligence, data mining, and wireless sensor networks; he has published more than 40 academic papers.
Corresponding author: NIU Qiang. E-mail: niuq@cumt.edu.cn
Last update: 2021-01-15