[1]刘董经典,孟雪纯,张紫欣,等.一种基于2D时空信息提取的行为识别算法[J].智能系统学报,2020,15(5):900-909.[doi:10.11992/tis.201906054]
 LIU Dongjingdian,MENG Xuechun,ZHANG Zixin,et al.A behavioral recognition algorithm based on 2D spatiotemporal information extraction[J].CAAI Transactions on Intelligent Systems,2020,15(5):900-909.[doi:10.11992/tis.201906054]

A behavioral recognition algorithm based on 2D spatiotemporal information extraction

CAAI Transactions on Intelligent Systems (智能系统学报) [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 15
Issue:
No. 5, 2020
Pages:
900-909
Section:
Academic Papers: Machine Learning
Publication date:
2020-10-31

Article Info

Title:
A behavioral recognition algorithm based on 2D spatiotemporal information extraction
Author(s):
LIU Dongjingdian, MENG Xuechun, ZHANG Zixin, YANG Xu, NIU Qiang
College of Computer Science & Technology, China University of Mining and Technology, Xuzhou 221008, China
Keywords:
behavior recognition; video analysis; neural networks; deep learning; convolutional neural networks; classification; spatiotemporal feature extraction; DenseNet
CLC number:
TP391.41
DOI:
10.11992/tis.201906054
Document code:
A
Abstract:
Human behavior recognition based on computer vision is a current research hotspot, with wide application value in fields such as behavior detection and video surveillance. Traditional behavior recognition methods are computationally cumbersome and lack time efficiency. The development of deep learning has greatly improved the accuracy of behavior recognition algorithms, but such methods still lag behind the results achieved in the image processing field. We introduce a novel behavior recognition algorithm based on DenseNet, which uses DenseNet as the network architecture and learns spatiotemporal information through 2D convolutions: frames that characterize the behavior are selected from the video, organized into RGB space in spatiotemporal order, and fed into the network for training. Extensive experiments on the UCF101 dataset show that the method reaches an accuracy of 94.46%.

References:

[1] WANG H, SCHMID C. Action recognition with improved trajectories[C]//2013 IEEE International Conference on Computer Vision. Sydney, AUS, 2013: 3551-3558.
[2] KRIZHEVSKY A, SUTSKEVER I, HINTON G. ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, USA, 2012: 1097-1105.
[3] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International journal of computer vision, 2015, 115(3): 211-252.
[4] SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA, 2015: 1-9.
[5] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA, 2016: 770-778.
[6] HUANG G, LIN Z, LAURENS V D M, et al. Densely connected convolutional networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA, 2017: 2261-2269.
[7] SOOMRO K, ZAMIR A R, SHAH M. UCF101: A dataset of 101 human actions classes from videos in the wild[J]. arXiv preprint arXiv: 1212.0402, 2012.
[8] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[9] CHEN P H, LIN C J, SCHÖLKOPF B. A tutorial on ν-support vector machines[J]. Applied stochastic models in business and industry, 2005, 21(2): 111-136.
[10] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, USA, 2005, 1: 886-893.
[11] CHAUDHRY R, RAVICHANDRAN A, HAGER G, et al. Histograms of oriented optical flow and binet-cauchy kernels on nonlinear dynamical systems for the recognition of human actions[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA, 2009: 1932-1939.
[12] DALAL N, TRIGGS B, SCHMID C. Human detection using oriented histograms of flow and appearance[C]//European Conference on Computer Vision. Graz, Austria, 2006: 428-441.
[13] WANG H, KLÄSER A, SCHMID C, et al. Action recognition by dense trajectories[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, USA, 2011: 3169-3176.
[14] CARREIRA J, ZISSERMAN A. Quo vadis, action recognition? A new model and the kinetics dataset[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA, 2017: 6299-6308.
[15] FEICHTENHOFER C, PINZ A, ZISSERMAN A. Convolutional two-stream network fusion for video action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA, 2016: 1933-1941.
[16] NG J Y H, HAUSKNECHT M, VIJAYANARASIMHAN S, et al. Beyond short snippets: deep networks for video classification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA, 2015: 4694-4702.
[17] WANG L, XIONG Y, WANG Z, et al. Temporal segment networks: Towards good practices for deep action recognition[C]//European Conference on Computer Vision. Amsterdam, The Netherlands, 2016: 20-36.
[18] LAN Z, ZHU Y, HAUPTMANN A G, et al. Deep local video feature for action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, USA, 2017: 1-7.
[19] ZHANG Peihao. Research on action recognition based on pose estimation[D]. Nanjing: Nanjing University of Aeronautics and Astronautics, 2015. (in Chinese)
[20] MA Miao. Study on human pose estimation, tracking and human action recognition in videos[D]. Jinan: Shandong University, 2017. (in Chinese)
[21] TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile, 2015: 4489-4497.
[22] HARA K, KATAOKA H, SATOH Y. Learning spatio-temporal features with 3D residual networks for action recognition[C]//Proceedings of the IEEE International Conference on Computer Vision Workshops. Venice, Italy, 2017: 3154-3160.
[23] QIU Z, YAO T, MEI T. Learning spatio-temporal representation with pseudo-3d residual networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy, 2017: 5533-5541.
[24] DIBA A, FAYYAZ M, SHARMA V, et al. Temporal 3d convnets: new architecture and transfer learning for video classification[J]. arXiv preprint arXiv: 1711.08200, 2017.
[25] SUN L, JIA K, YEUNG D Y, et al. Human action recognition using factorized spatio-temporal convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile, 2015: 4597-4605.
[26] TRAN D, WANG H, TORRESANI L, et al. A closer look at spatiotemporal convolutions for action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA, 2018: 6450-6459.
[27] SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[C]//Advances in Neural Information Processing Systems. Montreal, Canada, 2014: 568-576.


Memo:
Received date: 2019-06-28.
Foundation item: National Natural Science Foundation of China Project (51674255).
Author biographies: LIU Dongjingdian, Ph.D. candidate, whose main research interests are behavior recognition and computer vision; ZHANG Zixin, master's student, whose main research interests are behavior recognition, recommender systems, and smart healthcare; NIU Qiang, professor, whose main research interests are artificial intelligence, data mining, and wireless sensor networks. He has published more than 40 academic papers.
Corresponding author: NIU Qiang. E-mail: niuq@cumt.edu.cn
Last Update: 2021-01-15