[1]田枫,卫宁彬,刘芳,等.基于时空-动作自适应融合网络的油田作业行为识别[J].智能系统学报,2024,19(6):1407-1418.[doi:10.11992/tis.202309021]
 TIAN Feng,WEI Ningbin,LIU Fang,et al.Oilfield operation behavior recognition based on spatio-temporal and action adaptive fusion network[J].CAAI Transactions on Intelligent Systems,2024,19(6):1407-1418.[doi:10.11992/tis.202309021]

基于时空-动作自适应融合网络的油田作业行为识别 / Oilfield operation behavior recognition based on spatio-temporal and action adaptive fusion network

参考文献/References:
[1] 富倩. 人体行为识别研究[J]. 信息与电脑(理论版), 2017(24): 146-147.
FU Qian. Analysis of human behavior recognition[J]. China computer & communication (theoretical edition), 2017(24): 146-147.
[2] 梁绪, 李文新, 张航宁. 人体行为识别方法研究综述[J]. 计算机应用研究, 2022, 39(3): 651-660.
LIANG Xu, LI Wenxin, ZHANG Hangning. Review of research on human action recognition methods[J]. Application research of computers, 2022, 39(3): 651-660.
[3] TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]//2015 IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 4489-4497.
[4] CARREIRA J, ZISSERMAN A. Quo vadis, action recognition? A new model and the kinetics dataset[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 4724-4733.
[5] TRAN D, WANG Heng, TORRESANI L, et al. A closer look at spatiotemporal convolutions for action recognition[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 6450-6459.
[6] SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal: MIT Press, 2014: 568-576.
[7] WANG Limin, XIONG Yuanjun, WANG Zhe, et al. Temporal segment networks: towards good practices for deep action recognition[M]//Lecture Notes in Computer Science. Cham: Springer International Publishing, 2016: 20-36.
[8] LIN Ji, GAN Chuang, HAN Song. TSM: temporal shift module for efficient video understanding[C]//2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 7082-7092.
[9] FEICHTENHOFER C, FAN Haoqi, MALIK J, et al. SlowFast networks for video recognition[C]//2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 6201-6210.
[10] YAN Sijie, XIONG Yuanjun, LIN Dahua. Spatial temporal graph convolutional networks for skeleton-based action recognition[C]//Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. New Orleans: AAAI, 2018: 7444-7452.
[11] DUAN Haodong, ZHAO Yue, CHEN Kai, et al. Revisiting skeleton-based action recognition[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 2959-2968.
[12] 田枫, 孙晓悦, 刘芳, 等. 基于图卷积的作业行为实时检测方法[J]. 计算机工程与设计, 2022, 43(10): 2944-2952.
TIAN Feng, SUN Xiaoyue, LIU Fang, et al. Real time detection method of work behavior based on graph convolution[J]. Computer engineering and design, 2022, 43(10): 2944-2952.
[13] 陆昱翔, 徐冠华, 唐波. 基于视觉Transformer时空自注意力的工人行为识别[J]. 浙江大学学报(工学版), 2023, 57(3): 446-454.
LU Yuxiang, XU Guanhua, TANG Bo. Worker behavior recognition based on temporal and spatial self-attention of vision Transformer[J]. Journal of Zhejiang university (engineering science edition), 2023, 57(3): 446-454.
[14] 饶天荣, 潘涛, 徐会军. 基于交叉注意力机制的煤矿井下不安全行为识别[J]. 工矿自动化, 2022, 48(10): 48-54.
RAO Tianrong, PAN Tao, XU Huijun. Unsafe action recognition in underground coal mine based on cross-attention mechanism[J]. Industry and mine automation, 2022, 48(10): 48-54.
[15] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. (2020-10-22)[2022-03-24]. http://arxiv.org/abs/2010.11929.
[16] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
[17] HU Jie, SHEN Li, SUN Gang. Squeeze-and-excitation networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132-7141.
[18] KUEHNE H, JHUANG H, GARROTE E, et al. HMDB: a large video database for human motion recognition[C]//2011 International Conference on Computer Vision. Barcelona: IEEE, 2011: 2556-2563.
[19] SOOMRO K, ZAMIR A R, SHAH M. UCF101: a dataset of 101 human actions classes from videos in the wild[EB/OL]. (2012-12-03)[2021-11-05]. http://arxiv.org/abs/1212.0402.
[20] KAY W, CARREIRA J, SIMONYAN K, et al. The kinetics human action video dataset[EB/OL]. (2017-05-19)[2021-11-05]. http://arxiv.org/abs/1705.06950.
[21] FISHER R A. The use of multiple measurements in taxonomic problems[J]. Annals of eugenics, 1936, 7(2): 179-188.
[22] GAMMULLE H, DENMAN S, SRIDHARAN S, et al. Two stream LSTM: a deep fusion framework for human action recognition[C]//2017 IEEE Winter Conference on Applications of Computer Vision. Santa Rosa: IEEE, 2017: 177-186.
[23] JI Shuiwang, XU Wei, YANG Ming, et al. 3D convolutional neural networks for human action recognition[J]. IEEE transactions on pattern analysis and machine intelligence, 2013, 35(1): 221-231.
[24] XIE Saining, GIRSHICK R, DOLLÁR P, et al. Aggregated residual transformations for deep neural networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 5987-5995.
[25] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[J]. International journal of computer vision, 2020, 128(2): 336-359.
相似文献/Similar References:
[1]梅雪,胡石,许松松,等.基于多尺度特征的双层隐马尔可夫模型及其在行为识别中的应用[J].智能系统学报,2012,7(6):512.
 MEI Xue,HU Shi,XU Songsong,et al.Multi scale feature based double layer HMM and its application in behavior recognition[J].CAAI Transactions on Intelligent Systems,2012,7(6):512.
[2]姬晓飞,谢旋,任艳.深度学习的双人交互行为识别与预测算法研究[J].智能系统学报,2020,15(3):484.[doi:10.11992/tis.201812029]
 JI Xiaofei,XIE Xuan,REN Yan.Human interaction recognition and prediction algorithm based on deep learning[J].CAAI Transactions on Intelligent Systems,2020,15(3):484.[doi:10.11992/tis.201812029]
[3]刘董经典,孟雪纯,张紫欣,等.一种基于2D时空信息提取的行为识别算法[J].智能系统学报,2020,15(5):900.[doi:10.11992/tis.201906054]
 LIU Dongjingdian,MENG Xuechun,ZHANG Zixin,et al.A behavioral recognition algorithm based on 2D spatiotemporal information extraction[J].CAAI Transactions on Intelligent Systems,2020,15(5):900.[doi:10.11992/tis.201906054]
[4]申天啸,韩怡园,韩冰,等.基于人类视觉皮层双通道模型的驾驶员眼动行为识别[J].智能系统学报,2022,17(1):41.[doi:10.11992/tis.202106051]
 SHEN Tianxiao,HAN Yiyuan,HAN Bing,et al.Recognition of driver’s eye movement based on the human visual cortex two-stream model[J].CAAI Transactions on Intelligent Systems,2022,17(1):41.[doi:10.11992/tis.202106051]
[5]闫河,李梦雪,张宇宁,等.面向表情识别的重影非对称残差注意力网络模型[J].智能系统学报,2023,18(2):333.[doi:10.11992/tis.202201003]
 YAN He,LI Mengxue,ZHANG Yuning,et al.A ghost asymmetric residual attention network model for facial expression recognition[J].CAAI Transactions on Intelligent Systems,2023,18(2):333.[doi:10.11992/tis.202201003]
[6]代金利,曹江涛,姬晓飞.交互关系超图卷积模型的双人交互行为识别[J].智能系统学报,2024,19(2):316.[doi:10.11992/tis.202208001]
 DAI Jinli,CAO Jiangtao,JI Xiaofei.Two-person interaction recognition based on the interactive relationship hypergraph convolution network model[J].CAAI Transactions on Intelligent Systems,2024,19(2):316.[doi:10.11992/tis.202208001]

备注/Memo

Received: 2023-09-11.
Foundation item: Natural Science Foundation of Heilongjiang Province (LH2021F004).
About the authors: TIAN Feng, professor, doctoral supervisor, Ph.D., and dean of the School of Computer and Information Technology. His research interests include intelligent petroleum geology, computer vision, and intelligent data analysis and processing. He has led or participated in 8 projects funded by the National Natural Science Foundation of China and national science and technology major programs, holds 16 authorized patents, and has published more than 30 academic papers. E-mail: tianfeng1980@163.com. WEI Ningbin, master's student. His research interests include computer vision and intelligent data analysis and processing. E-mail: 1205542631@qq.com. LIU Fang, associate professor, Ph.D. Her research interests include intelligent petroleum geology, smart education, multimedia and modern educational technology, and computer vision. She has won a second prize of the Heilongjiang Province Science and Technology Progress Award and a second prize of the Daqing Science and Technology Progress Award, has led or participated in 6 projects funded by the National Natural Science Foundation of China and the Natural Science Foundation of Heilongjiang Province, and has published more than 20 academic papers. E-mail: lfliufang1983@126.com.
Corresponding author: LIU Fang. E-mail: lfliufang1983@126.com

更新日期/Last Update: 2024-11-05
Copyright © Editorial Office of CAAI Transactions on Intelligent Systems
Address: Building 145-1, Nantong Street, Nangang District, Harbin 150001, Heilongjiang Province, China. Tel: 0451-82534001, 82518134. E-mail: tis@vip.sina.com