KONG Yinghui,CUI Wenting,ZHANG Ke,et al.Two-stream network video expression recognition by fusing key region information[J].CAAI Transactions on Intelligent Systems,2025,20(3):658-669.[doi:10.11992/tis.202401031]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
20
Issue:
2025, No. 3
Pages:
658-669
Section:
Academic Articles: Machine Perception and Pattern Recognition
Publication date:
2025-05-05
- Title:
-
Two-stream network video expression recognition by fusing key region information
- Author(s):
-
KONG Yinghui1,2, CUI Wenting1, ZHANG Ke1,2, CHE Linlin1,2
-
1. Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071003, China;
2. Hebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding 071003, China
-
- Keywords:
-
video expression recognition; two-stream network; attention mechanism; optical flow; convolutional neural networks; mask; feature fusion; facial expression recognition
- CLC number:
-
TP39
- DOI:
-
10.11992/tis.202401031
- Abstract:
-
Facial expression recognition is an important research topic in computer vision, and expression recognition in video has practical value in many scenarios. Video sequences contain rich intra-frame spatial information and inter-frame temporal information, and the extraction of key facial regions also has a significant impact on recognition results. This paper proposes a two-stream network expression recognition method that fuses key region information. First, a spatial-temporal two-stream network is constructed. The spatial branch combines facial action units with a channel-spatial frame attention (CSFA) mechanism to focus on the key facial regions that influence recognition, enabling effective extraction of spatial features. The temporal branch extracts optical flow with the Farnebäck algorithm to capture inter-frame expression motion and uses the key-region mask from the spatial branch to reduce the computational complexity of the optical flow. Finally, the predictions of the two streams are combined by decision fusion to obtain the video-level expression recognition result. The method is evaluated on the eNTERFACE'05 and CK+ datasets, and the results show that it effectively improves both recognition accuracy and runtime efficiency.
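The decision-fusion step described in the abstract can be sketched as a weighted average of the two streams' class probabilities followed by an argmax. This is a minimal illustrative sketch: the fusion weight `w_spatial`, the softmax averaging, and the example logits are assumptions, not details fixed by the paper.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def decision_fusion(spatial_logits, temporal_logits, w_spatial=0.5):
    """Decision-level fusion of two-stream predictions.

    w_spatial is a hypothetical hyperparameter weighting the spatial
    stream against the temporal (optical-flow) stream; the abstract
    does not specify the fusion weights.
    """
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    fused = w_spatial * p_spatial + (1.0 - w_spatial) * p_temporal
    return int(np.argmax(fused))

# Example with 6 expression classes (e.g., the six basic emotions
# in eNTERFACE'05): the spatial stream favors class 0, the temporal
# stream favors class 2.
spatial = np.array([2.0, 0.5, 0.1, 0.0, 0.3, 0.2])
temporal = np.array([0.1, 0.2, 2.5, 0.0, 0.1, 0.3])
print(decision_fusion(spatial, temporal, w_spatial=0.7))  # prints 0
```

Lowering `w_spatial` shifts the decision toward the temporal stream; with `w_spatial=0.3` the same example selects class 2 instead.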