KONG Yinghui,CUI Wenting,ZHANG Ke,et al.Two-stream network video expression recognition by fusing key region information[J].CAAI Transactions on Intelligent Systems,2025,20(3):658-669.[doi:10.11992/tis.202401031]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
20
Issue:
2025, No. 3
Pages:
658-669
Section:
Academic Articles: Machine Perception and Pattern Recognition
Publication date:
2025-05-05
- Title:
-
Two-stream network video expression recognition by fusing key region information
- Author(s):
-
KONG Yinghui1,2, CUI Wenting1, ZHANG Ke1,2, CHE Linlin1,2
-
1. Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071003, China;
2. Hebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding 071003, China
-
- Keywords:
-
video expression recognition; two-stream network; attention mechanism; optical flow; convolutional neural networks; mask; feature fusion; facial expression recognition
- CLC number:
-
TP39
- DOI:
-
10.11992/tis.202401031
- Abstract:
-
Facial expression recognition is an important research topic in computer vision, and expression recognition in video has practical value in many scenarios. Video sequences contain rich intra-frame spatial information and inter-frame temporal information, and the extraction of key facial regions also has a significant impact on recognition results. This paper proposes a two-stream network expression recognition method that fuses key region information. First, a spatial-temporal two-stream network is constructed. The spatial branch combines facial action units with a channel-spatial frame attention (CSFA) mechanism to focus on the key facial regions that influence recognition, enabling effective extraction of spatial features. The temporal branch extracts optical flow with the Farnebäck algorithm to capture inter-frame expression motion and uses the key-region mask from the spatial branch to reduce the computational complexity of the optical flow. Finally, the predictions of the two streams are combined by decision fusion to obtain the video-level expression recognition result. The method is evaluated on the eNTERFACE'05 and CK+ datasets, and the results show that it effectively improves both recognition accuracy and runtime efficiency.
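The decision-fusion step described in the abstract can be sketched as a weighted average of the two streams' class probabilities followed by an argmax. This is a minimal illustrative sketch: the fusion weight `w_spatial`, the softmax averaging, and the example logits are assumptions, not details fixed by the paper.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def decision_fusion(spatial_logits, temporal_logits, w_spatial=0.5):
    """Decision-level fusion of two-stream predictions.

    w_spatial is a hypothetical hyperparameter weighting the spatial
    stream against the temporal (optical-flow) stream; the abstract
    does not specify the fusion weights.
    """
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    fused = w_spatial * p_spatial + (1.0 - w_spatial) * p_temporal
    return int(np.argmax(fused))

# Example with 6 expression classes (e.g., the six basic emotions
# in eNTERFACE'05): the spatial stream favors class 0, the temporal
# stream favors class 2.
spatial = np.array([2.0, 0.5, 0.1, 0.0, 0.3, 0.2])
temporal = np.array([0.1, 0.2, 2.5, 0.0, 0.1, 0.3])
print(decision_fusion(spatial, temporal, w_spatial=0.7))  # prints 0
```

Lowering `w_spatial` shifts the decision toward the temporal stream; with `w_spatial=0.3` the same example selects class 2 instead.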