[1] ZHONG Zhaoman, FAN Jidong, ZHANG Yu, et al. Multimodal sentiment analysis model with convolutional cross-attention and cross-modal dynamic gating[J]. CAAI Transactions on Intelligent Systems, 2025, 20(4): 999-1009. [doi:10.11992/tis.202409012]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 20
Issue: 2025, No. 4
Pages: 999-1009
Column: Academic Papers - Machine Learning
Publication date: 2025-08-05
- Title: Multimodal sentiment analysis model with convolutional cross-attention and cross-modal dynamic gating
- Author(s): ZHONG Zhaoman1,2; FAN Jidong1; ZHANG Yu1; WANG Chen1; LYU Huihui1; ZHANG Liling1
  1. School of Computer Engineering, Jiangsu Ocean University, Lianyungang 222005, China
  2. Jiangsu Institute of Marine Resources Development, Lianyungang 222005, China
- Keywords: multimodal fusion; sentiment analysis; emotional relevance; attention mechanism; convolutional cross-attention; cross-modal dynamic gating; global feature association; weight fusion
- CLC: TP391
- DOI: 10.11992/tis.202409012
- Abstract:
In multimodal sentiment analysis tasks, ignoring the emotional correlation between images and text leads to a large number of redundant features in the fused representation. To mitigate this problem, this paper introduces a multimodal sentiment analysis model based on convolutional cross-attention and cross-modal dynamic gating (CCA-CDG). The CCA-CDG model incorporates a convolutional cross-attention module to effectively capture consistent expressions between images and text, thereby obtaining aligned features. Furthermore, the model employs a cross-modal dynamic gating module to dynamically modulate the fusion of emotional features according to their interrelations across modalities. Additionally, recognizing that contextual information from both images and text is essential for accurate sentiment interpretation, this paper devises a global feature fusion module that integrates interaction features with global feature weights, yielding more reliable sentiment predictions. Experiments on the MVSA-Single and MVSA-Multi datasets confirm that the proposed CCA-CDG model substantially improves multimodal sentiment analysis performance.
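To illustrate the cross-modal dynamic gating idea described in the abstract, the sketch below shows one common formulation of a learned fusion gate. This is not the authors' released implementation: the class name, feature dimensions, and the sigmoid-gate form are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class CrossModalDynamicGate(nn.Module):
    """Hypothetical sketch of cross-modal dynamic gating.

    A gate g = sigmoid(W [t; v] + b) is computed from the concatenated
    text and image features, then used to weight how much each modality
    contributes to the fused representation. This approximates the idea
    of modulating fusion by cross-modal emotional relevance; the exact
    design in the paper may differ.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, text_feat: torch.Tensor, img_feat: torch.Tensor) -> torch.Tensor:
        # g in (0, 1): per-dimension contribution of the text modality
        g = self.gate(torch.cat([text_feat, img_feat], dim=-1))
        # Convex combination of the two modalities, weighted by the gate
        return g * text_feat + (1 - g) * img_feat


# Usage: a batch of 4 samples with 128-dimensional features per modality
fuse = CrossModalDynamicGate(128)
fused = fuse(torch.randn(4, 128), torch.randn(4, 128))
print(fused.shape)  # torch.Size([4, 128])
```

Because the gate output lies in (0, 1) per dimension, the fused feature always stays within the span of the two modality features, which keeps the fusion stable even when one modality carries little emotional signal.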