[1] ZHONG Zhaoman, FAN Jidong, ZHANG Yu, et al. Multimodal sentiment analysis model with convolutional cross-attention and cross-modal dynamic gating[J]. CAAI Transactions on Intelligent Systems, 2025, 20(4): 999-1009. [doi:10.11992/tis.202409012]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 20
Issue: 2025, No. 4
Pages: 999-1009
Column: Academic Papers - Machine Learning
Publication date: 2025-08-05
- Title: Multimodal sentiment analysis model with convolutional cross-attention and cross-modal dynamic gating
- Author(s): ZHONG Zhaoman1,2; FAN Jidong1; ZHANG Yu1; WANG Chen1; LYU Huihui1; ZHANG Liling1
  1. School of Computer Engineering, Jiangsu Ocean University, Lianyungang 222005, China
  2. Jiangsu Institute of Marine Resources Development, Lianyungang 222005, China
- Keywords: multimodal fusion; sentiment analysis; emotional relevance; attention mechanism; convolutional cross-attention; cross-modal dynamic gating; global feature association; weight fusion
- CLC: TP391
- DOI: 10.11992/tis.202409012
- Abstract:
In multimodal sentiment analysis tasks, ignoring the emotional correlation between images and text leads to a large number of redundant features in the fused representation. To mitigate this problem, this paper introduces a multimodal sentiment analysis model based on convolutional cross-attention and cross-modal dynamic gating (CCA-CDG). The CCA-CDG model incorporates a convolutional cross-attention module to effectively capture consistent expressions between images and text, thereby obtaining aligned features. Furthermore, the model employs a cross-modal dynamic gating module to dynamically modulate the fusion of emotional features according to their interrelations across modalities. Additionally, recognizing that contextual information from both images and text is essential for accurate sentiment interpretation, this paper devises a global feature fusion module that integrates interaction features with global feature weights, yielding more reliable sentiment predictions. Experiments on the MVSA-Single and MVSA-Multi datasets confirm that the proposed CCA-CDG model substantially improves multimodal sentiment analysis performance.
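To illustrate the cross-modal dynamic gating idea described in the abstract, the sketch below shows one common formulation of a learned fusion gate. This is not the authors' released implementation: the class name, feature dimensions, and the sigmoid-gate form are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class CrossModalDynamicGate(nn.Module):
    """Hypothetical sketch of cross-modal dynamic gating.

    A gate g = sigmoid(W [t; v] + b) is computed from the concatenated
    text and image features, then used to weight how much each modality
    contributes to the fused representation. This approximates the idea
    of modulating fusion by cross-modal emotional relevance; the exact
    design in the paper may differ.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, text_feat: torch.Tensor, img_feat: torch.Tensor) -> torch.Tensor:
        # g in (0, 1): per-dimension contribution of the text modality
        g = self.gate(torch.cat([text_feat, img_feat], dim=-1))
        # Convex combination of the two modalities, weighted by the gate
        return g * text_feat + (1 - g) * img_feat


# Usage: a batch of 4 samples with 128-dimensional features per modality
fuse = CrossModalDynamicGate(128)
fused = fuse(torch.randn(4, 128), torch.randn(4, 128))
print(fused.shape)  # torch.Size([4, 128])
```

Because the gate output lies in (0, 1) per dimension, the fused feature always stays within the span of the two modality features, which keeps the fusion stable even when one modality carries little emotional signal.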