ZHONG Zhaoman, FAN Jidong, ZHANG Yu, et al. Multimodal sentiment analysis model with convolutional cross-attention and cross-modal dynamic gating[J]. CAAI Transactions on Intelligent Systems, 2025, 20(4): 999-1009. [doi:10.11992/tis.202409012]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 20
Issue: No. 4, 2025
Pages: 999-1009
Section: Academic Papers - Machine Learning
Publication date: 2025-08-05
Title: Multimodal sentiment analysis model with convolutional cross-attention and cross-modal dynamic gating
Authors: ZHONG Zhaoman1,2, FAN Jidong1, ZHANG Yu1, WANG Chen1, LYU Huihui1, ZHANG Liling1
1. School of Computer Engineering, Jiangsu Ocean University, Lianyungang 222005, China;
2. Jiangsu Institute of Marine Resources Development, Lianyungang 222005, China
Keywords: multimodal fusion; sentiment analysis; emotional relevance; attention mechanism; convolutional cross-attention; cross-modal dynamic gating; global feature association; weight fusion
CLC number: TP391
DOI: 10.11992/tis.202409012
Online publication date: 2025-02-21
Abstract:
In multimodal sentiment analysis, existing methods tend to ignore the emotional correlation between images and text, which leaves a large amount of redundancy in the fused features. To address this, a multimodal sentiment analysis model based on convolutional cross-attention and cross-modal dynamic gating (CCA-CDG) is proposed. CCA-CDG introduces a convolutional cross-attention module (CCAM) to capture consistent expressions between image and text and obtain aligned image-text features, and employs a cross-modal dynamic gating module (CDGM) to dynamically modulate the fusion of emotional features according to the emotional relevance between the two modalities. In addition, considering the importance of image and text context for interpreting sentiment, a global feature fusion module is designed that fuses the image-text interaction features with the global features through weighting, yielding more reliable sentiment predictions. Experiments on the MVSA-Single and MVSA-Multi datasets show that the proposed CCA-CDG effectively improves multimodal sentiment analysis performance.
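The abstract describes the three modules only at a high level. As an illustration of the two fusion ideas it names, the PyTorch sketch below wires a cross-attention step with a local 1-D convolution to a scalar cross-modal gate that decides how much of the image-text interaction to keep. It is a minimal sketch under stated assumptions, not the authors' implementation: the class names (ConvCrossAttention, CrossModalDynamicGate), the dimensions, the choice of encoders, and the averaged "global" representation are all hypothetical.

```python
# Illustrative sketch only (not the paper's code): convolutional cross-attention for
# image-text alignment plus a cross-modal dynamic gate that weights the fused features.
import torch
import torch.nn as nn


class ConvCrossAttention(nn.Module):
    """Text-to-image cross-attention followed by a 1-D convolution over the attended
    sequence: one plausible reading of 'convolutional cross-attention'."""

    def __init__(self, dim: int = 256, kernel_size: int = 3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # text: (B, Lt, dim), image: (B, Li, dim)
        aligned, _ = self.attn(query=text, key=image, value=image)
        # Conv1d expects (B, dim, L); the convolution smooths the aligned sequence locally.
        return self.conv(aligned.transpose(1, 2)).transpose(1, 2)  # (B, Lt, dim)


class CrossModalDynamicGate(nn.Module):
    """Scalar gate estimated from pooled text/image features; it scales how much of the
    cross-modal interaction is kept, mimicking gating by image-text emotional relevance."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1), nn.Sigmoid()
        )

    def forward(self, text_vec, image_vec, interaction_vec):
        g = self.gate(torch.cat([text_vec, image_vec], dim=-1))  # (B, 1), in [0, 1]
        # Assumed stand-in for the global feature fusion: a weighted mix of the gated
        # interaction and an averaged global representation of the two modalities.
        global_vec = 0.5 * (text_vec + image_vec)
        return g * interaction_vec + (1.0 - g) * global_vec


if __name__ == "__main__":
    B, Lt, Li, dim = 2, 20, 49, 256
    text = torch.randn(B, Lt, dim)   # e.g., token features from an assumed text encoder
    image = torch.randn(B, Li, dim)  # e.g., patch features from an assumed image encoder

    aligned = ConvCrossAttention(dim)(text, image)              # (B, Lt, dim)
    fused = CrossModalDynamicGate(dim)(
        text.mean(dim=1), image.mean(dim=1), aligned.mean(dim=1)
    )                                                            # (B, dim)
    logits = nn.Linear(dim, 3)(fused)                            # 3 sentiment classes on MVSA
    print(logits.shape)                                          # torch.Size([2, 3])
```

The gate collapses to the averaged global term when the two modalities carry little shared emotional signal, and passes the interaction through when they agree; the actual CCA-CDG weighting scheme may differ from this sketch.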
Notes/Memo
Received: 2024-09-06.
Funding: National Natural Science Foundation of China (72174079); Jiangsu Province "Qinglan Project" Excellent Big Data Teaching Team (2022-29).
About the authors: ZHONG Zhaoman, professor and dean of the School of Computer Engineering, Jiangsu Ocean University, and adjunct doctoral supervisor at China University of Mining and Technology. His main research interests are big data analysis and management of Internet public opinion. He has led one general program of the National Natural Science Foundation of China, received a second prize of the Science and Technology Progress Award of the Chinese Association of Automation, published more than 50 academic papers, and authored one monograph. E-mail: zhongzhaoman@163.com. FAN Jidong, master's student, whose main research interests are multimodal sentiment analysis and big data collection and analysis. E-mail: ffanjdong@163.com. ZHANG Yu, master's student, whose main research interests are online public opinion analysis and aspect-level sentiment analysis. E-mail: zhou90616@gmail.com.
Corresponding author: ZHONG Zhaoman. E-mail: zhongzhaoman@163.com