[1] QU Haicheng, XU Bo. Multimodal sentiment analysis based on adaptive graph learning weight[J]. CAAI Transactions on Intelligent Systems, 2025, 20(2): 516-528. [doi: 10.11992/tis.202401001]

Multimodal sentiment analysis based on adaptive graph learning weight

References:
[1] PEÑA D, AGUILERA A, DONGO I, et al. A framework to evaluate fusion methods for multimodal emotion recognition[J]. IEEE access, 2023, 11: 10218-10237.
[2] ZHANG Junling, WU Xuemei, HUANG Changqin. AdaMoW: multimodal sentiment analysis based on adaptive modality-specific weight fusion network[J]. IEEE access, 2023, 11: 48410-48420.
[3] 张亚洲, 戎璐, 宋大为, 等. 多模态情感分析研究综述[J]. 模式识别与人工智能, 2020, 33(5): 426-438.
ZHANG Yazhou, RONG Lu, SONG Dawei, et al. A survey on multimodal sentiment analysis[J]. Pattern recognition and artificial intelligence, 2020, 33(5): 426-438.
[4] GANDHI A, ADHVARYU K, PORIA S, et al. Multimodal sentiment analysis: a systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions[J]. Information fusion, 2023, 91: 424-444.
[5] MAI Sijie, HU Haifeng, XING Songlong. Modality to modality translation: an adversarial representation learning and graph fusion network for multimodal fusion[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New York: AAAI, 2020: 164-172.
[6] ZADEH A, LIANG P P, PORIA S, et al. Multi-attention recurrent network for human communication comprehension[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New Orleans: AAAI, 2018: 5642-5649.
[7] HAN Wei, CHEN Hui, GELBUKH A, et al. Bi-bimodal modality fusion for correlation-controlled multimodal sentiment analysis[C]//Proceedings of the 2021 International Conference on Multimodal Interaction. Montréal: ACM, 2021: 6-15.
[8] 刘颖, 王哲, 房杰, 等. 基于图文融合的多模态舆情分析[J]. 计算机科学与探索, 2022, 16(6): 1260-1278.
LIU Ying, WANG Zhe, FANG Jie, et al. Multi-modal public opinion analysis based on image and text fusion[J]. Journal of frontiers of computer science and technology, 2022, 16(6): 1260-1278.
[9] HUANG Changqin, ZHANG Junling, WU Xuemei, et al. TeFNA: text-centered fusion network with crossmodal attention for multimodal sentiment analysis[J]. Knowledge-based systems, 2023, 269: 110502.
[10] SUN Hao, LIU Jiaqing, CHEN Y W, et al. Modality-invariant temporal representation learning for multimodal sentiment classification[J]. Information fusion, 2023, 91: 504-514.
[11] MAI Sijie, ZENG Ying, HU Haifeng. Multimodal information bottleneck: learning minimal sufficient unimodal and multimodal representations[J]. IEEE transactions on multimedia, 2023, 25: 4121-4134.
[12] ZADEH A, CHEN Minghai, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis[EB/OL]. (2017-07-23) [2024-01-02]. https://arxiv.org/abs/1707.07250.
[13] TSAI Y H H, BAI Shaojie, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence: ACL, 2019: 6558-6569.
[14] SU Guixin, HE Junyi, LI Xia, et al. NFCMF: noise filtering and CrossModal fusion for multimodal sentiment analysis[C]//2021 International Conference on Asian Language Processing. Singapore: IEEE, 2021: 316-321.
[15] YANG Shuo, XU Zhaopan, WANG Kai, et al. BiCro: noisy correspondence rectification for multi-modality data via bi-directional cross-modal similarity consistency[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 19883-19892.
[16] 孙杰, 车文刚, 高盛祥. 面向多模态情感分析的多通道时序卷积融合[J]. 计算机科学与探索, 2024, 18(11): 3041-3050.
SUN Jie, CHE Wengang, GAO Shengxiang. Multi-channel temporal convolution fusion for multimodal sentiment analysis[J]. Journal of frontiers of computer science and technology, 2024, 18(11): 3041-3050.
[17] 鲍小异, 姜晓彤, 王中卿, 等. 基于跨语言图神经网络模型的属性级情感分类[J]. 软件学报, 2023, 34(2): 676-689.
BAO Xiaoyi, JIANG Xiaotong, WANG Zhongqing, et al. Cross-lingual aspect-level sentiment classification with graph neural network[J]. Journal of software, 2023, 34(2): 676-689.
[18] JANGRA A, MUKHERJEE S, JATOWT A, et al. A survey on multi-modal summarization[J]. ACM computing surveys, 2023, 55(13s): 1-36.
[19] MAJUMDER N, HAZARIKA D, GELBUKH A, et al. Multimodal sentiment analysis using hierarchical fusion with context modeling[J]. Knowledge-based systems, 2018, 161: 124-133.
[20] XU Nan, MAO Wenji. MultiSentiNet: a deep semantic network for multimodal sentiment analysis[C]//Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. Singapore: ACM, 2017: 2399-2402.
[21] AKHTAR M S, CHAUHAN D S, GHOSAL D, et al. Multi-task learning for multi-modal emotion recognition and sentiment analysis[EB/OL]. (2019-05-14) [2024-01-02]. https://arxiv.org/abs/1905.05812.
[22] HAZARIKA D, ZIMMERMANN R, PORIA S. MISA: modality-invariant and -specific representations for multimodal sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia. Seattle: ACM, 2020: 1122-1131.
[23] YU Wenmeng, XU Hua, YUAN Ziqi, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver: AAAI, 2021: 10790-10797.
[24] ZUO Haolin, LIU Rui, ZHAO Jinming, et al. Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities[C]//2023 IEEE International Conference on Acoustics, Speech and Signal Processing. Rhodes Island: IEEE, 2023: 1-5.
[25] LIU A H, JIN S, LAI C I J, et al. Cross-modal discrete representation learning[EB/OL]. (2021-06-10) [2024-01-02]. https://arxiv.org/abs/2106.05438.
[26] 胡文彬, 陈龙, 黄贤波, 等. 融合交叉注意力的突发事件多模态中文反讽识别模型[J]. 智能系统学报, 2024, 19(2): 392-400.
HU Wenbin, CHEN Long, HUANG Xianbo, et al. A multimodal Chinese sarcasm detection model for emergencies based on cross attention[J]. CAAI transactions on intelligent systems, 2024, 19(2): 392-400.
[27] 李梦云, 张景, 张换香, 等. 基于跨模态语义信息增强的多模态情感分析[J]. 计算机科学与探索, 2024, 18(9): 2476-2486.
LI Mengyun, ZHANG Jing, ZHANG Huanxiang, et al. Multimodal sentiment analysis based on cross-modal semantic information enhancement[J]. Journal of frontiers of computer science and technology, 2024, 18(9): 2476-2486.
[28] 包广斌, 李港乐, 王国雄. 面向多模态情感分析的双模态交互注意力[J]. 计算机科学与探索, 2022, 16(4): 909-916.
BAO Guangbin, LI Gangle, WANG Guoxiong. Bimodal interactive attention for multimodal sentiment analysis[J]. Journal of frontiers of computer science and technology, 2022, 16(4): 909-916.
[29] TANG Jiajia, LIU Dongjun, JIN Xuanyu, et al. BAFN: bi-direction attention based fusion network for multimodal sentiment analysis[J]. IEEE transactions on circuits and systems for video technology, 2023, 33(4): 1966-1978.
[30] LYU Fengmao, CHEN Xiang, HUANG Yanyong, et al. Progressive modality reinforcement for human multimodal emotion recognition from unaligned multimodal sequences[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 2554-2562.
[31] RAHMAN W, HASAN M K, LEE Sangwu, et al. Integrating multimodal information in large pretrained transformers[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: ACL, 2020: 2359-2369.
[32] SANGWAN S, CHAUHAN D S, AKHTAR M S, et al. Multi-task gated contextual cross-modal attention framework for sentiment and emotion analysis[C]//Neural Information Processing. Sydney: Springer, 2019: 662-669.
[33] HUANG Yanping, PENG Hong, LIU Qian, et al. Attention-enabled gated spiking neural P model for aspect-level sentiment classification[J]. Neural networks, 2023, 157: 437-443.
[34] TISHBY N, PEREIRA F C, BIALEK W. The information bottleneck method[EB/OL]. (2000-04-24) [2024-01-02]. https://arxiv.org/abs/physics/0004057.
[35] ALEMI A A, FISCHER I, DILLON J V, et al. Deep variational information bottleneck[EB/OL]. (2016-12-01) [2024-01-02]. https://arxiv.org/abs/1612.00410.
[36] ZADEH A, ZELLERS R, PINCUS E, et al. Multimodal sentiment intensity analysis in videos: facial gestures and verbal messages[J]. IEEE intelligent systems, 2016, 31(6): 82-88.
[37] BAGHER ZADEH A, LIANG P P, PORIA S, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Melbourne: ACL, 2018: 2236-2246.
[38] LIU Zhun, SHEN Ying, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[EB/OL]. (2018-05-31) [2024-01-02]. https://arxiv.org/abs/1806.00064.
[39] HAN Wei, CHEN Hui, PORIA S. Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis[EB/OL]. (2021-09-01) [2024-01-02]. https://arxiv.org/abs/2109.00412.
[40] WANG Di, GUO Xutong, TIAN Yumin, et al. TETFN: a text enhanced transformer fusion network for multimodal sentiment analysis[J]. Pattern recognition, 2023, 136: 109259.
[41] LI Yong, WANG Yuanzhi, CUI Zhen. Decoupled multimodal distilling for emotion recognition[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 6631-6640.