ZHAO Xuefeng, DI Hengxi, BAI Changze, et al. Multimodal aspect-based sentiment analysis combining multifaceted image feature extraction and gated fusion mechanism[J]. CAAI Transactions on Intelligent Systems, 2025, 20(6): 1461-1473. [doi: 10.11992/tis.202503032]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 20
Issue: No. 6, 2025
Pages: 1461-1473
Section: Academic Papers: Machine Perception and Pattern Recognition
Publication date: 2025-11-05
Title: Multimodal aspect-based sentiment analysis combining multifaceted image feature extraction and gated fusion mechanism
Author(s): ZHAO Xuefeng, DI Hengxi, BAI Changze, ZHONG Zhaoman, ZHONG Xiaomin
Affiliation: College of Computer Engineering, Jiangsu Ocean University, Lianyungang 222005, China
Keywords: global feature; multimodal; aspect-based sentiment analysis; text description; gating mechanism; cross attention; image prompt; pre-trained language model
CLC number: TP391
DOI: 10.11992/tis.202503032
Abstract: Existing multimodal aspect-based sentiment analysis (MABSA) models extract only a single global image feature and therefore overlook key detailed information. To address this problem, this study proposes a network model that combines multifaceted image feature extraction with a gated fusion mechanism. A multifaceted image feature extraction module uses cross-modal translation to generate textual descriptions of scenes, human faces, objects, and colors, covering multiple sentiment-related dimensions of the image, thereby achieving detailed information extraction and cross-modal alignment. A gated fusion interaction module combines a gating mechanism with interactive attention to enable efficient fusion and interaction between features. To bridge the representation gap across modalities, the model constructs an input sequence fused with image prompts, converting image features into the input space of a pre-trained language model (PLM) for more accurate sentiment classification. Experiments on the Twitter-2015 and Twitter-2017 datasets show that, compared with existing models, the proposed model improves accuracy and F1-score by an average of 0.93% and 0.52%, respectively, effectively enhancing sentiment classification performance.
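The abstract describes the gated fusion interaction module only at a high level. As a minimal sketch, assuming 768-dimensional encoder outputs and a standard multi-head cross-attention layer, combining a gating mechanism with cross attention might look like the following PyTorch module; the class name, dimensions, and residual layout are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class GatedFusionBlock(nn.Module):
    """Hypothetical sketch: text tokens attend to image-description tokens,
    and a learned sigmoid gate controls how much attended signal is merged."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # Cross attention: text features as queries, description features as keys/values
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Gate computed from the concatenated text and attended features
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats: torch.Tensor, desc_feats: torch.Tensor) -> torch.Tensor:
        # text_feats: (batch, seq_text, dim); desc_feats: (batch, seq_desc, dim)
        attended, _ = self.cross_attn(text_feats, desc_feats, desc_feats)
        g = self.gate(torch.cat([text_feats, attended], dim=-1))
        # Gated residual fusion: inject image-derived information where the gate opens
        return self.norm(text_feats + g * attended)

# Toy usage: 32 tweet tokens fused with 48 tokens of scene/face/object/color descriptions
block = GatedFusionBlock()
fused = block(torch.randn(2, 32, 768), torch.randn(2, 48, 768))
print(fused.shape)  # torch.Size([2, 32, 768])
```

The sigmoid gate lets the model fall back to the original text representation when the image-derived descriptions are uninformative, which is one common rationale for gated rather than plain additive fusion.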
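The image-prompt step can be sketched similarly: a learned projection maps a pooled image feature to a few pseudo-token embeddings prepended to the PLM's text embeddings, so both modalities share one input space. Here img_dim, n_prompt, and the class name ImagePrompt are hypothetical, assuming a CLIP-style pooled visual feature rather than the paper's exact pipeline.

```python
import torch
import torch.nn as nn

class ImagePrompt(nn.Module):
    """Hypothetical sketch: turn one pooled image feature into n_prompt
    pseudo-token embeddings in the PLM's input embedding space."""

    def __init__(self, img_dim: int = 512, plm_dim: int = 768, n_prompt: int = 4):
        super().__init__()
        self.n_prompt, self.plm_dim = n_prompt, plm_dim
        # Linear map from the image feature space to n_prompt PLM token embeddings
        self.proj = nn.Linear(img_dim, n_prompt * plm_dim)

    def forward(self, img_feat: torch.Tensor, token_embeds: torch.Tensor) -> torch.Tensor:
        # img_feat: (batch, img_dim); token_embeds: (batch, seq, plm_dim)
        prompts = self.proj(img_feat).view(-1, self.n_prompt, self.plm_dim)
        # Prepend visual prompt tokens so the PLM reads image context before the text
        return torch.cat([prompts, token_embeds], dim=1)

# Toy usage: a 512-d pooled image feature prepended to 32 text token embeddings
prompt = ImagePrompt()
inputs = prompt(torch.randn(2, 512), torch.randn(2, 32, 768))
print(inputs.shape)  # torch.Size([2, 36, 768])
```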
Memo
Received: 2025-03-24.
Foundation items: National Natural Science Foundation of China (72174079); Excellent Teaching Team Program of the Jiangsu Province "Qing Lan Project" (2022-29).
Biographies: ZHAO Xuefeng, associate professor, Ph.D., and a Jiangsu Province science and technology vice president. His main research interests are multimodal sentiment analysis, digital image processing, and nondestructive testing. As a key member, he has led or participated in 4 completed provincial- and municipal-level projects and has published more than 20 academic papers. E-mail: zhaoxf@jou.edu.cn. DI Hengxi, master's student. His main research interests are multimodal sentiment analysis and natural language processing. E-mail: dihx@jou.edu.cn. BAI Changze, master's student. His main research interests are multimodal sentiment analysis and natural language processing. E-mail: 2023220901@jou.edu.cn.
Corresponding author: ZHAO Xuefeng. E-mail: zhaoxf@jou.edu.cn.