ZHU Chaojie, YAN Yuming, CHU Baochang, et al. Aspect-level multimodal sentiment analysis via object-attention[J]. CAAI Transactions on Intelligent Systems, 2024, 19(6): 1562-1572. [doi: 10.11992/tis.202404009]
CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 19
Issue: 2024, No. 6
Pages: 1562-1572
Column: Artificial Intelligence Deans Forum
Publication date: 2024-12-05
Title: Aspect-level multimodal sentiment analysis via object-attention
Authors: ZHU Chaojie1, YAN Yuming2, CHU Baochang2, LI Gang2, HUANG Heyan1, GAO Xiaoyan3
1. School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China;
2. Beijing Huadian E-Commerce Technology Co., Ltd., Beijing 100073, China;
3. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Keywords: aspect-level sentiment analysis; multimodal; sentiment analysis; object detection; self-attention; natural language processing systems; deep learning; feature extraction
CLC number: TP391
DOI: 10.11992/tis.202404009
Abstract: Aspect-level multimodal sentiment analysis (ALMSA) aims to identify the sentiment polarity expressed toward a specific aspect from both sentence and image information. Existing models rely on global image features and overlook the fine-grained details in the original image. To address this issue, we propose an object-attention-based aspect-level multimodal sentiment analysis model (OAB-ALMSA). The model first employs an object detection algorithm to capture detailed information about the objects in the original image. It then applies an object-attention mechanism and builds an iterative fusion layer to fully fuse the multimodal information. Finally, a curriculum learning strategy is devised to tackle the training difficulty caused by highly complex samples. Experiments on the TWITTER-2015 dataset demonstrate that OAB-ALMSA, when trained with curriculum learning, achieves the highest F1 score. These results highlight that exploiting detailed image information deepens the model's overall understanding of the data and improves prediction accuracy.
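To make the model's namesake mechanism concrete, below is a minimal sketch of an object-attention fusion step in PyTorch. This is an illustration, not the authors' implementation: the class name ObjectAttentionFusion, the embedding size, the use of multi-head attention, and the residual connection are all assumptions. The idea, as described in the abstract, is that the aspect-aware text representation queries the features of detected objects, so fine-grained image regions rather than one global image vector inform the sentiment prediction.

```python
import torch
import torch.nn as nn

class ObjectAttentionFusion(nn.Module):
    """Hypothetical object-attention layer: the aspect-aware text encoding
    attends over detected-object features instead of a global image vector."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_repr: torch.Tensor, object_feats: torch.Tensor) -> torch.Tensor:
        # text_repr:    (batch, seq_len, dim), e.g., an encoding of sentence + aspect
        # object_feats: (batch, n_objects, dim), projected object-detector outputs
        fused, _ = self.attn(query=text_repr, key=object_feats, value=object_feats)
        return self.norm(text_repr + fused)  # residual keeps the text signal

# Stacking such layers would give the iterative fusion described in the abstract.
layer = ObjectAttentionFusion()
text = torch.randn(2, 32, 768)     # dummy text representations
objects = torch.randn(2, 5, 768)   # dummy features for 5 detected objects
out = layer(text, objects)         # -> (2, 32, 768)
```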
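The curriculum learning strategy can be sketched similarly. The following is a minimal, hypothetical easy-to-hard schedule; the paper's actual difficulty measure and pacing function are not specified here, so both are stand-ins:

```python
import math

def curriculum_subset(samples, difficulty, epoch, total_epochs, start_frac=0.3):
    """Hypothetical pacing function: rank samples easiest-first by a
    user-supplied difficulty score, then linearly grow the training
    subset from `start_frac` of the data to the full set."""
    ranked = sorted(samples, key=difficulty)  # easiest samples first
    frac = min(1.0, start_frac + (1.0 - start_frac) * epoch / max(1, total_epochs - 1))
    return ranked[: math.ceil(frac * len(ranked))]
```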
Last update: 2024-11-05