[1] HU Wenbin, CHEN Long, HUANG Xianbo, et al. A multimodal Chinese sarcasm detection model for emergencies based on cross attention[J]. CAAI Transactions on Intelligent Systems, 2024, 19(2): 392-400. [doi:10.11992/tis.202212011]
CAAI Transactions on Intelligent Systems[ISSN 1673-4785/CN 23-1538/TP] Volume:
19
Issue:
2024 2
Page number:
392-400
Column:
Academic Papers: Natural Language Processing and Understanding
Publication date:
2024-03-05
- Title:
-
A multimodal Chinese sarcasm detection model for emergencies based on cross attention
- Author(s):
-
HU Wenbin1,2; CHEN Long1; HUANG Xianbo1; CHEN Chen1; ZHONG Zhaoman1,2
-
1. School of Computer Engineering, Jiangsu Ocean University, Lianyungang 222005, China;
2. Jiangsu Institute of Marine Resources Development, Lianyungang 222005, China
-
- Keywords:
-
emergency; social media; multimodal comment; Chinese sarcasm detection; Chinese sarcasm dataset; cross-attention mechanism; attention mechanism; sentiment analysis
- CLC:
-
TP391
- DOI:
-
10.11992/tis.202212011
- Abstract:
-
Internet users often use sarcasm when discussing emergencies on social media, which complicates sentiment analysis. Research on sarcasm in multimodal comments, particularly Chinese ones, remains scarce, so sarcasm detection in multimodal Chinese content that combines images and text needs further study. To address this need, we propose a multimodal Chinese sarcasm detection model called the fuse cross-attention model (FCAM), which uses a cross-attention mechanism to identify inconsistencies between modalities. A text convolutional neural network (TextCNN) extracts basic features from the Chinese text, while a deep residual network (ResNet) extracts image features. The cross-attention mechanism then produces attention features for the text layer and the image layer. Residual connections link the basic text features to the text layer’s attention features, and the image features to the image layer’s attention features. The two resulting feature representations are fused by an attention mechanism and passed through a classification layer to produce the sarcasm classification result. We constructed a multimodal Chinese sarcasm dataset from Weibo comments related to the COVID-19 pandemic in a specific region. Experiments on this dataset show that FCAM holds certain advantages over the benchmark model.
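The data flow described in the abstract (cross-attention between modalities, residual connections, attention-based fusion, classification) can be illustrated with a small dependency-free sketch. This is not the authors' implementation. The toy feature vectors stand in for TextCNN and ResNet outputs, and the absence of learned projections, the mean pooling, the scoring vector `score_vec`, the classifier weights, and all function names are assumptions made only for illustration.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(queries, keys_values):
    """Each query vector attends over the other modality's vectors.
    Scaled dot-product attention without learned projections (assumption)."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        out.append([sum(w * kv[d] for w, kv in zip(weights, keys_values))
                    for d in range(len(q))])
    return out

def residual(base, attended):
    """Element-wise sum of basic features and their attention features."""
    return [[b + a for b, a in zip(bv, av)] for bv, av in zip(base, attended)]

def mean_pool(vectors):
    """Collapse a sequence of vectors into one vector (assumed pooling)."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def fuse(text_vec, image_vec, score_vec):
    """Attention fusion: weight each modality by a softmax over its score
    against a hypothetical scoring vector."""
    w_text, w_image = softmax([dot(text_vec, score_vec),
                               dot(image_vec, score_vec)])
    return [w_text * t + w_image * i for t, i in zip(text_vec, image_vec)]

def classify(fused, weight, bias):
    """Logistic classification layer; returns P(sarcastic)."""
    return 1.0 / (1.0 + math.exp(-(dot(fused, weight) + bias)))

# Toy stand-ins: 3 text token features and 2 image region features, dim 4.
text_feats = [[0.2, 0.1, 0.0, 0.5], [0.9, 0.3, 0.4, 0.1], [0.0, 0.7, 0.2, 0.3]]
image_feats = [[0.5, 0.5, 0.1, 0.0], [0.1, 0.0, 0.8, 0.6]]

text_att = cross_attention(text_feats, image_feats)   # text attends to image
image_att = cross_attention(image_feats, text_feats)  # image attends to text

text_repr = mean_pool(residual(text_feats, text_att))
image_repr = mean_pool(residual(image_feats, image_att))

fused = fuse(text_repr, image_repr, score_vec=[0.25, 0.25, 0.25, 0.25])
prob = classify(fused, weight=[0.5, -0.3, 0.8, -0.1], bias=0.0)
print(f"P(sarcastic) = {prob:.3f}")
```

In a trained model the projections, fusion scores, and classifier weights would be learned parameters. Here they are fixed constants so that the sketch runs on its own.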