[1]孟想,王博岳,高祎菡,等.基于视觉-语言关键线索挖掘的多模态假新闻检测模型[J].智能系统学报,2026,21(1):109-119.[doi:10.11992/tis.202505007]
MENG Xiang,WANG Boyue,GAO Yihan,et al.Visual-language key clue discovery-based multimodal fake news detection model[J].CAAI Transactions on Intelligent Systems,2026,21(1):109-119.[doi:10.11992/tis.202505007]
《智能系统学报》(CAAI Transactions on Intelligent Systems) [ISSN 1673-4785/CN 23-1538/TP]
Volume: 21
Issue: 2026, No. 1
Pages: 109-119
Section: Academic Papers - Machine Perception and Pattern Recognition
Publication date: 2026-03-05
Title:
Visual-language key clue discovery-based multimodal fake news detection model
作者 (Authors):
孟想, 王博岳, 高祎菡, 吴广超, 刘易昆, 吕松澄, 尹宝才
北京工业大学 信息科学技术学院, 北京 100124
Author(s):
MENG Xiang, WANG Boyue, GAO Yihan, WU Guangchao, LIU Yikun, LYU Songcheng, YIN Baocai
School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China
关键词 (Keywords):
多模态虚假新闻检测; 多尺度特征交互; 关键线索发现; 细尺度表示; 跨模态注意力; 全局特征对齐; 记忆增强机制; 语义不一致检测
Keywords:
multimodal fake news detection; multi-scale feature interaction; key clue discovery; fine-grained representation; cross-modal attention; global feature alignment; memory-enhanced mechanism; semantic inconsistency detection
CLC number: TP391.1
DOI: 10.11992/tis.202505007
摘要 (Abstract):
To address the problem that existing models often overlook discriminative local details and struggle to accurately capture the key contradictory relationships between text and images when dealing with fake news, this paper proposes a visual-language key clue discovery-based multimodal fake news detection model (VKC-MFND), which is also a multi-scale interaction model with awareness of decisive regions and positions. The model consists of three key modules: a multi-scale feature extraction module, a key feature information extraction module, and a multi-scale feature alignment module. Specifically, the multi-scale feature extraction module extracts text and image features at different scales, including sentence-level/description-level global features and word-level/object-box-level local features, so as to understand the multimodal data comprehensively and enhance the expressiveness of the information. The key feature information extraction module uses an attention mechanism to let fine-grained features interact, discovering discriminative key content and aligning it with the global semantics, thereby achieving effective fusion of the key clues between text and images. The multi-scale feature alignment module is optimized with a joint classification loss and alignment loss, further mining global semantic features to achieve consistency in the semantic space. Experimental results show that the proposed model outperforms existing state-of-the-art methods on several mainstream multimodal fake news datasets, including Weibo, Weibo-19, and Pheme, exhibiting superior detection performance. Ablation experiments further verify the effectiveness and necessity of each submodule within the overall model. The conclusions of this study can provide guidance for the design and optimization of future multimodal fake news detection models.
Abstract:
Multimodal fake news detection aims to enhance the reliability of authenticity assessment by integrating diverse modalities such as text, images, videos, and audio. However, existing models often overlook discriminative local details and struggle to capture the critical inconsistencies between textual and visual content. To address these challenges, this study proposes a novel multimodal fake news detection model, termed the visual-language key clue discovery-based multimodal fake news detection model (VKC-MFND), which is designed to discover key visual-linguistic cues. The model comprises three main components: a multi-scale feature extraction module, a key feature information extraction module, and a multi-scale feature alignment module. Specifically, the multi-scale feature extraction module captures both global features (sentence-level or description-level) and local features (word-level or object box-level) from text and images, thereby enriching the diversity of information representation. The key feature information extraction module utilizes attention-based interactions among fine-grained features to uncover discriminative clues and aligns them with global semantic representations, facilitating the fusion of critical cross-modal information. Meanwhile, the multi-scale feature alignment module optimizes the model using both classification and alignment losses, enhancing semantic consistency in the shared feature space. Extensive experiments conducted on three benchmark multimodal fake news datasets (Weibo, Weibo-19, and Pheme) demonstrate that the proposed model significantly outperforms state-of-the-art approaches. Further ablation studies confirm the effectiveness and necessity of each component in the model.
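The abstract describes two mechanisms: cross-modal attention between word-level text features and object-box-level image features, and an alignment loss between global features. The sketch below is a minimal NumPy illustration of those two ideas only; it is not the paper's implementation, and every shape, function name, and design choice (scaled dot-product attention, residual fusion, cosine alignment) is an assumption made for illustration.

```python
# Illustrative sketch only: cross-modal attention between word-level text
# features and object-box-level image features, plus a cosine-style
# alignment term between global (mean-pooled) features. All names and
# choices here are assumptions, not the VKC-MFND code.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_feats, image_feats):
    """text_feats: (num_words, d); image_feats: (num_boxes, d).
    Each word attends over all object boxes; returns text features
    enriched with attended visual context via a residual connection."""
    d = text_feats.shape[-1]
    scores = text_feats @ image_feats.T / np.sqrt(d)  # (words, boxes)
    weights = softmax(scores, axis=-1)                # attention over boxes
    attended = weights @ image_feats                  # visual context per word
    return text_feats + attended                      # residual fusion

def alignment_loss(text_global, image_global):
    # One assumed form of a global alignment loss: cosine distance
    # between the two global feature vectors (range [0, 2]).
    t = text_global / np.linalg.norm(text_global)
    v = image_global / np.linalg.norm(image_global)
    return 1.0 - float(t @ v)

rng = np.random.default_rng(0)
text = rng.standard_normal((5, 16))    # 5 words, 16-dim features
image = rng.standard_normal((3, 16))   # 3 object boxes, 16-dim features
fused = cross_modal_attention(text, image)
loss = alignment_loss(text.mean(axis=0), image.mean(axis=0))
print(fused.shape)  # (5, 16)
```

In a full model this alignment term would be summed with a classification loss, matching the joint optimization the abstract describes; here it is shown in isolation for clarity.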
Memo
Received: 2025-05-16.
Funding: National Natural Science Foundation of China (92370102).
About the authors: MENG Xiang, whose main research focus is multimodal fake news detection, E-mail: mx2005@emails.bjut.edu.cn; WANG Boyue, professor, whose main research interests are cross-media data analysis and graph structure learning. He has led more than 10 projects, including National Natural Science Foundation of China projects, and published more than 10 academic papers, E-mail: wby@bjut.edu.cn; YIN Baocai, professor, whose main research interests are multimedia technology, cross-media intelligence, and video coding. He has led a number of projects, including a Class A National Science Fund for Young Scientists project and a subproject of a Major Program of the National Natural Science Foundation of China, and published more than 100 academic papers, E-mail: ybc@bjut.edu.cn.
Corresponding author: YIN Baocai. E-mail: ybc@bjut.edu.cn
Last update: 2026-01-05