[1]孟想,王博岳,高祎菡,等.基于视觉-语言关键线索挖掘的多模态假新闻检测模型[J].智能系统学报,2026,21(1):109-119.[doi:10.11992/tis.202505007]
MENG Xiang,WANG Boyue,GAO Yihan,et al.Visual-language key clue discovery-based multimodal fake news detection model[J].CAAI Transactions on Intelligent Systems,2026,21(1):109-119.[doi:10.11992/tis.202505007]
《智能系统学报》(CAAI Transactions on Intelligent Systems) [ISSN 1673-4785/CN 23-1538/TP]
Volume: 21
Issue: 2026, No. 1
Pages: 109-119
Section: Academic Papers - Machine Perception and Pattern Recognition
Publication date: 2026-03-05
Title:
Visual-language key clue discovery-based multimodal fake news detection model
作者 (Authors):
孟想, 王博岳, 高祎菡, 吴广超, 刘易昆, 吕松澄, 尹宝才
北京工业大学 信息科学技术学院, 北京 100124
Author(s):
MENG Xiang, WANG Boyue, GAO Yihan, WU Guangchao, LIU Yikun, LYU Songcheng, YIN Baocai
School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China
关键词 (Keywords):
多模态虚假新闻检测; 多尺度特征交互; 关键线索发现; 细尺度表示; 跨模态注意力; 全局特征对齐; 记忆增强机制; 语义不一致检测
Keywords:
multimodal fake news detection; multi-scale feature interaction; key clue discovery; fine-grained representation; cross-modal attention; global feature alignment; memory-enhanced mechanism; semantic inconsistency detection
CLC number: TP391.1
DOI: 10.11992/tis.202505007
摘要 (Abstract):
To address the problem that existing models often overlook discriminative local details and struggle to accurately capture the key contradictory relationships between text and images when dealing with fake news, this paper proposes a visual-language key clue discovery-based multimodal fake news detection model (VKC-MFND), which is also a multi-scale interaction model with awareness of decisive regions and positions. The model consists of three key modules: a multi-scale feature extraction module, a key feature information extraction module, and a multi-scale feature alignment module. Specifically, the multi-scale feature extraction module extracts text and image features at different scales, including sentence-level/description-level global features and word-level/object-box-level local features, so as to understand the multimodal data comprehensively and enhance the expressiveness of the information. The key feature information extraction module uses an attention mechanism to let fine-grained features interact, discovering discriminative key content and aligning it with the global semantics, thereby achieving effective fusion of the key clues between text and images. The multi-scale feature alignment module is optimized with a joint classification loss and alignment loss, further mining global semantic features to achieve consistency in the semantic space. Experimental results show that the proposed model outperforms existing state-of-the-art methods on several mainstream multimodal fake news datasets, including Weibo, Weibo-19, and Pheme, exhibiting superior detection performance. Ablation experiments further verify the effectiveness and necessity of each submodule within the overall model. The conclusions of this study can provide guidance for the design and optimization of future multimodal fake news detection models.
Abstract:
Multimodal fake news detection aims to enhance the reliability of authenticity assessment by integrating diverse modalities such as text, images, videos, and audio. However, existing models often overlook discriminative local details and struggle to capture the critical inconsistencies between textual and visual content. To address these challenges, this study proposes a novel multimodal fake news detection model, termed the visual-language key clue discovery-based multimodal fake news detection model (VKC-MFND), which is designed to discover key visual-linguistic cues. The model comprises three main components: a multi-scale feature extraction module, a key feature information extraction module, and a multi-scale feature alignment module. Specifically, the multi-scale feature extraction module captures both global features (sentence-level or description-level) and local features (word-level or object box-level) from text and images, thereby enriching the diversity of information representation. The key feature information extraction module utilizes attention-based interactions among fine-grained features to uncover discriminative clues and aligns them with global semantic representations, facilitating the fusion of critical cross-modal information. Meanwhile, the multi-scale feature alignment module optimizes the model using both classification and alignment losses, enhancing semantic consistency in the shared feature space. Extensive experiments conducted on three benchmark multimodal fake news datasets (Weibo, Weibo-19, and Pheme) demonstrate that the proposed model significantly outperforms state-of-the-art approaches. Further ablation studies confirm the effectiveness and necessity of each component in the model.
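The abstract describes two mechanisms: cross-modal attention between word-level text features and object-box-level image features, and an alignment loss between global features. The sketch below is a minimal NumPy illustration of those two ideas only; it is not the paper's implementation, and every shape, function name, and design choice (scaled dot-product attention, residual fusion, cosine alignment) is an assumption made for illustration.

```python
# Illustrative sketch only: cross-modal attention between word-level text
# features and object-box-level image features, plus a cosine-style
# alignment term between global (mean-pooled) features. All names and
# choices here are assumptions, not the VKC-MFND code.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_feats, image_feats):
    """text_feats: (num_words, d); image_feats: (num_boxes, d).
    Each word attends over all object boxes; returns text features
    enriched with attended visual context via a residual connection."""
    d = text_feats.shape[-1]
    scores = text_feats @ image_feats.T / np.sqrt(d)  # (words, boxes)
    weights = softmax(scores, axis=-1)                # attention over boxes
    attended = weights @ image_feats                  # visual context per word
    return text_feats + attended                      # residual fusion

def alignment_loss(text_global, image_global):
    # One assumed form of a global alignment loss: cosine distance
    # between the two global feature vectors (range [0, 2]).
    t = text_global / np.linalg.norm(text_global)
    v = image_global / np.linalg.norm(image_global)
    return 1.0 - float(t @ v)

rng = np.random.default_rng(0)
text = rng.standard_normal((5, 16))    # 5 words, 16-dim features
image = rng.standard_normal((3, 16))   # 3 object boxes, 16-dim features
fused = cross_modal_attention(text, image)
loss = alignment_loss(text.mean(axis=0), image.mean(axis=0))
print(fused.shape)  # (5, 16)
```

In a full model this alignment term would be summed with a classification loss, matching the joint optimization the abstract describes; here it is shown in isolation for clarity.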
Memo
Received: 2025-05-16.
Funding: National Natural Science Foundation of China (92370102).
About the authors: MENG Xiang, whose main research focus is multimodal fake news detection, E-mail: mx2005@emails.bjut.edu.cn; WANG Boyue, professor, whose main research interests are cross-media data analysis and graph structure learning. He has led more than 10 projects, including National Natural Science Foundation of China projects, and published more than 10 academic papers, E-mail: wby@bjut.edu.cn; YIN Baocai, professor, whose main research interests are multimedia technology, cross-media intelligence, and video coding. He has led a number of projects, including a Class A National Science Fund for Young Scientists project and a subproject of a Major Program of the National Natural Science Foundation of China, and published more than 100 academic papers, E-mail: ybc@bjut.edu.cn.
Corresponding author: YIN Baocai. E-mail: ybc@bjut.edu.cn
Last update: 2026-01-05