[1] CHEN Xiaoqi, XIE Zhenping, LIU Yuan. News event detection driven by incremental sampling clustering[J]. CAAI Transactions on Intelligent Systems, 2020, 15(6): 1175-1184. [doi:10.11992/tis.201912037]

News event detection driven by incremental sampling clustering

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
15
Issue:
No. 6, 2020
Pages:
1175-1184
Column:
Academic Papers: Natural Language Processing and Understanding
Publication date:
2020-11-05

Article Info

Title:
News event detection driven by incremental sampling clustering
Author(s):
CHEN Xiaoqi 1,2; XIE Zhenping 1,2; LIU Yuan 1,2
1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China;
2. Jiangsu Key Laboratory of Media Design and Software Technology, Jiangnan University, Wuxi 214122, China
Keywords:
news flow data; event detection; representative news; incremental sampling; information supporting degree; affinity propagation; event network; hierarchical clustering
CLC number:
TP391
DOI:
10.11992/tis.201912037
Abstract:
To improve event detection and representative news extraction, this paper proposes an integrated method for event detection and representation that treats the task as sampling-based clustering over representative points of the dataset. Given a news stream, the method first uses an information supporting degree to define the relationship weights between news items and between events. It then iteratively applies a two-layer affinity propagation algorithm to build a one-way event content support network over the whole time flow, which yields hierarchical incremental sampling of representative news. Finally, the entire news stream is clustered around the representative news using a maximum similarity division strategy. Experimental results indicate that, compared with existing related methods, the new method is markedly more computationally efficient on large-scale news flow data, extracts highly representative news from the stream, and achieves better clustering quality for news documents. Its hot event detection results are highly consistent with the major news selected by authoritative institutions.
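
The final step described in the abstract, assigning every news item to its most similar representative, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the representatives have already been sampled, and it substitutes plain cosine similarity over whitespace-split term counts for the paper's information supporting degree. The names `cosine` and `assign_to_representatives` and the sample texts are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (assumed stand-in metric)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def assign_to_representatives(docs, representatives):
    """Return, for each document, the index of its most similar representative.

    Ties go to the representative with the lowest index. At least one
    representative must be supplied.
    """
    doc_vecs = [Counter(d.split()) for d in docs]
    rep_vecs = [Counter(r.split()) for r in representatives]
    labels = []
    for vec in doc_vecs:
        sims = [cosine(vec, rep) for rep in rep_vecs]
        labels.append(max(range(len(sims)), key=sims.__getitem__))
    return labels


# Toy data, invented purely for illustration.
docs = [
    "earthquake rescue teams arrive",
    "rescue continues after earthquake",
    "football final ends in penalty shootout",
]
representatives = ["earthquake rescue", "football final"]
labels = assign_to_representatives(docs, representatives)
print(labels)
```

Because each document is compared only with the representatives, the cost of this step grows with the number of documents times the number of representatives, not with the square of the number of documents.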


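Memo

Received: 2019-12-31.
Funding: National Natural Science Foundation of China (61872166); Jiangsu Province "Six Talent Peaks" Project (2019XYDXX-161).
Author biographies: CHEN Xiaoqi, master's student; main research interest: big data knowledge discovery. XIE Zhenping, professor and doctoral supervisor; main research interests: knowledge representation and cognitive learning. He has led or participated in 6 national and provincial/ministerial research projects, undertaken 15 industry-university-research cooperation projects, obtained 5 invention patents, and published more than 30 academic papers. LIU Yuan, professor and doctoral supervisor; main research interests: network security and digital media. As principal investigator he has completed 3 provincial/ministerial research projects, and he has published more than 40 academic papers and 1 monograph.
Corresponding author: XIE Zhenping. E-mail: xiezp@jiangnan.edu.cn
Last Update: 2020-12-25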