[1] CHEN Xiaoqi, XIE Zhenping, LIU Yuan. News event detection driven by incremental sampling clustering[J]. CAAI Transactions on Intelligent Systems, 2020, 15(6): 1175-1184. [doi:10.11992/tis.201912037]
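CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 15
Issue: 2020, No. 6
Pages: 1175-1184
Section: Academic Papers (Natural Language Processing and Understanding)
Publication date: 2020-11-05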
- Title:
-
News event detection driven by incremental sampling clustering
- Author(s):
-
CHEN Xiaoqi (1,2), XIE Zhenping (1,2), LIU Yuan (1,2)
-
1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China;
2. Jiangsu Key Laboratory of Media Design and Software Technology, Jiangnan University, Wuxi 214122, China
-
- Keywords:
-
news flow data; event detection; representative news; incremental sampling; information supporting degree; affinity propagation; event network; hierarchical clustering
- CLC number:
-
TP391
- DOI:
-
10.11992/tis.201912037
- Abstract:
-
To improve event detection and representative news extraction, this paper proposes an integrated method for event detection and representation based on clustering by sampling representative points of the data set. For given news flow data, an information supporting degree is first introduced to define the relationship weights between news items and between events. A double-layer affinity propagation algorithm is then applied iteratively to build a one-way event content support network over the whole time flow, which realizes hierarchical incremental sampling of representative news. Finally, the whole news flow is clustered around the representative news using a maximum similarity division strategy. Experimental results show that, compared with existing related methods, the new method offers notable computational efficiency on large-scale news flow data, extracts highly representative news from the news flow, and achieves better clustering quality for news documents. Its hot event detection results agree closely with the major news selected by authoritative organizations.
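The two-stage idea in the abstract (sample representative news incrementally, then assign every document to its most similar representative) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not define the information supporting degree or the double-layer affinity propagation procedure, so a simple greedy similarity threshold stands in for the sampling step. The function names, the `threshold` value, and the toy vectors are all hypothetical, and document vectors are assumed to be non-zero.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Cosine similarity between each row of `a` and each row of `b`."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def sample_representatives(batches, threshold=0.8):
    """Greedy incremental sampling over time-ordered batches.

    Stand-in for the paper's affinity propagation step (assumption):
    a document becomes a new representative only when no existing
    representative reaches `threshold` cosine similarity with it.
    """
    reps = []
    for batch in batches:
        for doc in batch:
            if reps:
                sims = cosine_similarity_matrix(doc[None, :], np.array(reps))[0]
                if sims.max() >= threshold:
                    continue  # already covered by an existing representative
            reps.append(doc)
    return np.array(reps)


def assign_by_max_similarity(docs, reps):
    """Label each document with the index of its most similar representative."""
    return cosine_similarity_matrix(docs, reps).argmax(axis=1)


# Toy news flow: two time-ordered batches of 2-D document vectors (hypothetical).
batch_1 = np.array([[1.0, 0.0], [0.9, 0.1]])
batch_2 = np.array([[0.0, 1.0], [0.1, 0.9]])

reps = sample_representatives([batch_1, batch_2])
labels = assign_by_max_similarity(np.vstack([batch_1, batch_2]), reps)
print(reps.shape, labels.tolist())
```

In this toy run the first document of each batch is kept as a representative, and each remaining document joins the cluster of the representative it is closest to.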
Last Update:
2020-12-25