[1] XIAO Rong, KONG Liang, ZHANG Yan. A text clustering model for diverse versions discovery (基于文本的新闻事件多版本发现模型)[J]. CAAI Transactions on Intelligent Systems (智能系统学报), 2012, 7(4): 307-314.
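CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN 1673-4785/CN 23-1538/TP]
Volume: 7
Issue: 2012, No. 4
Pages: 307-314
Column: Academic Papers: Natural Language Processing and Understanding
Publication date: 2012-08-25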
- Title: A text clustering model for diverse versions discovery
- Article ID: 1673-4785(2012)04-0307-08
- Author(s): XIAO Rong (肖融), KONG Liang (孔亮), ZHANG Yan (张岩)
- Affiliation: Key Laboratory on Machine Perception of MOE, Peking University, Beijing 100871, China
- Keywords: diverse versions discovery; highly differentiated words; clustering model; topic analysis
- CLC number: TP18
- Document code: A
- Abstract: The growth of information technology exposes readers to an ever-increasing volume of news events. Although previous research has produced many algorithms for detecting and tracking events, few of them focus on uncovering the different versions of a single event. This paper proposes CDW, an algorithm that discovers the multiple versions of a news event from the reports describing it. First, the document set is mapped to a topic layer. Then the popular words of each topic are extracted to obtain highly differentiated features of the document set. Finally, the documents are clustered on these features, which yields the versions of the event. Experiments on two real-world data sets show that the algorithm is effective and outperforms related algorithms, including classical methods such as K-means and linear discriminant analysis (LDA).
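The abstract describes the pipeline only in outline, so the following is a minimal sketch of its general shape, not the authors' CDW implementation. It assumes that some topic model has already produced per-topic word weights (the `topic_word` values below are toy numbers invented for illustration, not data from the paper). It selects words that are much stronger in one topic than in any other, and then clusters documents on those words with a small K-means stand-in. The function names `differentiated_words`, `vectorize`, and `kmeans` are hypothetical.

```python
from collections import Counter


def differentiated_words(topic_word, top_n=2):
    """Per topic, keep up to top_n words whose weight exceeds
    their highest weight in any other topic (an assumed criterion)."""
    selected = []
    for k, weights in enumerate(topic_word):
        def margin(word):
            rival = max(
                (t.get(word, 0.0) for j, t in enumerate(topic_word) if j != k),
                default=0.0,
            )
            return weights[word] - rival

        ranked = sorted(weights, key=lambda w: (-margin(w), w))
        selected.append([w for w in ranked[:top_n] if margin(w) > 0])
    return selected


def vectorize(doc, features):
    """Count how often each selected word occurs in a whitespace-tokenized document."""
    counts = Counter(doc.lower().split())
    return [counts[f] for f in features]


def kmeans(vectors, k, iters=10):
    """Tiny K-means; centroids start at the first k vectors (toy initialization)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign every vector to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy topic-word weights for one event reported in two versions (illustrative only).
topic_word = [
    {"fire": 0.30, "factory": 0.25, "accident": 0.20, "wiring": 0.15, "police": 0.10},
    {"fire": 0.30, "factory": 0.20, "arson": 0.25, "suspect": 0.15, "police": 0.10},
]
docs = [
    "factory fire accident caused by faulty wiring",
    "police name arson suspect in factory fire",
    "wiring accident blamed for fire",
    "arson suspect arrested after factory fire",
]

features = [w for words in differentiated_words(topic_word) for w in words]
vectors = [vectorize(d, features) for d in docs]
labels = kmeans(vectors, k=2)
print(features)
print(labels)
```

Words shared by both topics, such as "fire" and "police", are dropped because they do not separate the versions. The remaining words place the accident reports and the arson reports in different clusters.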
Last Update: 2012-09-26