[1] ZHAO Wenqing, HOU Xiaoke. News topic recognition of Chinese microblog based on word cooccurrence graph[J]. CAAI Transactions on Intelligent Systems, 2012, 7(5): 444-449.
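CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
7
Issue:
2012, No. 5
Pages:
444-449
Section:
Academic Papers: Natural Language Processing and Understanding
Publication date:
2012-10-25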
- Title:
-
News topic recognition of Chinese microblog based on word cooccurrence graph
- Article ID:
-
1673-4785(2012)05-0444-06
- Author(s):
-
ZHAO Wenqing, HOU Xiaoke
-
School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, China
- Keywords:
-
microblog; news topics; topic recognition; keywords; word cooccurrence graph
- Classification number:
-
TP391.1
- Document code:
-
A
- Abstract:
-
Traditional topic detection algorithms are mainly suited to long texts such as news web pages and blogs, and cannot effectively handle sparse microblog data. To address this, a method based on the word cooccurrence graph is presented for recognizing news topics in microblogs. First, after the microblog data are preprocessed, keywords are extracted by combining two factors: the relative word frequency and the word frequency increase rate. Second, a word cooccurrence graph is built from the cooccurrence degrees between keywords; each disconnected cluster in the graph is treated as a news topic, and the few keywords that carry the most information in each cluster are used to represent that topic. Finally, experiments on a microblog data set recognized the news topics in the microblogs and verified the effectiveness of the method.
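The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the scoring formulas, so everything concrete here is an assumption made for illustration: the weighted sum of relative frequency and increase rate (with the hypothetical weight `alpha`), the definition of cooccurrence degree as joint document count divided by the smaller of the two document counts, the threshold `min_degree`, and all function names. Documents are assumed to be already segmented into word lists.

```python
from collections import Counter
from itertools import combinations


def extract_keywords(current_docs, previous_docs, top_k=6, alpha=0.5):
    """Rank words by an assumed mix of relative frequency and increase rate."""
    # Count each word once per document in the current and previous windows.
    cur = Counter(w for doc in current_docs for w in set(doc))
    prev = Counter(w for doc in previous_docs for w in set(doc))
    total = sum(cur.values())
    scores = {}
    for word, freq in cur.items():
        relative = freq / total
        # +1 in the denominator avoids division by zero for unseen words.
        increase = (freq - prev[word]) / (prev[word] + 1)
        scores[word] = alpha * relative + (1 - alpha) * increase
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


def build_cooccurrence_graph(docs, keywords, min_degree=0.5):
    """Link two keywords when their assumed cooccurrence degree is high enough."""
    kw = set(keywords)
    doc_freq, pair_freq = Counter(), Counter()
    for doc in docs:
        present = sorted(kw & set(doc))
        doc_freq.update(present)
        pair_freq.update(combinations(present, 2))
    graph = {w: set() for w in keywords}
    for (a, b), joint in pair_freq.items():
        degree = joint / min(doc_freq[a], doc_freq[b])
        if degree >= min_degree:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return each disconnected cluster of the graph as a sorted word list."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, cluster = [start], []
        while stack:  # iterative depth-first search
            node = stack.pop()
            cluster.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(cluster))
    return clusters
```

In this sketch each cluster returned by `connected_components` plays the role of one candidate news topic. Selecting the most informative keywords inside a cluster would need an information measure that the abstract does not specify, so that step is left out.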
Last Update:
2012-11-13