[1] LIU Zhixiong (刘志雄), JIA Caiyan (贾彩燕). Micro-blog topic detection based on users' interests and communities[J]. CAAI Transactions on Intelligent Systems, 2016, 11(3): 294-299. [doi:10.11992/tis.201603341]
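CAAI Transactions on Intelligent Systems, Editorial Office [ISSN 1673-4785/CN 23-1538/TP] Volume:
11
Issue:
2016, No. 3
Pages:
294-299
Column:
Academic Papers: Natural Language Processing and Understanding
Publication date:
2016-06-25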
- Title:
-
Micro-blog topic detection based on users’ interests and communities
- Author(s):
-
LIU Zhixiong (刘志雄)1,2, JIA Caiyan (贾彩燕)1,2
-
1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China;
2. Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
-
- Keywords:
-
microblog; community; network; text; topic; interest; noise; theme
- Classification number:
-
TP393
- DOI:
-
10.11992/tis.201603341
- Abstract:
-
Microblog topic detection is a special form of topic detection, and traditional topic detection algorithms do not work well on Chinese microblogs. This paper proposes a topic detection method oriented toward microblog user communities. First, users' interest distributions are analyzed by applying the LDA (Latent Dirichlet Allocation) topic model to the microblog text that users post. Next, the user relationship (follower) network is combined with users' interests to partition users into communities, so that users in the same community are not only densely connected but also share similar interests. Then, the topics that users care about are detected within each community, using a community-oriented microblog topic discovery method that combines word importance with an ε-neighborhood graph. Experiments show that the method can effectively remove microblog noise, compute the importance of words, rapidly and accurately detect the topics of interest in each user community, and rank those topics by popularity.
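The final step of the abstract, grouping posts inside one community with word importance and an ε-neighborhood graph, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not define the word-importance score, the similarity measure, or the popularity ranking, so the sketch assumes an inverse-document-frequency weight, cosine similarity, connected components as topics, and cluster size as a stand-in for popularity. The names `word_importance`, `detect_topics`, and the default `eps=0.3` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def word_importance(docs):
    """Assumed importance score: smoothed inverse document frequency."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def detect_topics(docs, eps=0.3):
    """Group posts into topics via an epsilon-neighborhood graph.

    docs: list of token lists (posts from one user community).
    Returns clusters of post indices, largest first.
    """
    imp = word_importance(docs)
    vecs = [{t: c * imp[t] for t, c in Counter(d).items()} for d in docs]
    n = len(docs)
    # Connect two posts when their similarity is at least eps.
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vecs[i], vecs[j]) >= eps:
                adj[i].add(j)
                adj[j].add(i)
    # Treat each connected component of the graph as one topic.
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        clusters.append(sorted(comp))
    # Rank by cluster size as a simple proxy for topic popularity.
    return sorted(clusters, key=lambda c: (-len(c), c[0]))
```

Under these assumptions, isolated posts end up as singleton clusters at the bottom of the ranking, which is one simple way to read the abstract's claim about removing noise; raising `eps` makes the graph sparser and the topics finer-grained.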
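Memo
Received: 2016-3-19; Revised:
Funding: General Program of the National Natural Science Foundation of China (61473030); Fundamental Research Funds for the Central Universities (2014JBM031).
About the authors: LIU Zhixiong, male, born in 1990, is a master's student whose main research areas are data mining, machine learning, and complex networks. JIA Caiyan, female, born in 1976, is an associate professor and doctoral supervisor and a member of the Rough Set and Soft Computing Technical Committee of the Chinese Association for Artificial Intelligence. Her main research interests are data mining, social computing, text mining, and bioinformatics. In recent years she has led one General Program project and one Young Scientists Fund project of the National Natural Science Foundation of China; has participated in one Key Program project of the National Natural Science Foundation of China, one National Science and Technology Major Project, and one Beijing Natural Science Foundation project; and has received one Second Prize of the Hunan Province Science and Technology Progress Award.
Corresponding author: LIU Zhixiong. E-mail: 523129791@qq.com.
Last Update: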
1900-01-01