[1] LIU Zhixiong, JIA Caiyan. Micro-blog topic detection based on users' interests and communities[J]. CAAI Transactions on Intelligent Systems, 2016, 11(3): 294-299. [doi:10.11992/tis.201603341]

Micro-blog topic detection based on users' interests and communities

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
11
Issue:
2016, No. 3
Pages:
294-299
Publication date:
2016-06-25

Info

Title:
Micro-blog topic detection based on users’ interests and communities
Author(s):
LIU Zhixiong1,2, JIA Caiyan1,2
1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China;
2. Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
Keywords:
microblog; community; network; text; topic; interest; noise; theme
CLC number:
TP393
DOI:
10.11992/tis.201603341
Abstract:
Microblog topic detection is a special form of topic detection, and traditional topic detection algorithms perform poorly on Chinese microblogs. This paper proposes a topic detection method oriented to microblog user communities. First, the LDA (latent Dirichlet allocation) topic model is applied to the microblog texts posted by each user to estimate that user's interest distribution. Next, the user relationship (follower) network is combined with the users' interests to partition users into communities, so that users in the same community are not only densely linked but also share similar interests. Then, the topics that users care about are detected within each community, using a community-oriented microblog topic discovery method that combines word importance with an ε-neighborhood graph. Experiments show that the method effectively removes microblog noise, quickly and accurately detects the topics of interest in each user community, and ranks those topics by popularity.
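The abstract names the per-community topic step, which combines word importance with an ε-neighborhood graph, but does not spell it out here. The Python sketch below shows one possible reading. Its assumptions are ours, not the authors'. Posts are already tokenized and grouped into one community. Word importance is a TextRank-style PageRank score over a graph linking words that share a post; this is a stand-in, not confirmed as the paper's measure. Word similarity is the Jaccard overlap of the posts containing each word. Topics are the connected components of the ε-neighborhood graph. A topic's popularity ("hotness") is the number of posts that mention any of its words. The names `textrank_scores`, `detect_topics`, `top_k` and `eps` are hypothetical.

```python
from itertools import combinations


def textrank_scores(posts, d=0.85, iters=50):
    """Assumed stand-in for word importance: PageRank over an undirected
    graph that links two words whenever they appear in the same post."""
    nbrs = {}
    for post in posts:
        words = set(post)
        for w in words:
            nbrs.setdefault(w, set())
        for a, b in combinations(sorted(words), 2):
            nbrs[a].add(b)
            nbrs[b].add(a)
    if not nbrs:
        return {}
    n = len(nbrs)
    score = {w: 1.0 / n for w in nbrs}
    for _ in range(iters):
        # each word receives an equal share of every neighbor's current score
        score = {
            w: (1 - d) / n + d * sum(score[u] / len(nbrs[u]) for u in nbrs[w])
            for w in nbrs
        }
    return score


def detect_topics(posts, top_k=20, eps=0.5):
    """Hypothetical sketch: group important words into topics via an
    eps-neighborhood graph and rank the topics by hotness."""
    score = textrank_scores(posts)
    # keep only the top_k most important words; the rest are treated as noise
    keep = sorted(score, key=lambda w: (-score[w], w))[:top_k]
    # indices of the posts in which each kept word occurs
    occ = {w: {i for i, p in enumerate(posts) if w in p} for w in keep}

    def sim(a, b):
        # Jaccard overlap of the sets of posts containing a and b
        return len(occ[a] & occ[b]) / len(occ[a] | occ[b])

    # eps-neighborhood graph: link two words iff their similarity >= eps
    adj = {w: [v for v in keep if v != w and sim(w, v) >= eps] for w in keep}

    seen, topics = set(), []
    for start in keep:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # depth-first search collects one connected component
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        if len(comp) >= 2:  # a single isolated word is not reported as a topic
            hot = len(set().union(*(occ[u] for u in comp)))
            topics.append((hot, sorted(comp)))
    # hottest topics first; ties broken alphabetically for determinism
    topics.sort(key=lambda t: (-t[0], t[1]))
    return topics


posts = [
    ["earthquake", "rescue", "sichuan"],
    ["earthquake", "rescue"],
    ["earthquake", "sichuan", "rescue"],
    ["phone", "launch", "apple"],
    ["phone", "apple", "launch"],
    ["lunch"],
]
print(detect_topics(posts))
# [(3, ['earthquake', 'rescue', 'sichuan']), (2, ['apple', 'launch', 'phone'])]
```

In this sketch, keeping only the `top_k` highest-scoring words stands in for noise removal. A word that never co-occurs with others, such as `lunch` above, gets the lowest score and cannot form a topic. Raising `eps` splits topics into tighter word groups, and lowering it merges them.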

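Memo:
Received: 2016-03-19.
Funding: National Natural Science Foundation of China, General Program (61473030); Fundamental Research Funds for the Central Universities (2014JBM031).
About the authors: LIU Zhixiong, born in 1990, male, is a master's student. His research interests include data mining, machine learning and complex networks. JIA Caiyan, born in 1976, female, is an associate professor, doctoral supervisor, and member of the Rough Set and Soft Computing Committee of the Chinese Association for Artificial Intelligence. Her research interests include data mining, social computing, text mining and bioinformatics. In recent years she has led one project each under the General Program and the Young Scientists Fund of the National Natural Science Foundation of China. She has also taken part in one NSFC Key Program project, one National Science and Technology Major Project, and one Beijing Natural Science Foundation project, and has received a Second Prize of the Hunan Provincial Science and Technology Progress Award.
Corresponding author: LIU Zhixiong. E-mail: 523129791@qq.com.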