[1] LI Chunying, TANG Yong, CHEN Guohua, et al. Research on an expert recommendation model based on the scholar community SCHOLAT[J]. CAAI Transactions on Intelligent Systems, 2012, 7(04): 365-369.

Research on an expert recommendation model based on the scholar community SCHOLAT

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
7
Issue:
2012, No. 04
Pages:
365-369
Section:
Publication date:
2012-08-25

Article Info

Title:
Research on an expert recommendation model based on the scholar community SCHOLAT
Article ID:
1673-4785(2012)04-0365-05
Author(s):
LI Chunying 1, TANG Yong 2, CHEN Guohua 2, TANG Zhikang 3
1. School of Computer, Zhaoqing University, Zhaoqing 526061, China;
2. School of Computer Science, South China Normal University, Guangzhou 510631, China;
3. School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou 510665, China
Keywords:
expert recommendation; H-index; probabilistic topic model; query expansion
CLC number:
TP393
Document code:
A
Abstract:
Among the services offered by an academic community, expert recommendation is indispensable for researchers, especially young researchers. At present, none of the Chinese search engines that offer academic information services recommends experts matching a user's interests. A scholar-community-oriented expert recommendation model is therefore proposed. An improved H-index quantifies the papers a scholar has published in the last n years, and the expert list is derived from it. A probabilistic topic model extracts topic vectors from each author's published papers to represent that scholar's research interests. To achieve high-recall retrieval, a query expansion strategy is used: singular value decomposition is applied to the term-document matrix to reduce its dimensionality and obtain a term-term relationship matrix, from which highly related terms are selected to form the expanded query. Finally, the relevance between the expanded query vector and each scholar's topic vectors is calculated, and scholars are recommended in descending order of relevance. An experiment on a dataset collected from an existing scholar community, SCHOLAT, was conducted to verify the effectiveness of the model. The experimental results show that the proposed model achieves the expected results.
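
Two of the steps named in the abstract, the time-windowed H-index and the SVD-based query expansion, can be illustrated with a minimal Python sketch. It rests on assumptions: the abstract does not give the formula of the improved H-index, so the code uses the ordinary Hirsch h-index restricted to papers from the last n years; the term-term relationship is taken to be cosine similarity in the truncated SVD space, which may differ from the paper's definition; and all function names (`h_index_recent`, `term_term_matrix`, `expand_query`) are hypothetical.

```python
import numpy as np


def h_index_recent(papers, current_year, n):
    """Plain h-index over papers from the last n years (assumed variant).

    papers: list of (publication_year, citation_count) tuples.
    """
    cites = sorted(
        (c for year, c in papers if 0 <= current_year - year < n),
        reverse=True,
    )
    h = 0
    for rank, c in enumerate(cites, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def term_term_matrix(term_doc, k):
    """Term-term cosine similarity after a rank-k truncated SVD.

    term_doc: array of shape (n_terms, n_docs).
    """
    u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    term_vecs = u[:, :k] * s[:k]  # each term as a k-dimensional vector
    norms = np.linalg.norm(term_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty terms
    unit = term_vecs / norms
    return unit @ unit.T


def expand_query(query_terms, vocab, sim, top_m):
    """Append up to top_m most similar new terms for each query term."""
    index = {t: i for i, t in enumerate(vocab)}
    expanded = list(query_terms)
    for t in query_terms:
        if t not in index:
            continue  # out-of-vocabulary terms are kept but not expanded
        added = 0
        for j in np.argsort(-sim[index[t]]):
            if added >= top_m:
                break
            if vocab[j] not in expanded:
                expanded.append(vocab[j])
                added += 1
    return expanded


# Toy data: two documents about SVD, two about topic models.
vocab = ["svd", "matrix", "topic", "lda"]
term_doc = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
], dtype=float)
sim = term_term_matrix(term_doc, k=2)
print(expand_query(["svd"], vocab, sim, top_m=1))
print(h_index_recent([(2012, 5), (2011, 3), (2010, 3), (2005, 50)], 2012, 3))
```

The final ranking step, matching the expanded query against each scholar's topic vectors, is left out because the abstract does not say how query terms are mapped into the topic space.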

Memo

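Received: 2012-05-24.
Published online: 2012-07-20.
Funding: National Natural Science Foundation of China (60970044); Guangdong Provincial Science and Technology Program (2010B010600031); Guangzhou Science and Technology Program (2010JD00511).
Corresponding author: LI Chunying.
E-mail: zqxylcy@163.com.
About the authors:
LI Chunying, female, born in 1978, lecturer, CCF member (E200019159M). Her main research interests are academic information retrieval and recommendation, and artificial intelligence.
TANG Yong, male, born in 1964, professor, doctoral supervisor, Ph.D.; vice chair of the Collaborative Computing Technical Committee of the China Computer Federation, vice chair of the Network Technical Committee of the Chinese Association for Artificial Intelligence, executive vice president of the Guangdong Computer Society, and vice president of the Guangdong Internet Culture Association. His main research interests are databases, collaborative computing, and cloud service software; he has published many academic papers.
CHEN Guohua, male, born in 1984, lecturer, Ph.D. His main research interests are academic information retrieval and machine learning.
Last Update: 2012-09-26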