WANG Pei, XIAN Yantuan, GUO Jianyi, et al. A novel method using word vector and graphical models for entity disambiguation in specific topic domains[J]. CAAI Transactions on Intelligent Systems, 2016, 11(3): 366-374. [doi:10.11992/tis.201603044]





A novel method using word vector and graphical models for entity disambiguation in specific topic domains
WANG Pei (1), XIAN Yantuan (1, 2), GUO Jianyi (1, 2), WEN Yonghua (1, 2), CHEN Wei (1, 2), WANG Hongbin (1, 2)
1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China;
2. Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500, China
Keywords: entity disambiguation; entity linking; Word2Vec; Wikipedia; graphical model; random walking
This paper proposes a novel method based on word vectors and graph models for entity disambiguation in specific topic domains, taking the tourism domain as an example. The method first selects the web pages of the tourism category from an offline Wikipedia database to build a knowledge base. The Word2Vec tool is then used to train a word vector model on the texts in the knowledge base together with texts taken from several tourism websites. In addition, a random walk algorithm is run on a manually annotated graph so that the similarity between words within the tourism domain can be calculated more accurately. Next, the method extracts several keywords from the background text of the entity to be disambiguated and compares them with the keyword text in the knowledge base that describes the candidate entities. Finally, the method uses the trained Word2Vec model and the graph model to calculate the similarity between the keywords of the name mention and the keywords of each candidate entity, and selects the candidate entity with the maximum average similarity as the target entity. Experimental results show that the method effectively captures the similarity between a name mention and its target entity and can therefore accurately disambiguate entities in a topic-specific domain.
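The final selection step described in the abstract, choosing the candidate whose keywords have the maximum average similarity to the keywords of the name mention, can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors are made-up stand-ins for a trained Word2Vec model, the candidate labels and the function names (`cosine`, `average_similarity`, `disambiguate`) are hypothetical, plain cosine similarity is assumed as the word-level measure, and the graph-based random walk component is omitted.

```python
import math

# Toy word vectors with made-up values; a stand-in for a trained Word2Vec model.
TOY_VECTORS = {
    "lake": [0.9, 0.1, 0.0],
    "boat": [0.8, 0.2, 0.1],
    "scenic": [0.7, 0.3, 0.0],
    "stock": [0.0, 0.1, 0.9],
    "company": [0.1, 0.0, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_similarity(mention_keywords, candidate_keywords, vectors):
    """Mean pairwise similarity between mention and candidate keywords.

    Keywords without a vector are skipped (an assumption of this sketch).
    """
    sims = [
        cosine(vectors[m], vectors[c])
        for m in mention_keywords
        for c in candidate_keywords
        if m in vectors and c in vectors
    ]
    return sum(sims) / len(sims) if sims else 0.0


def disambiguate(mention_keywords, candidates, vectors):
    """Return the candidate with the highest average similarity, plus all scores."""
    scores = {
        name: average_similarity(mention_keywords, keywords, vectors)
        for name, keywords in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical candidate entities, each described by keywords from the knowledge base.
candidates = {
    "candidate_lake": ["lake", "scenic"],
    "candidate_company": ["stock", "company"],
}

# Keywords extracted from the background text of the name mention (hypothetical).
best, scores = disambiguate(["boat", "lake"], candidates, TOY_VECTORS)
print(best, scores)
```

With these toy values the tourism-related candidate scores higher because its keywords lie close to the mention keywords in the vector space. In a fuller version, the word-level similarity would also draw on the graph-based random walk mentioned in the abstract.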


