[1] WANG Pei, XIAN Yantuan, GUO Jianyi, et al. A novel method using word vector and graphical models for entity disambiguation in specific topic domains [J]. CAAI Transactions on Intelligent Systems, 2016, 11(3): 366-374. [doi:10.11992/tis.201603044]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
11
Issue:
2016, No. 3
Pages:
366-374
Column:
Academic Papers: Natural Language Processing and Understanding
Publication date:
2016-06-25
- Title:
-
A novel method using word vector and graphical models for entity disambiguation in specific topic domains
- Author(s):
-
WANG Pei¹; XIAN Yantuan¹,²; GUO Jianyi¹,²; WEN Yonghua¹,²; CHEN Wei¹,²; WANG Hongbin¹,²
-
1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China;
2. Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500, China
-
- Keywords:
-
entity disambiguation; entity linking; Word2Vec; Wikipedia; graphical model; random walking
- CLC:
-
TP393
- DOI:
-
10.11992/tis.201603044
- Abstract:
-
This paper proposes a method based on word vectors and graph models for entity disambiguation in specific topic domains, using the tourism domain as an example. The method first selects the pages in the tourism category of an offline Wikipedia database to build a knowledge base. The Word2Vec tool is then used to train a word vector model on the texts in the knowledge base together with texts taken from several tourism websites. To calculate the similarity between words within the tourism domain more accurately, the word vector model is combined with a manually annotated graph, on which a graph-based random walk algorithm computes similarity. Next, the method extracts several keywords from the background text of the entity to be disambiguated and compares them with the keyword text in the knowledge base that describes the candidate entities. Finally, the method uses the trained Word2Vec model and the graph model to calculate the similarity between the keywords of the name mention and the keywords of each candidate entity, and chooses the candidate entity with the maximum average similarity as the target entity. Experimental results show that the method effectively captures the similarity between a name mention and its target entity and can therefore achieve accurate entity disambiguation in a topic-specific domain.
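The final selection step described in the abstract (scoring each candidate by the average similarity between its keywords and the keywords of the name mention, then keeping the best-scoring candidate) can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional vectors, the keyword lists, the candidate names, and the function names are all hypothetical stand-ins, cosine similarity replaces the trained Word2Vec model, and the graph-based random walk component is omitted entirely.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_similarity(mention_keywords, candidate_keywords, vectors):
    """Mean pairwise similarity between two keyword lists.

    Keywords without a vector are skipped; an empty comparison scores 0.0.
    """
    scores = [
        cosine(vectors[m], vectors[c])
        for m in mention_keywords
        for c in candidate_keywords
        if m in vectors and c in vectors
    ]
    return sum(scores) / len(scores) if scores else 0.0


def disambiguate(mention_keywords, candidates, vectors):
    """Return the candidate whose keywords are most similar on average."""
    return max(
        candidates,
        key=lambda name: average_similarity(
            mention_keywords, candidates[name], vectors
        ),
    )


# Hypothetical toy embeddings (a real system would load trained vectors).
toy_vectors = {
    "mountain": [1.0, 0.0],
    "hiking": [0.9, 0.1],
    "scenery": [0.8, 0.2],
    "software": [0.0, 1.0],
    "company": [0.1, 0.9],
}

# Hypothetical keywords extracted from the text around the name mention.
mention = ["hiking", "scenery"]

# Hypothetical candidate entities with their knowledge-base keywords.
candidates = {
    "candidate_scenic_spot": ["mountain", "scenery"],
    "candidate_company": ["software", "company"],
}

best = disambiguate(mention, candidates, toy_vectors)
print(best)
```

In this toy setup the mention keywords lie close to the scenic-spot keywords, so that candidate is selected. A fuller version would replace `cosine` with a similarity that also incorporates the graph-based score the abstract mentions.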