[1] ZHANG Xiaokun, LIU Yan, CHEN Jing. Representation learning using network embedding based on external word vectors[J]. CAAI Transactions on Intelligent Systems, 2019, 14(5): 1056-1063. [doi:10.11992/tis.201809037]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 14
Issue: 5, 2019
Pages: 1056-1063
Column: Academic Papers - Natural Language Processing and Understanding
Publication date: 2019-09-05
Title:
Representation learning using network embedding based on external word vectors
Author(s):
ZHANG Xiaokun; LIU Yan; CHEN Jing
Affiliation:
State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450000, China
Keywords:
network embedding; content information network; auto-encoder; external word vectors; vertex classification; word vectors; distributed representation; representation learning
CLC:
TP181
DOI:
10.11992/tis.201809037
Abstract:
Network embedding learns low-dimensional representations of vertices that preserve a network’s sophisticated features, thereby lowering computing and storage costs. Content information networks (such as Twitter), which contain rich text information, are common in daily life, yet most studies on content information networks rely only on information from the network itself. Meanwhile, distributed word vectors have become increasingly popular in natural language processing tasks: as low-dimensional representations of the semantic feature space, they preserve syntactic and semantic regularities. Introducing external word vectors into the modeling process makes these external syntactic and semantic features available. Hence, in this paper we propose network embedding based on external word vectors (NE-EWV), in which a fused representation is learned from both the semantic feature space and the structural feature space. Empirical experiments on real-world content information network datasets validate the effectiveness of the model. The results show that in the link prediction task, the AUC of the model was 7% to 19% higher than that of a model that considers only structural features, and in most cases 1% to 12% higher than that of models that consider both structural and text features. In the node classification task, its performance was comparable to that of context-aware network embedding (CANE), the state-of-the-art baseline model.
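To make the fusion idea concrete, below is a minimal sketch of the approach the abstract describes: a vertex’s structural features and its semantic features (averaged external word vectors over the vertex’s text) are concatenated and compressed by an auto-encoder into a single fused embedding. All data, dimensions, and the training loop here are illustrative assumptions, not the authors’ NE-EWV implementation.

```python
# Minimal sketch (assumed, not the authors' code): fuse structural features
# with semantic features from external (pretrained) word vectors, then learn
# a joint low-dimensional embedding with a linear auto-encoder.
import numpy as np

rng = np.random.default_rng(0)

# --- Toy content information network (illustrative stand-in data) ---
n_vertices, vocab_size, word_dim = 6, 10, 8
adjacency = rng.integers(0, 2, size=(n_vertices, n_vertices)).astype(float)
adjacency = np.maximum(adjacency, adjacency.T)         # make it undirected
external_wv = rng.normal(size=(vocab_size, word_dim))  # stand-in for word2vec/GloVe
vertex_texts = [rng.integers(0, vocab_size, size=5) for _ in range(n_vertices)]

# Structural features: truncated SVD of the adjacency matrix (one common choice).
u, s, _ = np.linalg.svd(adjacency)
struct_feat = u[:, :4] * s[:4]

# Semantic features: average the external word vectors of each vertex's text.
sem_feat = np.stack([external_wv[t].mean(axis=0) for t in vertex_texts])

# Concatenate both feature spaces and compress with a linear auto-encoder
# trained by plain gradient descent on the reconstruction error.
x = np.hstack([struct_feat, sem_feat])
d_in, d_emb, lr = x.shape[1], 4, 0.01
W_enc = rng.normal(scale=0.1, size=(d_in, d_emb))
W_dec = rng.normal(scale=0.1, size=(d_emb, d_in))
for _ in range(500):
    z = x @ W_enc                        # fused embedding
    x_hat = z @ W_dec                    # reconstruction
    grad = 2 * (x_hat - x) / len(x)      # d(loss)/d(x_hat), mean squared error
    W_dec -= lr * (z.T @ grad)
    W_enc -= lr * (x.T @ (grad @ W_dec.T))

embedding = x @ W_enc                    # final per-vertex representation
print(embedding.shape)                   # (6, 4)
```

The resulting `embedding` matrix is the kind of fused vertex representation that would then feed the downstream link prediction (AUC) and vertex classification evaluations mentioned above.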