CHEN Pei, JING Liping. Word representation learning model using matrix factorization to incorporate semantic information[J]. CAAI Transactions on Intelligent Systems, 2017, 12(5): 661-667. [doi:10.11992/tis.201706012]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 12
Issue: No. 5, 2017
Pages: 661-667
Column: Academic Papers - Natural Language Processing and Understanding
Publication date: 2017-10-25
- Title:
Word representation learning model using matrix factorization to incorporate semantic information
- Author(s):
CHEN Pei (陈培), JING Liping (景丽萍)
Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
- Keywords:
natural language processing; word representation; matrix factorization; semantic information; knowledge base
- CLC number:
TP391
- DOI:
10.11992/tis.201706012
- Abstract:
Word representation plays an important role in natural language processing and, owing to its simplicity and effectiveness, has attracted increasing attention from researchers in recent years. However, traditional methods for learning word representations generally rely on large amounts of unlabeled text while neglecting the semantic information of words, such as the semantic relationships between them. To make full use of existing domain knowledge bases, which contain rich semantic information about words, this paper proposes a word representation learning method that incorporates semantic information (KbEMF). The method adds a domain-knowledge constraint term to a matrix-factorization word representation model, so that word pairs with strong semantic relationships obtain relatively similar representations. Results of word analogy reasoning and word similarity measurement tasks on real data show that KbEMF clearly outperforms existing models.
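To make the constraint concrete, the following is a minimal illustrative sketch rather than the paper's exact formulation; the symbols M (a word-context co-occurrence matrix), W and C (word and context embedding matrices), S (the set of word pairs the knowledge base marks as semantically related), and λ (the constraint weight) are assumed here for illustration only:

\min_{W,C} \; \left\| M - W C^{\top} \right\|_F^2 \; + \; \lambda \sum_{(i,j) \in S} \left\| w_i - w_j \right\|_2^2

The first term is the usual matrix-factorization reconstruction loss; the second term penalizes the distance between the embeddings of semantically related word pairs, which is the kind of domain-knowledge constraint the abstract describes.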
Memo:
Received: 2017-06-06.
Foundation item: National Natural Science Foundation of China (61370129, 61375062, 61632004).
Author biographies: CHEN Pei, female, born in 1990, master's student; her main research interests are natural language processing and sentiment analysis. JING Liping, female, born in 1978, professor, PhD; her main research interests are data mining, text mining, bioinformatics, and business intelligence.
Corresponding author: JING Liping. E-mail: lpjing@bjtu.edu.cn
Last update: 2017-10-25