[1] ZHOU Hao, WANG Li. Chinese opinion target extraction based on fusion of semantic and syntactic information[J]. CAAI Transactions on Intelligent Systems, 2019, 14(1): 171-178. [doi:10.11992/tis.201809029]
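CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
14
Issue:
No. 1, 2019
Pages:
171-178
Section:
Academic Papers: Natural Language Processing and Understanding
Publication date:
2019-01-05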
- Title:
-
Chinese opinion target extraction based on fusion of semantic and syntactic information
- Author(s):
-
ZHOU Hao1, WANG Li2
-
1. College of Information and Computer Science, Taiyuan University of Technology, Jinzhong 030600, Shanxi, China;
2. College of Big Data, Taiyuan University of Technology, Jinzhong 030600, Shanxi, China
-
- Keywords:
-
Chinese opinion target; semantics; syntax; sequence labeling; bidirectional long short-term memory network; conditional random field; extraction model
- CLC number:
-
TP391
- DOI:
-
10.11992/tis.201809029
- Abstract:
-
Conventional sequence labeling methods extract Chinese opinion targets with low accuracy because they ignore Chinese semantic and syntactic information. To address this, a Chinese opinion target extraction model that fuses semantic and syntactic information is proposed. Starting from the original character vectors, the model strengthens semantic features through a strategy that optimizes character meanings, recovering the internal information of characters and words that would otherwise be ignored. It also represents the part-of-speech information of each sentence through part-of-speech sequence labeling, which deepens the syntactic features of the input. The network is trained with a bidirectional long short-term memory network, and a conditional random field is used to overcome label bias, which improves extraction accuracy. The model was validated on the BDCI2017 dataset. Compared with an extraction model that incorporates neither semantic nor syntactic information, the accuracy of Chinese topic word and sentiment word extraction increased by 2.1% and 1.68%, respectively. The accuracy of joint extraction was 77.16%, indicating good performance on Chinese opinion target extraction.
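Two ideas in the abstract can be illustrated with a small, dependency-free sketch: projecting word-level part-of-speech tags onto characters, and decoding a tag sequence under conditional-random-field-style transition scores. This is not the authors' implementation. The function names, the `B-`/`I-` prefix scheme for character-level POS tags, the `B-T`/`I-T`/`O` label set, and all scores are assumptions made for illustration. In the full model the per-character scores would come from the bidirectional LSTM; here they are hard-coded.

```python
from typing import Dict, List, Sequence, Tuple


def char_level_pos(words: Sequence[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Project word-level POS tags onto characters.

    Each character receives the POS tag of its word, prefixed with B
    (first character of the word) or I (any later character).
    """
    out: List[Tuple[str, str]] = []
    for word, pos in words:
        for i, ch in enumerate(word):
            prefix = "B" if i == 0 else "I"
            out.append((ch, f"{prefix}-{pos}"))
    return out


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag path for a non-empty sequence.

    emissions[i][tag] is the score of `tag` at position i; transition
    pairs missing from `transitions` score 0.
    """
    best = [{t: emissions[0][t] for t in tags}]
    back: List[Dict[str, str]] = []
    for emit in emissions[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Best previous tag for reaching `cur` at this position.
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev] + transitions.get((prev, cur), 0.0) + emit[cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical segmented input: (word, POS tag) pairs.
char_pos = char_level_pos([("屏幕", "n"), ("好", "a")])

# Hypothetical scores for a two-character topic word.
TAGS = ["O", "B-T", "I-T"]
emissions = [
    {"O": 1.0, "B-T": 0.9, "I-T": 0.0},
    {"O": 0.2, "B-T": 0.1, "I-T": 1.0},
]
# A strongly negative score forbids the invalid transition O -> I-T.
transitions = {("O", "I-T"): -1e4}

greedy = [max(e, key=e.get) for e in emissions]  # per-position argmax
decoded = viterbi(emissions, transitions, TAGS)
```

With these made-up scores, the per-position argmax yields the invalid sequence `O, I-T`, while the transition-aware decoding yields `B-T, I-T`. This shows how sequence-level scoring can correct labels that are locally plausible but globally inconsistent.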
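Memo
Received: 2018-09-14.
Funding: National Natural Science Foundation of China (61872260); Shanxi Province Key Research and Development Program, International Cooperation Project (201703D421013).
Author biographies: ZHOU Hao, male, born in 1993, master's student. His main research interests include natural language processing, data mining, and sentiment analysis. WANG Li, female, born in 1971, professor and doctoral supervisor. Her main research interests include social network computing, big data analysis and computing, and deep learning.
Corresponding author: WANG Li. E-mail: wangli@tyut.edu.cn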