ZHOU Hao, WANG Li. Chinese opinion target extraction based on fusion of semantic and syntactic information[J]. CAAI Transactions on Intelligent Systems, 2019, 14(01): 171-178. [doi:10.11992/tis.201809029]

Chinese opinion target extraction based on fusion of semantic and syntactic information

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
14
Issue:
2019, No. 01
Pages:
171-178
Publication date:
2019-01-05

Article Info

Title:
Chinese opinion target extraction based on fusion of semantic and syntactic information
Author(s):
ZHOU Hao¹, WANG Li²
1. College of Information and Computer Science, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China;
2. College of Big Data, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China
Keywords:
Chinese opinion target; semantic; syntactic; sequence labeling; bidirectional long short-term memory; conditional random field; extraction model
Chinese Library Classification (CLC) number:
TP391
DOI:
10.11992/tis.201809029
Abstract:
Conventional sequence labeling methods extract Chinese opinion targets with low accuracy because they ignore Chinese semantic and syntactic information. To address this, a Chinese opinion target extraction model that fuses semantic and syntactic information is proposed. Starting from the original character embeddings, the model strengthens semantic features through an optimized character-meaning strategy, recovering the internal information of characters and words that conventional methods ignore. It also represents the part-of-speech (POS) information of each sentence through POS sequence labeling, deepening the syntactic features of the input. The network is trained as a bidirectional long short-term memory (BiLSTM) network, and a conditional random field (CRF) is used to overcome label bias, which improves extraction accuracy. The model was validated on the BDCI2017 dataset. Compared with an extraction model that does not incorporate semantic and syntactic information, the accuracy of Chinese topic-word and sentiment-word extraction increased by 2.1% and 1.68%, respectively, and the joint extraction accuracy reached 77.16%, indicating good Chinese opinion target extraction performance.
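The abstract frames extraction as character-level sequence labeling, with topic words and sentiment words extracted jointly. The page does not state the tag set, so the sketch below assumes a BIO scheme with two hypothetical labels, `TOPIC` and `SENT`. It shows how annotated character spans become one tag per character for training, and how predicted tags are decoded back into spans; the helper names are illustrative, not the authors' code.

```python
from typing import List, Optional, Tuple

# (start, end_exclusive, label); TOPIC/SENT are hypothetical label names
Span = Tuple[int, int, str]


def spans_to_bio(length: int, spans: List[Span]) -> List[str]:
    """Encode labeled character spans as one BIO tag per character."""
    tags = ["O"] * length
    for start, end, label in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into (start, end, label) spans.

    A stray I- tag that does not continue the current span is leniently
    treated as the start of a new span.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:
                spans.append((start, i, label))  # close the open span
            start, label = None, None
            if tag != "O":
                start, label = i, tag[2:]  # open a new span
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


if __name__ == "__main__":
    sentence = "屏幕很清晰"  # "The screen is very clear"
    gold = [(0, 2, "TOPIC"), (3, 5, "SENT")]
    tags = spans_to_bio(len(sentence), gold)
    print(tags)  # ['B-TOPIC', 'I-TOPIC', 'O', 'B-SENT', 'I-SENT']
    print([(sentence[s:e], lab) for s, e, lab in bio_to_spans(tags)])
    # [('屏幕', 'TOPIC'), ('清晰', 'SENT')]
```

Decoding leniently (rather than discarding ill-formed I- runs) is a design choice; a CRF output layer normally makes such runs rare in the first place.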
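The abstract says that character embeddings are enriched with character and word meaning and that POS information is added through POS sequence labeling, but it does not give the fusion operator. One plausible reading, assumed here and not confirmed by the source, is to project each segmented word's POS tag onto its characters with B-/I- prefixes and to concatenate, per character, its own vector, the vector of the word containing it, and a one-hot POS tag. The segmenter output, the tag names (`n`, `d`, `a`), and the toy vectors are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (character, word containing it, character-level POS tag)
CharFeature = Tuple[str, str, str]


def char_features(segmented: Sequence[Tuple[str, str]]) -> List[CharFeature]:
    """Project word-level POS tags onto characters.

    `segmented` is a list of (word, pos) pairs from a Chinese segmenter with
    POS tagging (assumed; the source does not name one). The first character
    of each word gets "B-<pos>", the remaining ones "I-<pos>".
    """
    features: List[CharFeature] = []
    for word, pos in segmented:
        for i, ch in enumerate(word):
            features.append((ch, word, ("B-" if i == 0 else "I-") + pos))
    return features


def fuse(features: Sequence[CharFeature],
         char_vec: Dict[str, List[float]],
         word_vec: Dict[str, List[float]],
         pos_tags: Sequence[str],
         char_dim: int,
         word_dim: int) -> List[List[float]]:
    """Concatenate [char vector | containing-word vector | one-hot POS tag].

    Unknown characters or words fall back to zero vectors.
    """
    fused = []
    for ch, word, tag in features:
        c = char_vec.get(ch, [0.0] * char_dim)
        w = word_vec.get(word, [0.0] * word_dim)
        one_hot = [1.0 if t == tag else 0.0 for t in pos_tags]
        fused.append(list(c) + list(w) + one_hot)
    return fused


if __name__ == "__main__":
    sentence = [("屏幕", "n"), ("很", "d"), ("清晰", "a")]  # screen / very / clear
    feats = char_features(sentence)
    print(feats[:2])  # [('屏', '屏幕', 'B-n'), ('幕', '屏幕', 'I-n')]
    tags = ["B-n", "I-n", "B-d", "B-a", "I-a"]
    rows = fuse(feats, {"屏": [0.1, 0.2]}, {"屏幕": [0.3, 0.4]}, tags, 2, 2)
    print(len(rows), len(rows[0]))  # 5 9
```

Each fused row would then be one time step of the BiLSTM input. In a trained model the one-hot POS part would usually be replaced by a learned POS embedding.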
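In a BiLSTM-CRF tagger, the BiLSTM scores every label at every character (emission scores), the CRF layer adds a learned score for each label-to-label transition, and decoding picks the label sequence with the highest total score using the Viterbi algorithm. The sketch below assumes those scores are already available; the numbers are invented for illustration and are not from the paper. A large negative transition score forbids moves such as O → I-TOPIC, which shows how a CRF layer can rule out inconsistent tag sequences that independent per-character (greedy) classification may produce.

```python
from typing import List, Sequence, Tuple

NEG = -1e4  # large negative score used to forbid a transition


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]],
            start: Sequence[float]) -> Tuple[List[int], float]:
    """Return the highest-scoring label path and its total score.

    emissions[t][j]   score of label j at position t (e.g. a BiLSTM output)
    transitions[i][j] score of label i followed by label j (learned by the CRF)
    start[j]          score of beginning the sequence with label j
    Assumes at least one position.
    """
    n = len(start)
    score = [start[j] + emissions[0][j] for j in range(n)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, back = [], []
        for j in range(n):
            # best previous label i for reaching label j at position t
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            back.append(best_i)
        score = new_score
        backpointers.append(back)
    last = max(range(n), key=lambda j: score[j])
    path = [last]
    for back in reversed(backpointers):  # follow backpointers to the start
        path.append(back[path[-1]])
    path.reverse()
    return path, score[last]


if __name__ == "__main__":
    labels = ["O", "B-TOPIC", "I-TOPIC"]
    transitions = [[0.0, 0.0, NEG],  # O -> I-TOPIC forbidden
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]]
    start = [0.0, 0.0, NEG]  # a sequence cannot start inside a span
    emissions = [[1.0, 0.8, 0.0],
                 [0.0, 0.1, 2.0],
                 [2.0, 0.0, 0.0]]
    greedy = [max(range(3), key=lambda j: row[j]) for row in emissions]
    path, best = viterbi(emissions, transitions, start)
    print([labels[j] for j in greedy])  # ['O', 'I-TOPIC', 'O'] (invalid)
    print([labels[j] for j in path])    # ['B-TOPIC', 'I-TOPIC', 'O']
```

Here the greedy per-character argmax yields the invalid sequence O, I-TOPIC, O, whereas Viterbi returns B-TOPIC, I-TOPIC, O with a total score of 4.8. A real CRF learns the transition matrix jointly with the BiLSTM instead of hard-coding it.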

Memo:
Received: 2018-09-14.
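Foundation items: National Natural Science Foundation of China (61872260); International Cooperation Project of the Shanxi Province Key Research and Development Program (201703D421013).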
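About the authors: ZHOU Hao, male, born in 1993, is a master's student whose main research interests are natural language processing, data mining, and sentiment analysis. WANG Li, female, born in 1971, is a professor and doctoral supervisor whose main research interests are social network computing, big data analysis and computing, and deep learning.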
Corresponding author: WANG Li. E-mail: wangli@tyut.edu.cn