[1] QU Zhaowei, WU Chunye, WANG Xiaoru. Aspects extraction based on semi-supervised self-training[J]. CAAI Transactions on Intelligent Systems, 2019, 14(04):635-641. [doi:10.11992/tis.201806006]

Aspects extraction based on semi-supervised self-training

CAAI Transactions on Intelligent Systems [ISSN:1673-4785/CN:23-1538/TP]

Volume:
Vol. 14
Issue:
No. 04, 2019
Pages:
635-641
Column:
Publication date:
2019-07-02

Article Info

Title:
Aspects extraction based on semi-supervised self-training
Author(s):
QU Zhaowei 1, WU Chunye 1, WANG Xiaoru 2
1. Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China;
2. College of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
Keywords:
aspect extraction; word vector; semi-supervised; self-training; unlabeled data; opinion mining; seed words; similar words
CLC number:
TP18
DOI:
10.11992/tis.201806006
Abstract:
Aspect extraction is a key step in opinion mining and sentiment analysis. With the growth of social networks, users increasingly rely on reviews to make decisions and pay closer attention to the fine-grained information in them, so quickly mining aspect information from massive volumes of online reviews is important for fast decision-making. Most methods based on topic models and clustering perform poorly in terms of the consistency of the extracted aspects. Traditional supervised learning methods perform well, but they require a large amount of annotated text as training data, and annotation is labor-intensive. To address these problems, this paper proposes a method for aspects extraction based on semi-supervised self-training (AESS). The method exploits the large amount of unlabeled data that already exists: a word vector model is used to find words in the unlabeled dataset that are similar to the seed words of each aspect, and for each aspect a set of representative words most relevant to the dataset is built. The approach avoids large-scale text annotation, makes full use of unlabeled data, and performs well on both Chinese and English datasets.
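The seed-word expansion described in the abstract can be illustrated with a minimal sketch. It is not the authors' AESS implementation: the abstract does not specify the similarity measure, the stopping rule, or any thresholds, so cosine similarity, the `threshold` and `max_rounds` parameters, the function name `expand_aspects`, and the three-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from a word vector model trained on the unlabeled reviews.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def expand_aspects(vectors, seeds, threshold=0.9, max_rounds=5):
    """Grow each aspect's word set from its seed words (hypothetical helper).

    In every round, each unassigned word joins the aspect containing its
    most similar member, provided that similarity reaches `threshold`.
    Newly added words can attract further words in later rounds, which is
    the self-training element of this sketch.
    """
    aspect_sets = {
        aspect: {w for w in words if w in vectors}
        for aspect, words in seeds.items()
    }
    for _ in range(max_rounds):
        assigned = set().union(*aspect_sets.values())
        additions = {}
        for word, vec in vectors.items():
            if word in assigned:
                continue
            best_aspect, best_sim = None, threshold
            for aspect, members in aspect_sets.items():
                for member in members:
                    sim = cosine(vec, vectors[member])
                    if sim >= best_sim:
                        best_aspect, best_sim = aspect, sim
            if best_aspect is not None:
                additions[word] = best_aspect
        if not additions:
            break  # nothing new was confident enough, so stop early
        for word, aspect in additions.items():
            aspect_sets[aspect].add(word)
    return aspect_sets

# Toy vectors, invented for illustration only (not data from the paper).
toy_vectors = {
    "food": (0.9, 0.1, 0.0),
    "pizza": (0.8, 0.2, 0.1),
    "pasta": (0.85, 0.15, 0.05),
    "service": (0.1, 0.9, 0.0),
    "waiter": (0.2, 0.8, 0.1),
    "the": (0.5, 0.5, 0.7),
}
toy_seeds = {"food": ["food"], "service": ["service"]}

aspect_sets = expand_aspects(toy_vectors, toy_seeds)
for aspect, words in aspect_sets.items():
    print(aspect, sorted(words))
```

With these toy vectors, "pizza" and "pasta" join the food aspect, "waiter" joins the service aspect, and the uninformative word "the" stays unassigned because it is not close enough to any aspect set. A lower threshold would admit more words at the cost of noisier aspect sets.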


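Memo:
Received: 2018-06-02.
Funding: National Natural Science Foundation of China (61672108).
About the authors: QU Zhaowei, male, born in 1970, professor. His main research interests are data mining, artificial intelligence, and wireless sensor networks. He has undertaken a number of industry-sponsored projects and published more than 50 academic papers. WU Chunye, female, born in 1992, master's student. Her main research interests are data mining, Web mining, machine learning, and Web search engines. WANG Xiaoru, female, born in 1980, associate professor. Her main research interests are artificial intelligence, computer vision, image understanding, precision search, and big data mining. She holds 3 national invention patents and has published 36 academic papers, 6 academic monographs, and 2 translated works.
Corresponding author: QU Zhaowei. E-mail: zwqu@bupt.edu.cn
Last Update: 2019-08-25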