QIN Haifei, DU Junping. Feature mining based on online hotel review[J]. CAAI Transactions on Intelligent Systems, 2018, 13(06): 1006-1014. [doi:10.11992/tis.201806016]

Feature mining based on online hotel review

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 13
Issue:
No. 06, 2018
Pages:
1006-1014
Section:
Publication date:
2018-10-25

Article Info

Title:
Feature mining based on online hotel review
Author(s):
QIN Haifei1, DU Junping2
1. School of Information Science and Technology, Chuxiong Normal University, Chuxiong, Yunnan 675000, China;
2. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
Keywords:
hotel; online review; data capture; feature extraction; feature mining; cluster analysis; classification; intelligent recommendation
CLC number:
TP391
DOI:
10.11992/tis.201806016
Abstract:
This paper studies feature mining of online hotel review data. Starting from the acquisition of online hotel reviews, the data pass through data cleaning, part-of-speech analysis, feature extraction, index determination, feature selection, feature determination, and feature verification. Building on word frequency and combining part-of-speech analysis with cluster analysis, the study reduces the dimensionality of the candidate feature words using six indices: word frequency count (TF), word frequency rate (TF1), word frequency weight (TTW), comment frequency (DF), inverse document frequency (IDF), and TF1-IDF. This yields the features of the online hotel review data, which are then verified, completing the feature mining process. The work lays a foundation for review-based customer classification, hotel classification, and intelligent recommendation.
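The index-based dimension reduction summarized in the abstract can be sketched in Python. This is a minimal sketch, not the authors' implementation: the abstract gives no formulas, so the code assumes standard definitions (TF as a raw count over all reviews, TF1 as that count divided by the total number of candidate tokens, DF as the number of reviews containing the word, IDF = ln(N/DF), and TF1-IDF as the product of TF1 and IDF). TTW is omitted because its definition is not stated. The input format (pre-segmented `(word, tag)` pairs with noun tags starting with `"n"`), the helper names `candidate_words`, `index_table`, and `screen`, and the default thresholds are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical input format: each review is already cleaned, segmented and
# POS-tagged as (word, tag) pairs; tags starting with "n" are assumed to be nouns.
Review = list[tuple[str, str]]


def candidate_words(review: Review) -> list[str]:
    """Part-of-speech filter: keep nouns as candidate feature words (assumption)."""
    return [word for word, tag in review if tag.startswith("n")]


def index_table(reviews: list[Review]) -> dict[str, dict[str, float]]:
    """Compute TF, TF1, DF, IDF and TF1-IDF for each candidate word.

    Standard definitions are assumed; the paper's exact formulas (and TTW) may differ.
    """
    docs = [candidate_words(r) for r in reviews]
    n_docs = len(docs)
    tf: Counter[str] = Counter()  # TF: occurrences across all reviews
    df: Counter[str] = Counter()  # DF: number of reviews containing the word
    for words in docs:
        tf.update(words)
        df.update(set(words))
    total = sum(tf.values())
    table = {}
    for word, count in tf.items():
        tf1 = count / total                # TF1: share of all candidate tokens
        idf = math.log(n_docs / df[word])  # IDF: ln(N / DF); 0 if in every review
        table[word] = {"TF": count, "TF1": tf1, "DF": df[word],
                       "IDF": idf, "TF1-IDF": tf1 * idf}
    return table


def screen(table: dict[str, dict[str, float]], min_tf: int = 2, min_df: int = 2) -> list[str]:
    """Dimension reduction with hypothetical thresholds; survivors ranked by TF."""
    kept = [w for w, s in table.items() if s["TF"] >= min_tf and s["DF"] >= min_df]
    return sorted(kept, key=lambda w: (-table[w]["TF"], w))


# Toy usage example (English tokens stand in for segmented Chinese reviews).
reviews = [
    [("room", "n"), ("clean", "a"), ("breakfast", "n")],
    [("room", "n"), ("small", "a"), ("location", "n")],
    [("location", "n"), ("good", "a"), ("room", "n"), ("room", "n")],
    [("staff", "n"), ("friendly", "a")],
]
table = index_table(reviews)
print(screen(table))  # ['room', 'location']
```

Thresholding on TF and DF drops candidate words that occur too rarely to describe a feature many guests mention; TF1-IDF is computed alongside so it could serve as an alternative ranking key. How the paper actually combines the indices, and the cluster analysis and verification steps, are not covered by this sketch.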

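Memo:
Received: 2018-06-05.
Foundation item: National Natural Science Foundation of China (61320106006, 61532006, 61772083).
About the authors: QIN Haifei, female, born in 1980, associate professor; her main research interests are databases, data warehouses, and data mining. DU Junping, female, born in 1963, professor and doctoral supervisor; her main research interests are artificial intelligence, social network analysis, data mining, and moving image processing. She has led a number of projects, including projects under the National "863" and "973" Programs, key projects and major international cooperation projects of the National Natural Science Foundation of China, and key projects of the Beijing Natural Science Foundation, and has published many academic papers.
Corresponding author: DU Junping. E-mail: junpingdu@126.com
Last Update: 2018-12-25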