[1] HE Ming, CHANG Mengmeng, LIU Guoyang, et al. Log mining and application based on SQL-on-Hadoop query engine[J]. CAAI Transactions on Intelligent Systems, 2017, 12(5): 717-728. [doi:10.11992/tis.201706016]

Log mining and application based on SQL-on-Hadoop query engine


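Memo

Received: 2017-06-07.
Funding: National Natural Science Foundation of China (91646201, 91546111, 60803086); National Key Technology R&D Program sub-project (2013BAH21B02-01); Beijing Natural Science Foundation (4153058, 4113076); Key Project of the Beijing Municipal Education Commission (KZ20160005009); General Project of the Beijing Municipal Education Commission (KM201710005023).
About the authors: HE Ming, male, born in 1975, Ph.D.; main research interests are big data, recommender systems, and machine learning. CHANG Mengmeng, male, born in 1987, master's student; main research interests are data mining and machine learning. LIU Guoyang, male, born in 1986, master's student; main research interests are big data and data mining.
Corresponding author: HE Ming. E-mail: heming@bjut.edu.cn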

Last Update: 2017-10-25