[1] HE Ming, CHANG Mengmeng, LIU Guoyang, et al. Log mining and application based on SQL-on-Hadoop query engine[J]. CAAI Transactions on Intelligent Systems, 2017, 12(5): 717-728. [doi:10.11992/tis.201706016]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
12
Issue:
2017, No. 5
Pages:
717-728
Column:
Academic Papers: Intelligent Systems
Publication date:
2017-10-25
- Title:
-
Log mining and application based on SQL-on-Hadoop query engine
- Author(s):
-
HE Ming1; CHANG Mengmeng1; LIU Guoyang2; GU Chengxiang2; PENG Jike2
-
1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China;
2. Information Technology Management Department, Haitong Securities Co., Ltd., Shanghai 200001, China
-
- Keywords:
-
big data; log analysis; data mining; Hadoop; query engine; data collection; indexed storage; securities business
- CLC:
-
TP391
- DOI:
-
10.11992/tis.201706016
- Abstract:
-
With the rapid development of computing and networking technologies and the growing number of data acquisition methods, demand for real-time processing of massive log data is increasing daily, and traditional log analysis techniques hit a computational bottleneck at this scale. As open processing platforms have matured in the big data era, a number of big data processing systems have emerged to handle large-scale, diverse data. To apply the advantages of Hadoop to existing business systems, in this study we first investigated network log analysis methods based on big data technology and constructed a network log analysis platform that supports acquisition, analysis, storage, efficient and flexible querying, and computation over trillions of log entries. We then compared three representative SQL-on-Hadoop query systems, Hive, Impala, and Spark SQL, and identified the performance characteristics of this type of system, using the TPC-H benchmark to test and assess their decision-support capabilities. The experimental data yielded several useful conclusions. We also propose a few typical applications of this massive log data analysis and processing system in the securities field, providing a solid foundation for further research.