[1]QI Xiaogang,HU Qiuqiu,LIU Lifang.Parallel anomaly algorithm based on MapReduce[J].CAAI Transactions on Intelligent Systems,2019,14(2):224-230.[doi:10.11992/tis.201809007]
Copy
CAAI Transactions on Intelligent Systems[ISSN 1673-4785/CN 23-1538/TP] Volume:
14
Number of periods:
2019 2
Page number:
224-230
Column:
学术论文—机器学习
Public date:
2019-03-05
- Title:
-
Parallel anomaly algorithm based on MapReduce
- Author(s):
-
QI Xiaogang1; HU Qiuqiu1; LIU Lifang2
-
1. School of Mathematics and Statistics, Xidian University, Xi’an 710071, China;
2. School of Computer Science and Technology, Xidian University, Xi’an 710071, China
-
- Keywords:
-
data mining; anomaly detection; local outlier factor; Hadoop; MapReduce; Distributed File System; parallel computing; local density
- CLC:
-
TP311
- DOI:
-
10.11992/tis.201809007
- Abstract:
-
To improve the accuracy, sensitivity, and efficiency of anomaly detection algorithm in data mining when the amount of data increases, a parallel anomaly detection algorithm (MR-LOF) based on the MapReduce framework and the local outlier factor (LOF) algorithm is proposed in this paper. First, the dataset, stored in the Hadoop distributed file system (HDFS), is logically divided into multiple data blocks. Then, the MapReduce principle is used to process the data in each data block in parallel, so that the k-distance and LOF value of each data point is calculated only in a single block. It greatly improves the efficiency of the algorithm. Simultaneously, the concept of k-distance is redefined. It avoids the situation where the local density is infinite because more than k repeated points exist in the dataset. Finally, the data points whose LOF value is larger than threshold are merged, and the LOF values of combined data are recalculated. This process can effectively improve the accuracy and sensitivity. Experiments with real-world datasets demonstrate the validity, high efficiency, and extendibility of the MR-DLOF algorithm.