<-Previous Article Next Article->

[1]QI Xiaogang,HU Qiuqiu,LIU Lifang.Parallel anomaly algorithm based on MapReduce[J].CAAI Transactions on Intelligent Systems,2019,14(2):224-230.[doi:10.11992/tis.201809007]

Copy

Parallel anomaly algorithm based on MapReduce

PDF Download HTML

CAAI Transactions on Intelligent Systems[ISSN 1673-4785/CN 23-1538/TP] Volume: 14 Number of periods: 2019 2 Page number: 224-230 Column: 学术论文—机器学习 Public date: 2019-03-05

Title:: Parallel anomaly algorithm based on MapReduce

Author(s):: QI Xiaogang¹; HU Qiuqiu¹; LIU Lifang²; 1. School of Mathematics and Statistics, Xidian University, Xi’an 710071, China;
2. School of Computer Science and Technology, Xidian University, Xi’an 710071, China

Keywords:: data mining; anomaly detection; local outlier factor; Hadoop; MapReduce; Distributed File System; parallel computing; local density

CLC:: TP311

DOI:: 10.11992/tis.201809007

Abstract:: To improve the accuracy, sensitivity, and efficiency of anomaly detection algorithm in data mining when the amount of data increases, a parallel anomaly detection algorithm (MR-LOF) based on the MapReduce framework and the local outlier factor (LOF) algorithm is proposed in this paper. First, the dataset, stored in the Hadoop distributed file system (HDFS), is logically divided into multiple data blocks. Then, the MapReduce principle is used to process the data in each data block in parallel, so that the k-distance and LOF value of each data point is calculated only in a single block. It greatly improves the efficiency of the algorithm. Simultaneously, the concept of k-distance is redefined. It avoids the situation where the local density is infinite because more than k repeated points exist in the dataset. Finally, the data points whose LOF value is larger than threshold are merged, and the LOF values of combined data are recalculated. This process can effectively improve the accuracy and sensitivity. Experiments with real-world datasets demonstrate the validity, high efficiency, and extendibility of the MR-DLOF algorithm.

References:: [1] HAN Jiawei, KAMBER M. Data mining:concepts and techniques[M]. San Francisco:Morgan Kaufmann Publishers Inc., 2006:1-18.
[2] AGGARWAL C C. Outlier analysis[M]. New York:Springer, 2013:75-99.
[3] CHEN Feng, DENG Pan, WAN Jiafu, et al. Data mining for the internet of things:literature review and challenges[J]. International journal of distributed sensor networks, 2015, 2015:431047.
[4] 吴镜锋, 金炜东, 唐鹏. 数据异常的监测技术综述[J]. 计算机科学, 2017, 44(S2):24-28 WU Jingfeng, JIN Weidong, TANG Peng. Survey on monitoring techniques for data abnormalities[J]. Computer science, 2017, 44(S2):24-28
[5] STEINWART I, HUSH D, SCOVEL C. A classification framework for anomaly detection[J]. The journal of machine learning research, 2005, 6(1):211-232.
[6] 邓红莉, 杨韬. 一种基于深度学习的异常检测方法[J]. 信息通信, 2015(3):3-4 DENG Hongli, YANG Tao. An anomaly detection method based on deep learning[J]. Information and communications, 2015(3):3-4
[7] ZHAO Xuanqiang, WANG Guoying, LI Zhixing. Unsupervised network anomaly detection based on abnormality weights and subspace clustering[C]//Proceedings of the 6th International Conference on Information Science and Technology. Dalian, China, 2016:482-486.
[8] 左进, 陈泽茂. 基于改进K均值聚类的异常检测算法[J]. 计算机科学, 2016, 43(8):258-261 ZUO Jin, CHEN Zemao. Anomaly detection algorithm based on improved K-means clustering[J]. Computer science, 2016, 43(8):258-261
[9] 邹云峰, 张昕, 宋世渊, 等. 基于局部密度的快速离群点检测算法[J]. 计算机应用, 2017, 37(10):2932-2937 ZOU Yunfeng, ZHANG Xin, SONG Shiyuan, et al. Fast outlier detection algorithm based on local density[J]. Journal of computer applications, 2017, 37(10):2932-2937
[10] BREUNIG M M, KRIEGEL H P, NG R T, et al. LOF:identifying density-based local outliers[C]//Proceedings of 2000 ACM SIGMOD International Conference on Management of Data. Dallas, Texas, USA, 2000:93-104.
[11] BHATT V, SHARMA K G, RAM A. An enhanced approach for LOF in data mining[C]//Proceedings of 2013 International Conference on Green High Performance Computing. Nagercoil, India, 2013:1-3.
[12] MIAO Dandan, QIN Xiaowei, WANG Weidong. Anomalous cell detection with kernel density-based local outlier factor[J]. China communications, 2015, 12(9):64-75.
[13] TANG Bo, HE Haibo. A local density-based approach for outlier detection[J]. Neurocomputing, 2017, 241:171-180.
[14] DEAN J, GHEMAWAT S. MapReduce:simplified data processing on large clusters[J]. Communications of the ACM, 2008, 51(1):107-113.
[15] GHEMAWAT S, GOBIOFF H, LEUNG S T. The google file system[C]//Proceedings of the 19th ACM Symposium on Operating Systems Principles. Bolton Landing, NY, USA, 2003:29-43.
[16] CHANG F, DEAN J, GHEMAWAT S, et al. Bigtable:a distributed storage system for structured data[C]//Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation. Seattle, WA, 2006:15.
[17] MU Yashuang, LIU Xiaodong, YANG Zhihao, et al. A parallel C4.5 decision tree algorithm based on MapReduce[J]. Concurrency and computation practice and experience, 2017, 29(8):e4015.
[18] 侯泳旭, 段磊, 秦江龙, 等. 基于Isolation Forest的并行化异常探测设计[J]. 计算机工程与科学, 2017, 39(2):236-244 HOU Yongxu, DUAN Lei, QIN Jianglong, et al. Parallel anomaly detection based on isolation forest[J]. Computer engineering & science, 2017, 39(2):236-244
[19] 叶海琴, 孟彩霞, 王意锋, 等. 一种基于MapReduce的频繁模式挖掘算法[J]. 南京理工大学学报, 2018, 42(1):62-67 YE Haiqin, MENG Caixia, WANG Yifeng, et al. Frequent pattern mining algorithm based on MapReduce[J]. Journal of Nanjing University of Science and Technology, 2018, 42(1):62-67
[20] 刘义, 景宁, 陈荦, 等. MapReduce框架下基于R-树的k-近邻连接算法[J]. 软件学报, 2013, 24(8):1836-1851 LIU Yi, JING Ning, CHEN Luo, et al. Algorithm for processing k-nearest join based on R-tree in MapReduce[J]. Journal of software, 2013, 24(8):1836-1851
[21] WHITE T. Hadoop:the definitive guide[M]. 4th ed. Gravenstein Highway North:O’reilly Media Inc., 2015:1-4.
[22] http://archive.ics.uci.edu/ml/machine-learning-databases/kddcup99-mld/.

Similar References:

Memo

Last Update: 2019-04-25

Parallel anomaly algorithm based on MapReduce PDF DownloadHTML

Memo

Parallel anomaly algorithm based on MapReduce

PDF Download HTML