
SHEN Yan, ZHU Yuquan. An optimized algorithm of K-means based on data set partition on CMP systems[J]. CAAI Transactions on Intelligent Systems, 2015, 10(4): 607-614. [doi:10.3969/j.issn.1673-4785.201411036]

An optimized algorithm of K-means based on data set partition on CMP systems

CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 10 Issue: 4 (2015) Pages: 607-614 Section: Academic Papers: Machine Learning Publication date: 2015-08-25

Title: An optimized algorithm of K-means based on data set partition on CMP systems

Author(s): SHEN Yan¹,²; ZHU Yuquan²
1. Department of Information Management and Information System, Jiangsu University, Zhenjiang 212013, China;
2. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China

Keywords: k-means; clustering algorithm; CMP; massive data set; data mining; unsupervised learning; big data

CLC: TP181

DOI: 10.3969/j.issn.1673-4785.201411036

Abstract: The traditional K-means clustering algorithm was not designed with parallelization in mind, so it cannot exploit the multi-core computing capability of modern CPUs, and its efficiency when clustering massive data sets leaves room for improvement. This paper proposes Multi-core K-means (MC-K-means), a redesign of the original K-means aimed at parallel execution in a chip multi-processor (CMP) environment. To exploit the multi-core capability of modern CPUs, MC-K-means partitions the clustering task into independent, balanced subtasks and distributes them to threads that execute in parallel. Experimental results show that MC-K-means achieves a relatively high speedup over the traditional K-means algorithm, improving its capacity to handle massive data sets.
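
The abstract describes MC-K-means only at a high level, so the following is a minimal sketch of the general data-partition idea, not the authors' implementation. It assumes that the parallelized work is the assignment step of each K-means iteration: the data set is split into contiguous, near-equal (balanced) slices, each thread assigns its slice to the nearest centroids and returns per-cluster partial sums and counts, and the main thread merges these partial results to recompute the centroids. The function names `partition`, `assign_chunk` and `parallel_kmeans` are hypothetical. Note that under CPython's global interpreter lock these pure-Python threads interleave rather than run truly in parallel, so the sketch shows the task decomposition rather than the multi-core speedup the paper measures.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def partition(n, parts):
    """Split range(n) into `parts` contiguous index ranges whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def nearest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance to `point`."""
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))


def assign_chunk(data, centroids, lo, hi):
    """Independent subtask: label data[lo:hi] and return per-cluster partial sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    labels = []
    for point in data[lo:hi]:
        j = nearest(point, centroids)
        labels.append(j)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return labels, sums, counts


def parallel_kmeans(data, k, n_threads=4, max_iter=100, seed=0):
    """Hypothetical data-partitioned K-means: parallel assignment, serial merge of partial sums."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(data, k)]  # random initial centroids
    chunks = partition(len(data), n_threads)            # balanced slices, one per thread
    dim = len(data[0])
    labels = []
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for _ in range(max_iter):
            futures = [pool.submit(assign_chunk, data, centroids, lo, hi) for lo, hi in chunks]
            results = [f.result() for f in futures]      # kept in chunk order
            labels = [lab for res in results for lab in res[0]]
            # Merge step: combine the partial sums and counts from every subtask.
            total_sums = [[0.0] * dim for _ in range(k)]
            total_counts = [0] * k
            for _, sums, counts in results:
                for j in range(k):
                    total_counts[j] += counts[j]
                    for d in range(dim):
                        total_sums[j][d] += sums[j][d]
            # An empty cluster keeps its previous centroid.
            new_centroids = [
                [s / total_counts[j] for s in total_sums[j]] if total_counts[j] else centroids[j]
                for j in range(k)
            ]
            if new_centroids == centroids:  # assignments are stable, so stop
                break
            centroids = new_centroids
    return centroids, labels
```

For example, `partition(8, 3)` yields `[(0, 3), (3, 6), (6, 8)]`, so no thread receives more than one point more than any other. Merging only k partial sums per thread keeps the serial part of each iteration small relative to the parallel assignment work.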

Last Update: 2015-08-28
