[1] SHEN Yan, ZHU Yuquan. An optimized algorithm of K-means based on data set partition on CMP systems[J]. CAAI Transactions on Intelligent Systems, 2015, 10(4): 607-614. [doi:10.3969/j.issn.1673-4785.201411036]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
10
Issue:
2015, No. 4
Page number:
607-614
Column:
Academic Papers: Machine Learning
Publication date:
2015-08-25
- Title:
-
An optimized algorithm of K-means based on data set partition on CMP systems
- Author(s):
-
SHEN Yan¹,²; ZHU Yuquan²
-
1. Department of Information Management and Information System, Jiangsu University, Zhenjiang 212013, China;
2. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
-
- Keywords:
-
k-means; clustering algorithm; CMP; massive data set; data mining; unsupervised learning; big data
- CLC:
-
TP181
- DOI:
-
10.3969/j.issn.1673-4785.201411036
- Abstract:
-
The traditional K-means clustering algorithm was not designed with parallelization in mind, so it cannot exploit the multi-core computing capability of modern CPUs, and its clustering efficiency on massive data sets leaves room for improvement. This paper proposes Multi-core K-means (MC-K-means), a redesign of the original K-means that targets parallel execution in a chip multi-processor (CMP) environment. To use the multiple cores of a modern CPU, MC-K-means partitions the clustering task into independent, balanced subtasks and distributes them to threads that execute in parallel. Experimental results show that MC-K-means achieves a relatively higher speedup than the K-means algorithm, which improves its capacity to handle massive data sets.
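The abstract does not spell out how MC-K-means forms or schedules its subtasks, so the following is only a minimal sketch of the general idea it describes: split the data set into balanced chunks, let each thread assign its chunk to the nearest centroids independently, then merge the partial results. It is a generic data-parallel K-means, not the authors' algorithm. The names `split_balanced`, `assign_chunk`, and `parallel_kmeans`, and the parameters `n_threads`, `max_iter`, and `seed`, are hypothetical and chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def split_balanced(points, n_parts):
    """Split points into n_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(points), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(points[start:end])
        start = end
    return chunks


def assign_chunk(chunk, centroids):
    """Subtask: assign every point of one chunk to its nearest centroid.

    Returns per-cluster coordinate sums and counts; no state is shared
    between chunks, so the subtasks are independent.
    """
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        # Nearest centroid by squared Euclidean distance.
        best = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[best] += 1
        for d in range(dim):
            sums[best][d] += p[d]
    return sums, counts


def parallel_kmeans(points, k, n_threads=4, max_iter=100, seed=0):
    """Illustrative data-parallel K-means (an assumption, not the paper's MC-K-means)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    chunks = split_balanced(points, n_threads)
    dim = len(points[0])
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for _ in range(max_iter):
            # Assignment step: one subtask per chunk, run on the thread pool.
            partials = list(pool.map(lambda ch: assign_chunk(ch, centroids), chunks))
            # Update step: merge the partial sums and counts sequentially.
            new_centroids = []
            for c in range(k):
                total = sum(counts[c] for _, counts in partials)
                if total == 0:
                    new_centroids.append(centroids[c])  # keep an empty cluster's centroid
                    continue
                new_centroids.append(
                    [sum(sums[c][d] for sums, _ in partials) / total for d in range(dim)]
                )
            if new_centroids == centroids:
                break
            centroids = new_centroids
    return centroids
```

A call such as `parallel_kmeans(points, k=2, n_threads=4)` takes a list of equal-length numeric tuples and returns `k` centroids. Note that in CPython the global interpreter lock keeps pure-Python threads from running on several cores at once, so this sketch shows the task decomposition rather than a real speedup; a process pool or a language with native threads would be needed to measure one.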