An optimized algorithm of K-means based on data set partition on CMP systems - Editorial Office of CAAI Transactions on Intelligent Systems

[1] SHEN Yan, ZHU Yuquan. An optimized algorithm of K-means based on data set partition on CMP systems[J]. CAAI Transactions on Intelligent Systems, 2015, 10(04): 607-614. [doi:10.3969/j.issn.1673-4785.201411036]

An optimized algorithm of K-means based on data set partition on CMP systems

Editorial Office of CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
10
Issue:
2015, No. 04
Pages:
607-614
Column:
Publication date:
2015-08-25

Article Info

Title:
An optimized algorithm of K-means based on data set partition on CMP systems
Author(s):
SHEN Yan (1, 2), ZHU Yuquan (2)
1. Department of Information Management and Information System, Jiangsu University, Zhenjiang 212013, China;
2. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
Keywords:
K-means; clustering algorithm; CMP; massive data set; data mining; unsupervised learning; big data
CLC number:
TP181
DOI:
10.3969/j.issn.1673-4785.201411036
Document code:
A
Abstract:
Although multi-core CPUs are now ubiquitous, the traditional K-means clustering algorithm was not designed for parallel execution and cannot fully exploit the multi-core computing capability of modern CPUs, so its clustering efficiency on massive data sets leaves room for improvement. This paper therefore parallelizes K-means for a chip multi-processor (CMP) environment and proposes the Multi-core K-means (MC-K-means) algorithm. MC-K-means decomposes the K-means clustering task into independent, balanced subtasks and assigns them to threads that execute in parallel, thereby using the multi-core computing capability of modern CPUs. Experimental results show that MC-K-means achieves a relatively high multi-core speedup over K-means and improves the ability to cluster massive data sets.
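
The abstract describes splitting the clustering work into independent, balanced subtasks that run on separate threads. The following is a minimal sketch of one common way to do that, assuming the data set is cut into near-equal chunks, each worker computes per-cluster partial sums for its own chunk, and the main thread merges them into new centroids. It is not the paper's MC-K-means procedure, whose details are not given on this page; the names `split_evenly`, `partial_sums` and `parallel_kmeans` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(points, n_parts):
    """Split points into n_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(points), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(points[start:end])
        start = end
    return chunks


def partial_sums(chunk, centroids):
    """Subtask: assign each point of the chunk to its nearest centroid and
    return per-cluster coordinate sums and counts. Writes no shared state."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in chunk:
        nearest = min(
            range(len(centroids)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def parallel_kmeans(points, centroids, n_threads=4, max_iter=100):
    """Hypothetical data-partitioned K-means (an assumption, not the paper's code)."""
    chunks = split_evenly(points, n_threads)
    centroids = [list(c) for c in centroids]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for _ in range(max_iter):
            # One independent subtask per chunk; all finish before the merge.
            results = list(pool.map(partial_sums, chunks, [centroids] * len(chunks)))
            new_centroids = []
            for j, old in enumerate(centroids):
                count = sum(r[1][j] for r in results)
                if count == 0:
                    new_centroids.append(old)  # keep the centroid of an empty cluster
                else:
                    new_centroids.append(
                        [sum(r[0][j][d] for r in results) / count for d in range(len(old))]
                    )
            if new_centroids == centroids:  # converged: centroids stopped moving
                break
            centroids = new_centroids
    return centroids
```

Because each subtask only reads the current centroids and writes its own partial sums, no locking is needed until the merge step. In CPython, threads do not execute pure-Python bytecode on several cores at once, so this sketch shows the decomposition rather than the speedup; measuring a real multi-core gain would need processes or a language with truly parallel threads.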



Memo

Memo:
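Received: 2014-11-28; revised: not stated.
Foundation items: National Natural Science Foundation of China (71271117); National Key Technology R&D Program of China (2010BAI88B00); Natural Science Foundation of Jiangsu Province (BK2010331); Jiangsu Province Doctoral Candidate Innovation Program (CX10B_016X); Jiangsu Province Postdoctoral Research Funding Program (1401056C); Jiangsu University Senior Talent Foundation (13JDG127).
About the authors: SHEN Yan, male, born in 1982, lecturer, Ph.D. His main research interests are data mining and intelligent information systems. He received a third prize of the 2014 Science and Technology Award of the China General Chamber of Commerce and has published 11 academic papers, 5 of them indexed by EI. ZHU Yuquan, male, born in 1965, professor and doctoral supervisor. His main research interests are data mining, intelligent information systems, and information system integration. He received a third prize of the 2014 Science and Technology Award of the China General Chamber of Commerce, one first prize in the National Multimedia Courseware Competition, one Jiangsu Province Excellent Software Product Award (Jinhui Award), and four provincial or ministerial science and technology progress awards. He has applied for 10 invention patents, 3 of which have been granted, and holds 7 registered computer software copyrights. He has published more than 70 academic papers, over 10 of them indexed by EI, and 2 edited books.
Corresponding author: SHEN Yan. E-mail: 104186179@qq.com.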
Last Update: 2015-08-28