[1] LIU Beibei, MA Runing, DING Jundi. Density-based statistical merging clustering algorithm[J]. CAAI Transactions on Intelligent Systems, 2015, 10(5):712-721. [doi:10.11992/tis.201410028]

Density-based statistical merging clustering algorithm

Editorial Office of CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 10
Issue:
No. 5, 2015
Pages:
712-721
Section:
Publication date:
2015-10-25

Article Info

Title:
Density-based statistical merging clustering algorithm
Author(s):
LIU Beibei (刘贝贝)1, MA Runing (马儒宁)1, DING Jundi (丁军娣)2
1. College of Science, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China;
2. School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China
Keywords:
data points; density; random variable; merging; clustering algorithm; noise
CLC number:
O235; TP311
DOI:
10.11992/tis.201410028
Document code:
A
Abstract:
Existing clustering algorithms handle noise poorly and run slowly. To address these problems, this paper proposes a density-based statistical merging clustering algorithm (DSMC). The algorithm treats each feature of a data point as a set of independent random variables and derives a statistical merging criterion from the independent bounded difference inequality. It also uses the density of the data points: the descending order of density serves as the merging order during agglomeration, which yields the statistical merging of the data points of each class. Experiments on artificial and real datasets show that DSMC handles convex datasets and also clusters non-convex, overlapping, and noisy datasets well, which supports the applicability and effectiveness of the algorithm.
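The two ingredients named in the abstract, a density-descending merging order and a statistical merging test, can be illustrated with a rough sketch. The abstract does not give the paper's density estimate or its merging criterion, so everything concrete below is an assumption: density is approximated as the inverse mean distance to the `k` nearest neighbours, the merging test is a simplified deviation bound in the style of statistical region merging (Nock and Nielsen), and the names `dsmc_sketch`, `should_merge`, `bound`, `q`, and `delta` are hypothetical. It is not the authors' DSMC implementation.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (brute force)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def density(points, i, k):
    """Assumed density proxy: inverse mean distance to the k nearest neighbours."""
    nbrs = knn(points, i, k)
    if not nbrs:
        return 0.0
    mean_d = sum(math.dist(points[i], points[j]) for j in nbrs) / len(nbrs)
    return 1.0 / (mean_d + 1e-12)

def bound(n, g, q, delta):
    """Simplified bounded-difference deviation term for a cluster of n points.
    g is the feature value range; q and delta are assumed tuning parameters."""
    return g * math.sqrt(math.log(2.0 / delta) / (2.0 * q * n))

def should_merge(mean_a, n_a, mean_b, n_b, g, q=32.0, delta=0.05):
    """Merge only if every feature mean differs by at most the combined bound."""
    limit = math.sqrt(bound(n_a, g, q, delta) ** 2 + bound(n_b, g, q, delta) ** 2)
    return all(abs(a - b) <= limit for a, b in zip(mean_a, mean_b))

def dsmc_sketch(points, k=3, q=32.0, delta=0.05):
    """Return one cluster label per point (labels are union-find roots)."""
    n = len(points)
    g = max(max(p) for p in points) - min(min(p) for p in points)
    parent = list(range(n))
    size = [1] * n
    total = [list(p) for p in points]  # per-cluster feature sums

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Visit points from densest to sparsest; this fixes the merging order.
    order = sorted(range(n), key=lambda i: density(points, i, k), reverse=True)
    for i in order:
        for j in knn(points, i, k):
            a, b = find(i), find(j)
            if a == b:
                continue
            mean_a = [s / size[a] for s in total[a]]
            mean_b = [s / size[b] for s in total[b]]
            if should_merge(mean_a, size[a], mean_b, size[b], g, q, delta):
                parent[b] = a
                size[a] += size[b]
                total[a] = [x + y for x, y in zip(total[a], total[b])]
    return [find(i) for i in range(n)]

blob_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
blob_b = [(x + 5.0, y + 5.0) for x, y in blob_a]
labels = dsmc_sketch(blob_a + blob_b)
```

In this toy run the two well-separated blobs end up with two distinct labels. Restricting candidate merges to nearest neighbours is a design shortcut of the sketch; how the paper selects candidate pairs is not stated in the abstract.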

References:

[1] XU Rui, WUNSCH II D. Survey of clustering algorithms[J]. IEEE Transactions on Neural Networks, 2005, 16(3):645-678.
[2] JAIN A K, MURTY M N, FLYNN P J. Data clustering:a review[J]. ACM Computing Surveys, 1999, 31(2):264-323.
[3] MURTAGH F, CONTRERAS P. Algorithms for hierarchical clustering:an overview[J]. Wiley Interdisciplinary Reviews:Data Mining and Knowledge Discovery, 2012, 2(1):86-97.
[4] TSENG L Y, YANG S B. A genetic approach to the automatic clustering problem[J]. Pattern Recognition, 2001, 34(2):415-424.
[5] FORGY E W. Cluster analysis of multivariate data:efficiency versus interpretability of classifications[J]. Biometrics, 1965, 21:768-769.
[6] SHI J, MALIK J. Normalized cuts and image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8):888-905.
[7] BEZDEK J C, EHRLICH R, FULL W. FCM:The fuzzy c-means clustering algorithm[J]. Computers & Geosciences, 1984, 10(2-3):191-203.
[8] KRISHNAPURAM R, KELLER J M. A possibilistic approach to clustering[J]. IEEE Transactions on Fuzzy Systems, 1993, 1(2):98-110.
[9] ALPERT C J, KAHNG A B. Recent directions in netlist partitioning:a survey[J]. Integration, the VLSI Journal, 1995, 19(1):1-81.
[10] ACKERMANN M R, BLÖMER J, KUNTZE D, et al. Analysis of agglomerative clustering[J]. Algorithmica, 2014, 69(1):184-215.
[11] GUHA S, RASTOGI R, SHIM K. Cure:an efficient clustering algorithm for large databases[J]. Information Systems, 2001, 26(1):35-58.
[12] ESTER M, KRIEGEL H P, SANDER J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise[C]//Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining. Portland, USA, 1996:226-231.
[13] ZHOU Shuigeng, ZHAO Yue, GUAN Jihong, et al. A neighborhood-based clustering algorithm[M]//Advances in Knowledge Discovery and Data Mining. Berlin/Heidelberg:Springer, 2005:361-371.
[14] MA Runing, WANG Xiuli, DING Jundi. Multilevel core-sets based aggregation clustering algorithm[J]. Journal of Software, 2013, 24(3):490-506. (in Chinese)
[15] ZHUANG Xuan, HUANG Yan, PALANIAPPAN K, et al. Gaussian mixture density modeling, decomposition, and applications[J]. IEEE Transactions on Image Processing, 1996, 5(9):1293-1302.
[16] MCLACHLAN G J, KRISHNAN T. The EM algorithm and extensions[J]. Series in Probability & Statistics, 1997, 15(1):154-156.
[17] NOCK R, NIELSEN F. Statistical region merging[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(11):1452-1458.
[18] HABIB M, MCDIARMID C, RAMIREZ-ALFONSIN J, et al. Probabilistic methods for algorithmic discrete mathematics[M]. Berlin:Springer-Verlag, 1998:1-54.

Similar articles:

[1] ZHENG Wenping, ZHANG Haojie, WANG Jie. Community detection algorithm based on dense subgraphs[J]. CAAI Transactions on Intelligent Systems, 2016, 11(3):426. [doi:10.11992/tis.201603045]
[2] WANG Junhong, DUAN Bingqian. Research on the SMOTE method based on density[J]. CAAI Transactions on Intelligent Systems, 2017, 12(6):865. [doi:10.11992/tis.201706049]

Memo:
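Received: 2014-10-21; Revised:
Funding: National Natural Science Foundation of China (61103058).
About the authors: LIU Beibei, female, born in 1990, master's student; her main research interest is pattern recognition. MA Runing, male, born in 1976, associate professor, Ph.D.; his main research interests are applied mathematics and pattern recognition. He has participated in more than 10 National Natural Science Foundation of China projects and has published more than 20 academic papers, of which more than 10 are indexed by SCI or EI. DING Jundi, female, born in 1978, associate professor, Ph.D., member of the China Computer Federation; her main research interests are pattern recognition and computer vision. She has led and completed more than 10 National Natural Science Foundation of China projects and has published more than 20 academic papers, of which more than 10 are indexed by SCI or EI.
Corresponding author: DING Jundi. E-mail: dingjundi2010@njust.edu.cn.
Last Update: 2015-11-16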