[1] FENG Liuwei, CHANG Dongxia, DENG Yong, et al. A clustering evaluation index based on the nearest and furthest score[J]. CAAI Transactions on Intelligent Systems, 2017, 12(01): 67-74. [doi:10.11992/tis.201611007]

A clustering evaluation index based on the nearest and furthest score

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 12
Issue:
2017, No. 01
Pages:
67-74
Column:
Publication date:
2017-02-25

Article Info

Title:
A clustering evaluation index based on the nearest and furthest score
Author(s):
FENG Liuwei (1, 2); CHANG Dongxia (1, 2); DENG Yong (3); ZHAO Yao (1, 2)
1. Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China;
2. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China;
3. Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Keywords:
nearest neighbor consistency; furthest neighbor difference; K-means clustering algorithm; scoring mechanism; evaluation index; hierarchical clustering
Classification number:
TP391
DOI:
10.11992/tis.201611007
Abstract:
Clustering is one of the most widely used methods in data analysis, and the number of clusters is often the key factor in a clustering algorithm's performance. Most current clustering algorithms require the number of clusters to be specified in advance, yet in many cases it is difficult to obtain a valid number from prior knowledge of the dataset. To determine the number of clusters, this paper proposes a Nearest and Furthest Score (NFS) evaluation index based on the principles of nearest neighbor consistency and furthest neighbor difference. Building on this index, it also proposes an Automatic Clustering NFS (ACNFS) algorithm that determines the number of clusters automatically. Experimental results show that the proposed index is effective and feasible for determining the number of clusters.
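The general idea in the abstract (score a partition by how well it respects nearest and furthest neighbors, then pick the cluster count with the best score) can be illustrated with a small sketch. The abstract does not give the NFS formula or the ACNFS procedure, so everything below is an assumption made for illustration: `neighbor_score` is a simplified stand-in (one point if a sample shares a cluster with its nearest neighbor, one point if it does not share a cluster with its furthest neighbor), and `kmeans` and `choose_k` are hypothetical helpers, not the authors' method.

```python
import math

def kmeans(points, k, iters=50):
    """Plain K-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its closest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def neighbor_score(points, labels):
    """Stand-in score in [0, 1]; NOT the NFS index defined in the paper."""
    n = len(points)
    total = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = min(others, key=lambda j: math.dist(points[i], points[j]))
        furthest = max(others, key=lambda j: math.dist(points[i], points[j]))
        # Reward nearest-neighbor consistency and furthest-neighbor difference.
        total += (labels[nearest] == labels[i]) + (labels[furthest] != labels[i])
    return total / (2 * n)

def choose_k(points, k_max):
    """Try k = 2..k_max and return the k with the highest stand-in score."""
    scores = {k: neighbor_score(points, kmeans(points, k)) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores

# Toy data: three well-separated groups on a line (1-D points as 1-tuples).
points = [(0.0,), (1.0,), (2.5,), (10.0,), (11.0,), (12.5,), (21.0,), (22.0,), (23.5,)]
best_k, scores = choose_k(points, 4)
print(best_k)  # prints 3
```

On this toy data, splitting into two or four clusters separates one sample from its nearest neighbor, so only the three-cluster partition reaches the maximum score. A stand-in this simple can tie or mislead on other data (for example, when merging two groups leaves every furthest neighbor in a different cluster), which is one reason a more carefully designed index such as the one proposed in the paper is needed.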

Memo

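Received: 2016-11-07. Revised: (not given).
Funding: National Natural Science Foundation of China, Key Program (61532005).
About the authors: FENG Liuwei, female, born in 1992, master's student; research interest: clustering algorithms. CHANG Dongxia, female, born in 1977, associate professor and master's supervisor; main research interests: evolutionary computation, unsupervised classification algorithms, image segmentation, and image classification. She has published more than 10 academic papers, of which 5 are SCI-indexed and 2 are EI-indexed. DENG Yong, male, born in 1974, associate research fellow, Ph.D.; main research interests: intelligent information processing and database system technology and applications. He has led or participated in one National "863" Program project and one Beijing Natural Science Foundation project, and has published more than 20 academic papers, more than 10 of which are indexed.
Corresponding author: CHANG Dongxia. E-mail: dxchang@bjtu.edu.cn.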
Last Update: 1900-01-01