[1] XIE Juanying, ZHOU Ying, WANG Mingzhao, et al. New criteria for evaluating the validity of clustering[J]. CAAI Transactions on Intelligent Systems, 2017, 12(6): 873-882. [doi:10.11992/tis.201706029]
CAAI Transactions on Intelligent Systems[ISSN 1673-4785/CN 23-1538/TP] Volume:
12
Issue:
2017, No. 6
Page number:
873-882
Column:
Academic Papers: Fundamentals of Artificial Intelligence
Publication date:
2017-12-25
- Title:
-
New criteria for evaluating the validity of clustering
- Author(s):
-
XIE Juanying; ZHOU Ying; WANG Mingzhao; JIANG Weiliang
-
School of Computer Science, Shaanxi Normal University, Xi’an 710062, China
-
- Keywords:
-
clustering; validity of clustering; evaluation index; external criteria; internal criteria; F-measure; Adjusted Rand Index; STDI; S2; PS2
- CLC:
-
TP108
- DOI:
-
10.11992/tis.201706029
- Abstract:
-
Criteria for evaluating the clustering ability of a clustering algorithm are of two kinds: internal and external. Current external evaluation indexes fail to account for skewed clustering results, and it is difficult to obtain the optimal number of clusters from the validity checks produced by internal evaluation indexes. To address these defects, we propose two external evaluation indexes that consider both positive and negative information; one is based on the contingency table and the other on sample pairs, and both are intended for evaluating clustering results on datasets with arbitrary distributions. We also propose using variance to measure both the tightness within a cluster and the separability between clusters, and we use the ratio of these two quantities as an internal evaluation index. Experiments on datasets from the UCI (University of California, Irvine) machine learning repository and on artificially simulated datasets show that the proposed internal index can effectively find the true number of clusters in a dataset. The proposed external indexes based on the contingency table and sample pairs are very effective and can be used to evaluate clustering results on existing types of skewed and noisy data.
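The abstract names three building blocks without defining them: the contingency table, sample-pair counts, and a variance ratio. The following minimal Python sketch illustrates only those generic, textbook quantities. It is not the paper's S2, PS2, or STDI, whose formulas are not given here. The function names `contingency_table`, `pair_counts`, and `variance_ratio` are hypothetical, and the particular between/within ratio is an assumed stand-in for "tightness versus separability".

```python
from collections import Counter
from math import comb


def contingency_table(true_labels, pred_labels):
    """Count samples for every (true class, predicted cluster) combination."""
    return Counter(zip(true_labels, pred_labels))


def pair_counts(true_labels, pred_labels):
    """Classify all sample pairs as TP, FP, FN, or TN.

    TP: same class and same cluster; FP: different class, same cluster;
    FN: same class, different cluster; TN: different class and cluster.
    """
    table = contingency_table(true_labels, pred_labels)
    n = len(true_labels)
    same_both = sum(comb(c, 2) for c in table.values())
    same_class = sum(comb(c, 2) for c in Counter(true_labels).values())
    same_cluster = sum(comb(c, 2) for c in Counter(pred_labels).values())
    tp = same_both
    fn = same_class - same_both
    fp = same_cluster - same_both
    tn = comb(n, 2) - tp - fn - fp
    return tp, fp, fn, tn


def _mean(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def _sq_dist(a, b):
    """Squared Euclidean distance between two tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def variance_ratio(points, labels):
    """Between-cluster variance divided by within-cluster variance.

    Assumed illustration only: larger values mean tighter, better
    separated clusters. Returns inf when every cluster is a single point
    cloud with zero spread.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    overall = _mean(points)
    n = len(points)
    between = sum(len(m) * _sq_dist(_mean(m), overall) for m in clusters.values()) / n
    within = sum(_sq_dist(p, _mean(m)) for m in clusters.values() for p in m) / n
    return float("inf") if within == 0 else between / within


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    found = [0, 0, 1, 1, 1, 1]
    print(pair_counts(truth, found))
    pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
    print(variance_ratio(pts, [0, 0, 1, 1]), variance_ratio(pts, [0, 1, 0, 1]))
```

The true-negative count is where "negative information" enters a pair-based measure: pairs correctly placed in different clusters count toward agreement, which matters when class sizes are skewed.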