[1] DU Hangyuan, ZHANG Jing, WANG Wenjian. A deep self-supervised clustering ensemble algorithm[J]. CAAI Transactions on Intelligent Systems, 2020, 15(6): 1113-1120. [doi:10.11992/tis.202006050]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 15
Issue: 6, 2020
Pages: 1113-1120
Column: Academic Papers - Machine Learning
Publication date: 2020-11-05
- Title: A deep self-supervised clustering ensemble algorithm
- Author(s): DU Hangyuan 1; ZHANG Jing 2; WANG Wenjian 1,2
  1. College of Computer and Information Technology, Shanxi University, Taiyuan 030006, China;
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
- Keywords: feature space; clustering algorithm; consistency function; graph representation; similarity measure; self-supervised learning; graphical data; neural network model
- CLC: TP391
- DOI: 10.11992/tis.202006050
- Abstract: In this study, we propose a deep self-supervised clustering ensemble algorithm to address the design of the consensus function in clustering ensembles. The algorithm applies a weighted connected-triple method to the cluster components to estimate a pairwise similarity matrix over the samples, from which the adjacency relations between samples are determined. In this way, the cluster components are transformed from a representation in the feature space into a graph representation, and the consensus-integration problem over the cluster components becomes a graph clustering problem on this graph. A graph neural network is then used to build the self-supervised clustering ensemble model: a graph autoencoder learns a low-dimensional embedding of the graph, the target distribution of the clustering ensemble is estimated from the likelihood distribution generated by that embedding, and the estimated target distribution in turn guides the learning of the embedding. This joint process ensures that the low-dimensional embedding and the clustering ensemble result obtained by the model are mutually consistent and optimized. Experiments on a large number of data sets show that the proposed algorithm produces more accurate clustering ensemble results than algorithms such as HGPA, CSPA, and MCLA.
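The self-supervision loop summarized in the abstract (a likelihood distribution produced by the embedding, a sharpened target distribution derived from it, and the target in turn guiding the embedding) can be illustrated with a minimal sketch. The snippet below assumes a DEC-style Student's-t soft assignment and a squared-and-renormalized target distribution; the function names, the toy data, and the kernel choice are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def soft_assignment(z, centers, alpha=1.0):
    """Student's-t soft assignment q of embeddings z to cluster centers
    (the likelihood distribution; alpha is the degrees of freedom).
    Assumption: DEC-style kernel, not necessarily the paper's choice."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # squared distances
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target distribution p derived from q; it serves as the
    self-supervisory signal that guides the embedding."""
    weight = q ** 2 / q.sum(axis=0)            # emphasize confident assignments
    return weight / weight.sum(axis=1, keepdims=True)

# Toy usage: 6 samples embedded in 2-D, 2 hypothetical cluster centers.
z = np.array([[0.1, 0.0], [0.2, 0.1], [0.0, 0.2],
              [2.0, 2.1], [2.2, 1.9], [1.9, 2.0]])
centers = np.array([[0.1, 0.1], [2.0, 2.0]])
q = soft_assignment(z, centers)
p = target_distribution(q)
# Assumption: in a DEC-style joint objective, KL(p || q) would be minimized
# together with the graph autoencoder's reconstruction loss, so the embedding
# and the ensemble result refine each other.
kl = float(np.sum(p * np.log(p / q)))
print(q.round(3), p.round(3), round(kl, 4))
```

Under these assumptions, alternating between estimating the target distribution and updating the embedding reproduces the mutual refinement of the low-dimensional embedding and the clustering ensemble result described in the abstract.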