[1] ZHANG Min, ZHOU Zhiping. An autoencoder-based spectral clustering algorithm combined with metric fusion and landmark representation[J]. CAAI Transactions on Intelligent Systems, 2020, 15(4): 687-696. [doi:10.11992/tis.201911039]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 15
Issue: 2020, No. 4
Pages: 687-696
Column: Academic Papers - Machine Learning
Publication date: 2020-07-05
- Title: An autoencoder-based spectral clustering algorithm combined with metric fusion and landmark representation
- Author(s): ZHANG Min 1; ZHOU Zhiping 1,2
1. School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China;
2. Engineering Research Center of Internet of Things Technology Applications, Ministry of Education, Jiangnan University, Wuxi 214122, China
- Keywords: large-scale datasets; metric fusion; landmark representation; relative mass; sparse representation; stacked autoencoder; joint learning; embedded representation
- CLC: TP18
- DOI: 10.11992/tis.201911039
- Abstract: Most existing spectral clustering algorithms suffer from low clustering accuracy and the costly storage of large-scale similarity matrices. To address these problems, this paper proposes an autoencoder-based spectral clustering algorithm combined with metric fusion and landmark representation. First, instead of random sampling, the concept of relative mass is introduced to evaluate node quality; on this basis, the most representative nodes are selected as landmark points, and the graph similarity matrix is approximated by sparse representation. Meanwhile, considering the geometric and topological distribution of the nearest-neighbor samples, the Euclidean distance and the Kendall Tau distance are fused to measure the similarity between the landmarks and the remaining points, which increases the clustering precision. A stacked autoencoder is then used to replace the Laplacian matrix eigendecomposition, taking the obtained similarity matrix as its input. Clustering accuracy is further improved by jointly learning the embedded representation and the clustering assignments. Experiments on five large-scale datasets validate the effectiveness of the proposed algorithm.
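The two building blocks named in the abstract, metric fusion (Euclidean plus Kendall Tau distance) and a landmark-based sparse similarity representation, can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the mixing weight `alpha`, the Gaussian kernel weighting, the row normalisation, and the function names `fused_distance`/`landmark_similarity` are all choices made here for exposition, and landmark selection by relative mass is replaced with pre-chosen landmark points.

```python
import numpy as np
from scipy.stats import kendalltau

def fused_distance(x, y, alpha=0.5):
    """Convex combination of Euclidean and Kendall Tau distances.

    alpha is a hypothetical mixing weight, not a value from the paper.
    Kendall's tau lies in [-1, 1]; (1 - tau) / 2 maps it to a
    rank-disagreement distance in [0, 1].
    """
    d_euc = np.linalg.norm(x - y)
    tau, _ = kendalltau(x, y)
    d_tau = (1.0 - tau) / 2.0
    return alpha * d_euc + (1.0 - alpha) * d_tau

def landmark_similarity(X, landmarks, k=2, alpha=0.5):
    """Sparse representation Z of samples over landmark points.

    Each sample is encoded only by its k nearest landmarks under the
    fused metric, with Gaussian-kernel weights; rows are normalised to
    sum to 1, so Z stays sparse instead of a full n-by-n similarity.
    """
    n, m = X.shape[0], landmarks.shape[0]
    Z = np.zeros((n, m))
    for i in range(n):
        d = np.array([fused_distance(X[i], landmarks[j], alpha)
                      for j in range(m)])
        nn = np.argsort(d)[:k]                      # k nearest landmarks
        w = np.exp(-d[nn] / (d[nn].mean() + 1e-12)) # kernel weights
        Z[i, nn] = w / w.sum()                      # row-normalise
    return Z

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))      # 20 samples, 5 features
landmarks = X[:4]                     # stand-in for relative-mass selection
Z = landmark_similarity(X, landmarks, k=2)
```

In the paper's full pipeline, the rows of such a sparse matrix would then be fed to a stacked autoencoder in place of Laplacian eigendecomposition, with the embedded representation and the clustering learned jointly.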