[1] ZHANG Min (张敏), ZHOU Zhiping (周治平). An autoencoder-based spectral clustering algorithm combined with metric fusion and landmark representation[J]. CAAI Transactions on Intelligent Systems, 2020, 15(4): 687-696. [doi:10.11992/tis.201911039]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 15
Issue: No. 4, 2020
Pages: 687-696
Column: Academic Papers: Machine Learning
Publication date: 2020-07-05
- Title:
-
An autoencoder-based spectral clustering algorithm combined with metric fusion and landmark representation
- Author(s):
-
ZHANG Min (张敏)1, ZHOU Zhiping (周治平)1,2
-
1. School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China;
2. Engineering Research Center of Internet of Things Technology Applications, Ministry of Education, Jiangnan University, Wuxi 214122, China
-
- Keywords:
-
large-scale datasets; metric fusion; landmark representation; relative mass; sparse representation; stacked autoencoder; joint learning; embedded representation
- CLC number:
-
TP18
- DOI:
-
10.11992/tis.201911039
- Abstract:
-
When processing large-scale datasets, most existing spectral clustering algorithms suffer from low clustering accuracy and the high cost of storing a large similarity matrix. To address these problems, this paper proposes an autoencoder-based spectral clustering algorithm that combines metric fusion and landmark representation. First, instead of random sampling, the concept of relative mass is introduced to evaluate node quality. On this basis, the most representative nodes are selected as landmark points, and the graph similarity matrix is approximated by sparse representation to reduce storage overhead. Meanwhile, to account for both the geometric and the topological distribution of the nearest-neighbor samples, the Euclidean distance and the Kendall Tau distance are fused to measure the similarity between the landmarks and the other samples, which improves clustering accuracy. A stacked autoencoder then replaces the eigendecomposition of the Laplacian matrix, taking the obtained similarity matrix as its input. Clustering accuracy is further improved by jointly learning the embedded representation and the clustering. Experiments on five large-scale datasets validate the effectiveness of the proposed algorithm.
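The metric-fusion and landmark-representation steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: the function names, the convex weight `alpha`, the scaling of the Euclidean distance into [0, 1), the Gaussian kernel with width `sigma`, and the number `r` of landmarks kept per sample. The Kendall Tau distance is computed over feature pairs of two samples (at least two features are assumed). The relative-mass landmark selection is not reproduced; landmarks are simply passed in.

```python
import numpy as np

def kendall_tau_distance(a, b):
    """Normalized Kendall Tau distance: fraction of index pairs ordered oppositely in a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    i, j = np.triu_indices(len(a), k=1)  # all index pairs with i < j
    discordant = np.sign(a[i] - a[j]) * np.sign(b[i] - b[j]) < 0
    return discordant.mean()

def fused_distance(x, y, alpha=0.5):
    """Hypothetical fusion: convex combination of scaled Euclidean and Kendall Tau distances."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    euclid = np.linalg.norm(x - y)
    euclid_scaled = euclid / (1.0 + euclid)  # assumption: map [0, inf) into [0, 1)
    return alpha * euclid_scaled + (1.0 - alpha) * kendall_tau_distance(x, y)

def landmark_affinity(X, landmarks, r=2, sigma=1.0, alpha=0.5):
    """Sparse n x p matrix Z: each sample keeps kernel weights to its r nearest landmarks."""
    n, p = X.shape[0], landmarks.shape[0]
    D = np.array([[fused_distance(x, u, alpha) for u in landmarks] for x in X])
    nearest = np.argsort(D, axis=1)[:, :r]  # indices of the r closest landmarks
    rows = np.arange(n)[:, None]
    Z = np.zeros((n, p))
    Z[rows, nearest] = np.exp(-D[rows, nearest] ** 2 / (2.0 * sigma ** 2))
    Z /= Z.sum(axis=1, keepdims=True)  # each row sums to 1
    return Z

# Usage with placeholder landmarks (the first 4 samples, not a relative-mass selection).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 5))
Z_demo = landmark_affinity(X_demo, X_demo[:4], r=2)
```

Only the n x p matrix `Z` needs to be stored. In landmark-based approaches the full n x n similarity matrix is typically approximated from it (for example as a product of `Z` with its transpose) rather than materialized, which is where the storage saving comes from. In the paper the resulting similarity representation is then fed to a stacked autoencoder, a step this sketch leaves out.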
Last Update:
2020-07-25