[1] ZHANG Min, ZHOU Zhiping. An autoencoder-based spectral clustering algorithm combined with metric fusion and landmark representation[J]. CAAI Transactions on Intelligent Systems, 2020, 15(4): 687-696. [doi:10.11992/tis.201911039]

An autoencoder-based spectral clustering algorithm combined with metric fusion and landmark representation

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 15
Issue:
2020, No. 4
Pages:
687-696
Section:
Academic Papers: Machine Learning
Publication date:
2020-07-05

Article Info

Title:
An autoencoder-based spectral clustering algorithm combined with metric fusion and landmark representation
Author(s):
ZHANG Min (1), ZHOU Zhiping (1, 2)
1. School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China;
2. Engineering Research Center of Internet of Things Technology Applications, Ministry of Education, Jiangnan University, Wuxi 214122, China
Keywords:
large-scale datasets; metric fusion; landmark representation; relative mass; sparse representation; stacked autoencoder; joint learning; embedded representation
CLC number:
TP18
DOI:
10.11992/tis.201911039
Abstract:
Most existing spectral clustering algorithms suffer from low clustering accuracy and a high storage cost for the similarity matrix when applied to large-scale datasets. To address these problems, this paper proposes an autoencoder-based spectral clustering algorithm that combines metric fusion with landmark representation. First, instead of sampling at random, the concept of relative mass is introduced to evaluate nodes, and the most representative nodes are selected as landmark points; the graph similarity matrix is then approximated by sparse representation, which reduces storage overhead. Second, to account for both the geometric and the topological distribution of nearest-neighbor samples, the Euclidean distance and the Kendall Tau distance are fused to measure the similarity between the landmarks and the other samples, which improves clustering accuracy. Finally, a stacked autoencoder replaces the eigen-decomposition of the Laplacian matrix: the resulting similarity matrix is fed to the autoencoder as input, and the embedded representation and the clustering are learned jointly to further improve accuracy. Experiments on five large-scale datasets validate the effectiveness of the algorithm.
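
The metric fusion and landmark-based sparse representation mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several parts are assumptions made for illustration only: the fusion is a plain convex combination with a hypothetical weight `alpha`, the Kendall Tau distance is taken as the fraction of discordant coordinate pairs between two feature vectors, the landmarks are simply the first few samples (the paper ranks points by relative mass, whose formula is not given on this page), and the parameters `r` and `sigma` are placeholders.

```python
import numpy as np

def kendall_tau_distance(x, y):
    """Normalized Kendall Tau distance: share of coordinate pairs ordered differently in x and y."""
    i, j = np.triu_indices(len(x), k=1)
    discordant = (x[i] - x[j]) * (y[i] - y[j]) < 0
    return discordant.mean()

def fused_distance(X, landmarks, alpha=0.5):
    """Assumed fusion rule: convex combination of scaled Euclidean and Kendall Tau distances."""
    eu = np.linalg.norm(X[:, None, :] - landmarks[None, :, :], axis=2)
    eu = eu / (eu.max() + 1e-12)  # scale to [0, 1] so both terms are comparable
    kt = np.array([[kendall_tau_distance(x, l) for l in landmarks] for x in X])
    return alpha * eu + (1 - alpha) * kt

def sparse_landmark_representation(D, r=2, sigma=0.5):
    """Keep Gaussian-kernel weights for each sample's r nearest landmarks; each row sums to 1."""
    n, p = D.shape
    Z = np.zeros((n, p))
    nearest = np.argsort(D, axis=1)[:, :r]   # indices of the r closest landmarks per sample
    rows = np.arange(n)[:, None]
    w = np.exp(-D[rows, nearest] ** 2 / (2 * sigma ** 2))
    Z[rows, nearest] = w / w.sum(axis=1, keepdims=True)
    return Z

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
landmarks = X[:4]  # placeholder choice; the paper selects landmarks by relative mass
D = fused_distance(X, landmarks)
Z = sparse_landmark_representation(D, r=2)
```

With n samples and p landmarks, `Z` has only r nonzero entries per row, so it can be stored far more cheaply than a full n-by-n similarity matrix. In the pipeline the abstract describes, a representation of this kind would then be passed to a stacked autoencoder rather than to an eigen-solver; that step is omitted here.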



Memo:
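Received: 2019-12-02.
Author biographies: ZHANG Min, master's student; main research interest: data mining. ZHOU Zhiping, professor, Ph.D.; main research interests: intelligent detection and network security; has published more than 80 academic papers.
Corresponding author: ZHANG Min. E-mail: 15061882373_1@163.com
Last Update: 2020-07-25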