CHU Derun, ZHOU Zhiping. An autoencoder spectral clustering algorithm for improving landmark representation by weighted PageRank[J]. CAAI Transactions on Intelligent Systems, 2020, 15(2): 302-309. [doi:10.11992/tis.201904021]

An autoencoder spectral clustering algorithm for improving landmark representation by weighted PageRank

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
15
Issue:
2, 2020
Pages:
302-309
Section:
Academic Papers: Machine Learning
Publication date:
2020-07-05

Article Info

Title:
An autoencoder spectral clustering algorithm for improving landmark representation by weighted PageRank
Author(s):
CHU Derun, ZHOU Zhiping
Engineering Research Center of Internet of Things Technology Applications, Ministry of Education, Jiangnan University, Wuxi 214122, China
Keywords:
machine learning; data mining; cluster analysis; landmark spectral clustering; spectral clustering; weighted PageRank; autoencoder; clustering loss
CLC number:
TP18
DOI:
10.11992/tis.201904021
Abstract:
When applied to large-scale datasets, the traditional spectral clustering algorithm suffers from low clustering accuracy, the large memory overhead of storing the similarity matrix, and the high computational complexity of the eigendecomposition of the Laplacian matrix. To address these problems, this study proposes an autoencoder spectral clustering algorithm that improves the landmark representation by weighted PageRank. First, the nodes with the highest weights in the data affinity graph are selected as landmark points. The similarity matrix is then approximated by the similarities between the selected landmark points and the other data points, and this approximation serves as the input of a stacked autoencoder. Next, a clustering loss is used to update the parameters of the autoencoder and the cluster centers simultaneously, which yields scalable and accurate clustering. Experiments on several typical datasets show that the proposed algorithm achieves better clustering performance than landmark-based spectral clustering and deep spectral clustering algorithms.
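The pipeline described in the abstract can be sketched in plain Python as follows. This is a minimal illustration under several assumptions that do not come from the paper: the affinity graph uses a Gaussian kernel with bandwidth `sigma`; the weighted PageRank is an edge-weighted variant in which each node passes rank to its neighbors in proportion to affinity (not necessarily the formulation the authors use); each point is represented by normalized Gaussian weights to its `r` nearest landmarks, in the style of landmark-based spectral clustering; and the clustering loss is a DEC-style KL divergence between a Student's t soft assignment and its sharpened target. All function names are hypothetical. The stacked autoencoder is omitted, so the landmark representation `Z` stands in for the encoder output. The dense affinity matrix is built only to keep the toy example short; it is exactly the O(n²) storage that landmark methods aim to avoid.

```python
import math

# Hypothetical helpers sketching the pipeline; not the authors' implementation.

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def gaussian_affinity(X, sigma=1.0):
    """Dense Gaussian affinity graph with zero diagonal (O(n^2), toy data only)."""
    n = len(X)
    return [[0.0 if i == j else math.exp(-sq_dist(X[i], X[j]) / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def weighted_pagerank(W, d=0.85, iters=100, tol=1e-10):
    """Edge-weighted PageRank: node j passes rank to i in proportion to W[j][i]."""
    n = len(W)
    out = [sum(row) for row in W]
    pr = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - d) / n] * n
        for j in range(n):
            for i in range(n):
                if out[j] > 0:
                    new[i] += d * pr[j] * W[j][i] / out[j]
                else:  # dangling node: spread its rank uniformly
                    new[i] += d * pr[j] / n
        delta = sum(abs(a - b) for a, b in zip(new, pr))
        pr = new
        if delta < tol:
            break
    return pr

def select_landmarks(X, pr, p):
    """Pick the p points with the highest PageRank scores as landmarks."""
    top = sorted(range(len(X)), key=lambda i: pr[i], reverse=True)[:p]
    return [X[i] for i in top]

def landmark_representation(X, landmarks, r=2, sigma=1.0):
    """n x p matrix Z: Gaussian weights to the r nearest landmarks, rows sum to 1."""
    Z = []
    for x in X:
        d2 = [sq_dist(x, u) for u in landmarks]
        nearest = sorted(range(len(landmarks)), key=lambda k: d2[k])[:r]
        row = [0.0] * len(landmarks)
        for k in nearest:
            row[k] = math.exp(-d2[k] / (2 * sigma ** 2))
        s = sum(row) or 1.0  # guard against underflow to all zeros
        Z.append([v / s for v in row])
    return Z

def soft_assign(H, centers, alpha=1.0):
    """DEC-style Student's t soft assignment q_ij of embedding i to center j."""
    Q = []
    for h in H:
        row = [(1 + sq_dist(h, mu) / alpha) ** (-(alpha + 1) / 2) for mu in centers]
        s = sum(row)
        Q.append([q / s for q in row])
    return Q

def target_distribution(Q):
    """Sharpened target: p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij."""
    f = [sum(row[j] for row in Q) for j in range(len(Q[0]))]
    P = []
    for row in Q:
        w = [q * q / f[j] for j, q in enumerate(row)]
        s = sum(w)
        P.append([v / s for v in w])
    return P

def clustering_loss(P, Q):
    """KL(P || Q) summed over all points."""
    return sum(p * math.log(p / q)
               for prow, qrow in zip(P, Q) for p, q in zip(prow, qrow) if p > 0)

if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
    pr = weighted_pagerank(gaussian_affinity(X))
    Z = landmark_representation(X, select_landmarks(X, pr, p=3), r=2)
    Q = soft_assign(Z, centers=[Z[0], Z[3]])  # Z stands in for encoder output
    print(clustering_loss(target_distribution(Q), Q))
```

In a full implementation, `Z` would be fed to the stacked autoencoder, the encoder outputs would replace `Z` in `soft_assign`, and gradients of `clustering_loss` would update both the encoder weights and the cluster centers, as the abstract describes.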

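Memo:
Received: 2019-04-09.
About the authors: CHU Derun, master's student; main research interest: data mining. ZHOU Zhiping, professor, Ph.D.; main research interests: intelligent detection and network security; has published more than 20 academic papers.
Corresponding author: CHU Derun. E-mail: CDR0727@163.com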