[1] DU Hangyuan, ZHANG Jing, WANG Wenjian. A deep self-supervised clustering ensemble algorithm[J]. CAAI Transactions on Intelligent Systems, 2020, 15(6): 1113-1120. [doi:10.11992/tis.202006050]

A deep self-supervised clustering ensemble algorithm

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
15
Issue:
2020, No. 6
Pages:
1113-1120
Column:
Academic Papers: Machine Learning
Publication date:
2020-11-05

Article Info

Title:
A deep self-supervised clustering ensemble algorithm
Author(s):
DU Hangyuan1, ZHANG Jing2, WANG Wenjian1,2
1. College of Computer and Information Technology, Shanxi University, Taiyuan 030006, China;
2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
Keywords:
feature space; clustering algorithm; consistency function; graph representation; similarity measure; self-supervised learning; graph data; neural network model
CLC number:
TP391
DOI:
10.11992/tis.202006050
Abstract:
To address the design of the consensus function in clustering ensembles, this paper proposes a deep self-supervised clustering ensemble algorithm. The algorithm first applies a weighted connected-triple algorithm to the base clustering partitions to compute a similarity matrix between samples. The similarity matrix defines the adjacency relation, so the base clusterings are transformed from a data representation in the feature space into a graph data representation. On this basis, the consensus integration of the base clusterings becomes a graph clustering problem on that graph representation. To solve it, a graph neural network is used to construct a self-supervised clustering ensemble model. On the one hand, a graph autoencoder learns a low-dimensional embedding of the graph, and the target distribution of the clustering ensemble is estimated from the likelihood distribution of this embedding. On the other hand, the clustering ensemble objective guides the embedding process, so that the low-dimensional graph embedding and the clustering ensemble result are jointly optimal. Simulation experiments on a large number of data sets show that the proposed algorithm yields more accurate clustering ensemble results than algorithms such as HGPA, CSPA, and MCLA.
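The pipeline outlined in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and it rests on several assumptions. First, the similarity matrix is built with a plain co-association count (the fraction of base clusterings that place two samples in the same cluster), which stands in for the weighted connected-triple algorithm named in the abstract. Second, the embedding `z` is taken as given, where the paper would obtain it from a graph autoencoder. Third, the soft assignment and target distribution follow the Student's t kernel and sharpening rule of Deep Embedded Clustering (Xie et al.), which the abstract does not spell out. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def co_association(base_labels: List[List[int]]) -> Matrix:
    """Similarity = fraction of base clusterings that co-cluster samples i and j.

    Assumption: a simple stand-in for the weighted connected-triple similarity.
    """
    m = len(base_labels)       # number of base clusterings
    n = len(base_labels[0])    # number of samples
    counts = [[0] * n for _ in range(n)]
    for labels in base_labels:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def soft_assign(z: Matrix, centers: Matrix) -> Matrix:
    """Soft cluster assignment q using a Student's t kernel (one degree of freedom)."""
    q = []
    for point in z:
        w = [1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(point, c)))
             for c in centers]
        total = sum(w)
        q.append([x / total for x in w])
    return q


def target_distribution(q: Matrix) -> Matrix:
    """Sharpened target p: p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    k = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        w = [row[j] ** 2 / freq[j] for j in range(k)]
        total = sum(w)
        p.append([x / total for x in w])
    return p


def kl_divergence(p: Matrix, q: Matrix) -> float:
    """Self-supervised clustering loss KL(P || Q), summed over all samples."""
    return sum(pij * math.log(pij / qij)
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row) if pij > 0)


if __name__ == "__main__":
    # Three hypothetical base clusterings of four samples.
    base = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
    adjacency = co_association(base)

    # Hypothetical 2-D embedding and cluster centers (would come from the model).
    z = [[0.0, 0.0], [0.1, 0.0], [2.0, 2.0], [2.1, 2.0]]
    centers = [[0.0, 0.0], [2.0, 2.0]]
    q = soft_assign(z, centers)
    p = target_distribution(q)
    print(adjacency[0])
    print(kl_divergence(p, q))
```

In a full model, the KL term would be minimized jointly with the graph autoencoder's reconstruction loss, so that the target distribution supervises the embedding; how the two losses are weighted is not stated in the abstract.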



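Memo

Received: 2020-06-29.
Funding: National Natural Science Foundation of China (61902227, 61673249, 61773247, U1805263); Key Research and Development Program of Shanxi Province for International Cooperation (201903D421050); Basic Research Program of Shanxi Province (201901D211192); Applied Basic Research Program of Shanxi Province (201701D121053); Shanxi Province 1331 Project.
About the authors: DU Hangyuan, associate professor, Ph.D.; main research interests are machine learning and social networks; has led or participated in 7 national and provincial or ministerial research projects and published more than 10 academic papers. ZHANG Jing, master's student; main research interests are data mining and machine learning. WANG Wenjian, professor, doctoral supervisor, Ph.D.; panel review expert for the automation discipline of the Information Sciences Department of the National Natural Science Foundation of China; council member of the Chinese Association for Artificial Intelligence (CAAI), standing committee member of the CAAI Machine Learning Technical Committee, and member of its Knowledge Engineering and Distributed Intelligence Technical Committee and its Rough Sets and Soft Computing Technical Committee; member of the China Computer Federation (CCF) Artificial Intelligence and Pattern Recognition Technical Committee; chair of the Supervisory Committee of the CCF Taiyuan chapter and vice chair of the ACM Taiyuan chapter; has served as program committee chair or member for multiple international and domestic academic conferences; main research interests are machine learning and data mining; has led 4 National Natural Science Foundation of China projects and published more than 150 academic papers.
Corresponding author: WANG Wenjian. E-mail: wjwang@sxu.edu.cn
Last Update: 2020-12-25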