[1] WANG Lijuan, DING Shifei. A spectral clustering algorithm based on ELM-AE feature representation[J]. CAAI Transactions on Intelligent Systems, 2021, 16(3): 560-566. [doi:10.11992/tis.202005021]

A spectral clustering algorithm based on ELM-AE feature representation

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
16
Issue:
No. 3, 2021
Pages:
560-566
Column:
Wu Wenjun Artificial Intelligence Science and Technology Award Forum
Publication date:
2021-05-05

Article Info

Title:
A spectral clustering algorithm based on ELM-AE feature representation
Author(s):
WANG Lijuan 1,2; DING Shifei 1
1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China;
2. School of Information Engineering, Xuzhou College of Industrial Technology, Xuzhou 221114, China
Keywords:
spectral clustering; feature representation; extreme learning machine; auto-encoder; extreme learning machine as autoencoder; machine learning; clustering analysis; data mining
CLC number:
TP391
DOI:
10.11992/tis.202005021
Abstract:
In practical applications, redundant features and outliers (noise) in the data obscure the more salient features that clustering relies on and substantially degrade clustering performance. This paper proposes a spectral clustering algorithm based on ELM-AE (extreme learning machine as autoencoder) feature representation, called SC-ELM-AE (spectral clustering via extreme learning machine as autoencoder). ELM-AE learns the principal feature representation of the source data through singular value decomposition and uses the output weights to reconstruct the original input data from the feature space. The resulting feature representation space is then used as the input to spectral clustering. Experiments on five UCI datasets show that SC-ELM-AE outperforms conventional K-means, spectral clustering, and other existing algorithms; in particular, on the complex high-dimensional datasets PEMS-SF and TDT2_10, the average clustering accuracy improves by more than 30%.
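The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sigmoid activation, the orthogonalized random input weights, the regularization constant `C`, the projection `X @ beta.T` as the feature representation, the Gaussian affinity with a nearest-neighbor kernel width, and the Ng-Jordan-Weiss style normalization are all choices made here for illustration. The function names `elm_ae_features`, `spectral_clustering`, and `kmeans` are hypothetical, and the toy data is synthetic rather than one of the datasets named in the abstract.

```python
import numpy as np


def elm_ae_features(X, n_hidden, C=10.0, seed=0):
    """ELM-AE sketch: fixed random hidden layer, ridge-regularized output weights."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Input weights and biases are random and never trained (assumption: Gaussian draws).
    W = rng.standard_normal((d, n_hidden))
    if n_hidden <= d:
        W, _ = np.linalg.qr(W)  # orthonormal columns when not expanding the dimension
    b = rng.standard_normal(n_hidden)
    b /= np.linalg.norm(b)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden-layer output
    # Solve min ||H beta - X||^2 + ||beta||^2 / C through the SVD of H.
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    beta = Vt.T @ ((s / (s**2 + 1.0 / C))[:, None] * (U.T @ X))
    return X @ beta.T  # feature representation, shape (n_samples, n_hidden)


def kmeans(Y, k, n_iter=100, seed=0):
    """Plain Lloyd iterations with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [Y[rng.integers(len(Y))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(Z, k, knn=7, seed=0):
    """Normalized spectral clustering in the style of Ng, Jordan and Weiss."""
    sq = np.sum(Z**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    # Kernel width heuristic (assumption): median squared distance to the knn-th neighbor.
    sigma2 = np.median(np.sort(D2, axis=1)[:, knn])
    A = np.exp(-D2 / (2.0 * sigma2))
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    Y = vecs[:, -k:]  # eigenvectors of the k largest eigenvalues
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return kmeans(Y, k, seed=seed)


# Toy data (hypothetical, not from the paper): two Gaussian blobs in 6 dimensions.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2.0, 0.2, (40, 6)), rng.normal(2.0, 0.2, (40, 6))])
Z = elm_ae_features(X, n_hidden=4)   # stage 1: ELM-AE feature representation
labels = spectral_clustering(Z, k=2)  # stage 2: spectral clustering on Z
```

Because the hidden layer is random and fixed, the only quantity that is fitted in the first stage is the output weight matrix `beta`, which has a closed-form solution. Computing that solution through the SVD of the hidden-layer output is one convenient way to write the ridge solution; whether it matches the decomposition the paper refers to is not confirmed by the abstract.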



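Memo

Received: 2020-05-17.
Funding: National Natural Science Foundation of China (61672522, 61976216); Philosophy and Social Science Research Project of Universities in Jiangsu Province (2019SJA1013); "Qing Lan Project" of Jiangsu Universities.
About the authors: WANG Lijuan, associate professor, Ph.D. candidate, CCF member; main research interests are machine learning and clustering analysis. DING Shifei, professor, doctoral supervisor, Ph.D., CCF distinguished member, winner of the 8th Wu Wenjun Artificial Intelligence Science and Technology Award; main research interests are artificial intelligence and pattern recognition, machine learning and data mining. Has led one project under the National Key Basic Research Program and three General Program projects of the National Natural Science Foundation of China; has published 5 monographs and more than 200 academic papers.
Corresponding author: DING Shifei. E-mail: dingsf@cumt.edu.cn
Last Update: 2021-06-25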