[1] WANG Lijuan, DING Shifei. A spectral clustering algorithm based on ELM-AE feature representation[J]. CAAI Transactions on Intelligent Systems, 2021, 16(3): 560-566. [doi:10.11992/tis.202005021]
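CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 16
Issue: 2021, No. 3
Pages: 560-566
Column: Wu Wenjun Artificial Intelligence Science and Technology Award Forum
Publication date: 2021-05-05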
- Title:
-
A spectral clustering algorithm based on ELM-AE feature representation
- Author(s):
-
WANG Lijuan (王丽娟)1,2, DING Shifei (丁世飞)1
-
1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China;
2. School of Information Engineering, Xuzhou College of Industrial Technology, Xuzhou 221114, Jiangsu, China
-
- Keywords:
-
spectral clustering; feature representation; extreme learning machine; autoencoder; extreme learning machine as autoencoder; machine learning; cluster analysis; data mining
- Classification number:
-
TP391
- DOI:
-
10.11992/tis.202005021
- Abstract:
-
In practical applications, redundant features and outliers (noise) in the data obscure the more salient features and significantly degrade clustering performance. This paper proposes a spectral clustering algorithm based on a feature representation learned by an extreme learning machine as autoencoder (ELM-AE), referred to as SC-ELM-AE (spectral clustering via extreme learning machine as autoencoder). ELM-AE learns the principal feature representation of the source data via singular value decomposition and uses its output weights to reconstruct the original input data from the feature space. The resulting feature representation space is then used as the input to spectral clustering. Experiments on five UCI datasets show that SC-ELM-AE outperforms conventional K-means, spectral clustering, and other existing algorithms; in particular, on the complex high-dimensional datasets PEMS-SF and TDT2_10, the average clustering accuracy improves by more than 30%.
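The two-stage pipeline described in the abstract (learn an ELM-AE feature representation, then run spectral clustering on it) might be sketched as follows. This is only an illustrative sketch, not the authors' implementation. It assumes a common ELM-AE formulation in which random, untrained input weights feed a sigmoid hidden layer and the output weights are obtained by ridge-regularized least squares. It also assumes a Gaussian affinity with a median-distance bandwidth and a plain k-means step. The function names (`elm_ae_features`, `spectral_embedding`, `kmeans`, `sc_elm_ae`) and the parameters `n_hidden`, `C`, and `sigma` are hypothetical and do not come from the paper.

```python
import numpy as np


def elm_ae_features(X, n_hidden, C=1e3, seed=0):
    """Hypothetical ELM-AE embedding: random hidden layer, ridge-solved output weights."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Input weights and biases are drawn once at random and never trained.
    W = rng.standard_normal((n_features, n_hidden))
    if n_hidden <= n_features:
        W, _ = np.linalg.qr(W)  # orthonormal columns (assumed for the compressed case)
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden activations
    # Output weights reconstruct X from H: min ||H beta - X||^2 + ||beta||^2 / C.
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ X)
    return X @ beta.T  # project the data with the learned output weights


def spectral_embedding(Z, n_clusters, sigma=None):
    """Normalized spectral embedding from a Gaussian affinity matrix."""
    sq = np.sum(Z ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    if sigma is None:
        sigma = np.sqrt(np.median(D2[D2 > 0]))  # median heuristic (an assumption)
    A = np.exp(-D2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(M)  # eigenvalues are returned in ascending order
    U = vecs[:, -n_clusters:]  # eigenvectors of the largest eigenvalues
    return U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)


def kmeans(U, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with random initial centers."""
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(len(U), k, replace=False)]
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def sc_elm_ae(X, n_clusters, n_hidden, seed=0):
    """Cluster X by running spectral clustering on ELM-AE features."""
    Z = elm_ae_features(X, n_hidden, seed=seed)
    U = spectral_embedding(Z, n_clusters)
    return kmeans(U, n_clusters, seed=seed)
```

Calling `sc_elm_ae(X, n_clusters=2, n_hidden=4)` on an `(n_samples, n_features)` array returns one integer cluster label per row. The affinity bandwidth, the regularization strength `C`, and the number of hidden units are choices the abstract does not specify, so they would need tuning for any real dataset.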
Last Update:
2021-06-25