[1] BI Zhizhen, YANG Degang, FENG Ji. Self-adaptive spectral clustering algorithm for ultra-large-scale data[J]. CAAI Transactions on Intelligent Systems, 2023, 18(2): 251-259. [doi:10.11992/tis.202110038]

Self-adaptive spectral clustering algorithm for ultra-large-scale data

CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 18 Issue: 2023, No. 2 Pages: 251-259 Column: Academic Papers - Machine Learning Publication date: 2023-05-05

Title:: Self-adaptive spectral clustering algorithm for ultra-large-scale data

Author(s):: BI Zhizhen (毕志臻)^1, YANG Degang (杨德刚)^1,2, FENG Ji (冯骥)^1,2
1. College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China;
2. Chongqing Engineering Research Center of Educational Big Data Intelligent Perception and Application, Chongqing Normal University, Chongqing 401331, China

Keywords:: data clustering; ultra-large-scale; approximate natural neighbor; spectral clustering; natural neighbor; bipartite graph; adaptive; parameter-free

CLC number:: TP311

DOI:: 10.11992/tis.202110038

Abstract:: Clustering ultra-large-scale data raises two problems: neighborhood parameters have to be set manually, and the computational load is heavy. To address them, this paper proposes an approximate natural nearest neighbor based self-adaptive ultra-scalable spectral clustering algorithm (AN³-SUSC). The algorithm first reduces the data size through hybrid representative selection. It then uses approximate natural neighbors to determine the local neighborhood parameter adaptively and constructs a similarity matrix. Finally, it applies a transfer cut on a bipartite graph to map the result back to the original ultra-large-scale data space and complete the spectral clustering analysis. Experiments on ultra-large-scale data sets show that the algorithm improves clustering quality and reduces the computational scale while remaining robust and adaptive.
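The adaptive choice of the neighborhood parameter can be illustrated with a small sketch. The code below is not the paper's AN³-SUSC procedure. It is a brute-force version of the exact natural neighbor search as it is commonly described in the natural neighbor literature, whereas the paper uses an approximate variant whose details are not given on this page. The function name `natural_neighbor_search` and the toy data are assumptions made for illustration only. The idea is to grow the neighborhood size r one step at a time and stop once every point appears in some other point's r-nearest-neighbor list, or once the number of such "orphan" points stops changing.

```python
import math


def natural_neighbor_search(points):
    """Brute-force natural neighbor search (illustrative sketch only).

    Returns (r, natural), where r is the neighborhood size found
    adaptively and natural[i] is the set of points j such that i and j
    are in each other's r nearest neighbors.
    """
    n = len(points)
    # For every point, rank all other points by distance (O(n^2 log n)).
    order = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        order.append(others)

    reverse_count = [0] * n           # times each point was chosen as a neighbor
    knn = [set() for _ in range(n)]   # r nearest neighbors found so far
    prev_orphans = None
    r = 0
    while r < n - 1:
        r += 1
        for i in range(n):
            j = order[i][r - 1]       # the r-th nearest neighbor of point i
            knn[i].add(j)
            reverse_count[j] += 1
        # Orphans: points that nobody has chosen as a neighbor yet.
        orphans = sum(1 for c in reverse_count if c == 0)
        # Stop when no orphans remain or their number no longer changes.
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans

    # Natural neighbors are mutual neighbors at the final r.
    natural = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return r, natural


# Toy data (hypothetical): two well-separated groups on a line.
points = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0),
          (10.0, 0.0), (11.0, 0.0), (12.5, 0.0)]
r, natural = natural_neighbor_search(points)
print(r, sorted(natural[0]), sorted(natural[3]))  # 2 [1, 2] [4, 5]
```

No neighborhood size is passed in: the search settles on r = 2 for this toy data, and the mutual-neighbor sets stay within each group. In a pipeline like the one the abstract outlines, such a search would presumably run on the reduced set of representatives rather than on the full data set, which keeps the quadratic cost manageable.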

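Memo

Received: 2021-10-31.
Funding: Humanities and Social Sciences Research Project of the Ministry of Education (18XJC880002, 20YJAZH084); Science and Technology Research Project of the Chongqing Municipal Education Commission (KJQN201800539); Chongqing Graduate Education and Teaching Reform Research Project (yjg223068)
About the authors: BI Zhizhen, master's student; main research interest: data mining. YANG Degang, professor, Ph.D.; main research interests: intelligent algorithms, neural networks, and complex networks; has led or participated in more than 20 projects, including National Natural Science Foundation of China and provincial and ministerial projects, and has published more than 50 academic papers. FENG Ji, associate professor, Ph.D.; main research interests: data mining and artificial intelligence; has led or participated in more than 10 projects, including National Natural Science Foundation of China and provincial and ministerial projects, and has published more than 10 academic papers.
Corresponding author: FENG Ji. E-mail: jifeng@cqnu.edu.cn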

Last Update: 1900-01-01
