CHEN Wei,LYU Li,XIAO Renbin,et al.Density peak clustering algorithm based on symmetric neighborhood and micro-cluster merging for mixed datasets[J].CAAI Transactions on Intelligent Systems,2025,20(1):172-184.[doi:10.11992/tis.202311005]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 20, Issue 1 (2025), pp. 172-184
- Title:
Density peak clustering algorithm based on symmetric neighborhood and micro-cluster merging for mixed datasets
- Author(s):
CHEN Wei (陈威)1,2, LYU Li (吕莉)1,2, XIAO Renbin (肖人彬)3, TAN Dekun (谭德坤)1,2, PAN Zhengxiang (潘正祥)4
1. School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, China;
2. Nanchang Key Laboratory of IoT Perception and Collaborative Computing for Smart City, Nanchang Institute of Technology, Nanchang 330099, China;
3. School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China;
4. School of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
- Keywords:
density peak clustering; uneven density distribution; manifold data; K-nearest neighbors; reverse nearest neighbors; symmetric neighborhood; similarity between micro-clusters; micro-cluster merging
- Abstract:
Mixed datasets are datasets that combine an uneven density distribution with manifold structure. The local density definition of the density peak clustering (DPC) algorithm tends to ignore differences in sparsity between clusters when the density distribution is uneven, which leads to incorrect selection of cluster centers. Its allocation strategy assigns samples by Euclidean distance, which is unsuitable for manifold datasets, where samples of the same cluster can lie far apart, so samples are misallocated. This paper proposes a density peak clustering algorithm based on symmetric neighborhood and micro-cluster merging for mixed datasets (DPC-SNMM). The algorithm introduces the concept of a symmetric neighborhood and redefines the local density using a logarithmic inverse accumulation method, which improves the identification of cluster centers. It also proposes a method for selecting the number of micro-clusters based on density differences, which keeps that number within a reasonable range. Finally, it designs an inter-micro-cluster similarity measure to drive micro-cluster merging, which avoids the cascading errors that arise during sample allocation. Experiments show that, compared with the baseline algorithms, the proposed algorithm achieves better clustering results on mixed datasets, UCI datasets, and image datasets.
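The abstract does not give the formal definition of the symmetric neighborhood, so the following is only a minimal sketch under a stated assumption: a sample j belongs to the symmetric neighborhood of i when j is among the K nearest neighbors of i and i is among the K nearest neighbors of j (that is, j is also a reverse nearest neighbor of i). The function names `knn_indices` and `symmetric_neighborhood` are hypothetical, and the definition used in the paper may differ. The redefined local density and the micro-cluster merging step are not sketched here because the abstract does not specify them.

```python
import numpy as np


def knn_indices(X, k):
    """Return an (n, k) array holding each sample's k nearest neighbors.

    Distances are Euclidean; a sample is never its own neighbor.
    """
    # Pairwise distance matrix via broadcasting: shape (n, n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude self from the ranking
    return np.argsort(dist, axis=1)[:, :k]


def symmetric_neighborhood(X, k):
    """Assumed definition: j in SN(i) iff j in KNN(i) and i in KNN(j)."""
    n = X.shape[0]
    nn = knn_indices(X, k)
    # is_nn[i, j] is True when j is one of i's k nearest neighbors.
    is_nn = np.zeros((n, n), dtype=bool)
    is_nn[np.repeat(np.arange(n), k), nn.ravel()] = True
    # Keep only mutual relations (K-nearest AND reverse-nearest neighbor).
    mutual = is_nn & is_nn.T
    return [np.flatnonzero(row) for row in mutual]


if __name__ == "__main__":
    # Two close points and one distant point on a line.
    X = np.array([[0.0], [1.0], [10.0]])
    for i, sn in enumerate(symmetric_neighborhood(X, k=1)):
        print(i, sn.tolist())
```

In this toy example the two close points are each other's nearest neighbor, so each appears in the other's symmetric neighborhood. The distant point's nearest neighbor does not reciprocate, so its symmetric neighborhood is empty, which illustrates how a mutual-neighbor criterion can distinguish sparse samples from dense ones.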
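Funding: National Natural Science Foundation of China (62066030); Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ190958).
Corresponding author: LYU Li (吕莉). E-mail: lvli623@163.com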