[1] CHEN Wei, LYU Li, XIAO Renbin, et al. Density peak clustering algorithm based on symmetric neighborhood and micro-cluster merging for mixed datasets[J]. CAAI Transactions on Intelligent Systems, 2025, 20(1): 172-184. [doi:10.11992/tis.202311005]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
20
Issue:
2025, No. 1
Page number:
172-184
Column:
Academic Papers: Fundamentals of Artificial Intelligence
Publication date:
2025-01-05
- Title:
-
Density peak clustering algorithm based on symmetric neighborhood and micro-cluster merging for mixed datasets
- Author(s):
-
CHEN Wei (1, 2); LYU Li (1, 2); XIAO Renbin (3); TAN Dekun (1, 2); PAN Zhengxiang (4)
-
1. School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, China;
2. Nanchang Key Laboratory of IoT Perception and Collaborative Computing for Smart City, Nanchang Institute of Technology, Nanchang 330099, China;
3. School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China;
4. School of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
-
- Keywords:
-
density peaks clustering; uneven density; manifold data; K-nearest neighbors; reverse nearest neighbors; symmetric neighborhood; similarity between micro-clusters; micro-cluster merging
- CLC:
-
TP301
- DOI:
-
10.11992/tis.202311005
- Abstract:
-
Mixed datasets are datasets that exhibit both uneven density distribution and manifold structure. The local density definition of the density peak clustering (DPC) algorithm tends to ignore differences in sparsity between clusters in datasets with uneven density distribution, which leads to incorrect selection of cluster centers. In addition, its allocation strategy assigns samples by Euclidean distance, which is unsuitable for manifold datasets in which samples of the same cluster lie far apart, so samples are misallocated. This paper proposes a density peak clustering algorithm based on symmetric neighborhood and micro-cluster merging for mixed datasets (DPC-SNMM). The algorithm introduces the concept of symmetric neighborhood and redefines the local density using a logarithmic inverse cumulative method, which improves the identification of cluster centers. It also proposes a method for selecting the number of micro-clusters based on density differences, which keeps the selection of micro-clusters within a reasonable range. Finally, it designs an inter-micro-cluster similarity metric to guide micro-cluster merging, which avoids the cascading errors produced during allocation. Experiments show that, compared with the baseline algorithms, the proposed algorithm achieves better clustering results on mixed datasets, UCI datasets, and image datasets.
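The abstract does not give the formal definition of the symmetric neighborhood. As a rough illustration only, the sketch below assumes the common reading suggested by the keywords: sample j belongs to the symmetric neighborhood of sample i when j is among the K nearest neighbors of i and i is among the K nearest neighbors of j (that is, j is also a reverse nearest neighbor of i). The function names `knn_indices` and `symmetric_neighborhood` are hypothetical, the brute-force distance computation is chosen for clarity rather than speed, and none of this should be read as the DPC-SNMM implementation itself.

```python
import numpy as np


def knn_indices(X, k):
    """Return an (n, k) array; row i holds the indices of the k nearest neighbors of sample i."""
    # Brute-force pairwise Euclidean distances (fine for small illustrative datasets).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


def symmetric_neighborhood(X, k):
    """Assumed definition: j is in SN(i) iff j is in KNN(i) and i is in KNN(j)."""
    n = X.shape[0]
    knn = knn_indices(X, k)
    # is_knn[i, j] is True when j is one of the k nearest neighbors of i.
    is_knn = np.zeros((n, n), dtype=bool)
    is_knn[np.arange(n)[:, None], knn] = True
    # Keep only pairs that are neighbors in both directions.
    mutual = is_knn & is_knn.T
    return [np.flatnonzero(mutual[i]) for i in range(n)]
```

Under this assumed definition, a sample that lies far from every other sample tends to have an empty symmetric neighborhood, because the samples it selects as neighbors do not select it back.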