[1] CHEN Zhongshang, FENG Ji, YANG Degang, et al. Hybrid neighborhood graph-based hierarchical clustering algorithm for datasets with complex structures[J]. CAAI Transactions on Intelligent Systems, 2025, 20(3): 584-593. [doi:10.11992/tis.202407001]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
20
Issue:
3 (2025)
Page number:
584-593
Column:
Academic Papers: Machine Learning
Publication date:
2025-05-05
- Title: Hybrid neighborhood graph-based hierarchical clustering algorithm for datasets with complex structures
- Author(s): CHEN Zhongshang; FENG Ji; YANG Degang; CAI Fapeng
  College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Keywords: cluster analysis; hybrid neighborhood graph; shared natural neighbors; improved natural neighborhood graph; shared natural neighborhood graph; subgraph similarity; complex dataset; data mining
- CLC: TP301
- DOI: 10.11992/tis.202407001
- Abstract:
Datasets with complex structures typically contain clusters of different shapes (spherical, non-spherical, and manifold), sizes, and densities. The natural neighbor algorithm has limitations on datasets with unclear boundaries and varying densities; in particular, its performance degrades markedly when the dataset contains a large amount of noise. To address these drawbacks, we propose a hybrid neighborhood graph-based hierarchical clustering algorithm for datasets with complex structures (HCHNG). First, we introduce a shared natural neighborhood graph, which uses neighbor relationships to sparsify the dataset and reduce the impact of abnormal samples on the clustering results. The algorithm then divides the dataset into several subgraphs and merges them, which improves its handling of variable-density data. We also propose a new definition of subgraph similarity that ensures higher similarity between subgraphs of the same class, and we improve the natural neighborhood graph so that it better identifies datasets with blurred boundaries. Experimental results show that HCHNG can recognize variable-density spherical datasets and complex datasets containing a large amount of noise, and that it is more effective than the existing methods in processing datasets with complex structures.
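
The abstract does not define natural neighbors or shared natural neighbors formally. As a rough illustration only, the sketch below assumes the commonly used formulation: the neighborhood size k grows until the number of points with no reverse neighbors stops changing, natural neighbors are mutual k-nearest neighbors at that k, and shared natural neighbors are the intersection of two points' natural-neighbor sets. The function names `natural_neighbors` and `shared_natural_neighbors` are hypothetical, and this is not the authors' HCHNG implementation.

```python
import math


def natural_neighbors(points):
    """Return, for each point, its set of mutual k-nearest neighbors.

    Assumption: k is grown one step at a time until either every point has
    a reverse neighbor or the count of points without one stops changing.
    """
    n = len(points)
    # For each point, the indices of all other points sorted by distance.
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j, i=i: math.dist(points[i], points[j]))
        for i in range(n)
    ]
    knn = [set() for _ in range(n)]      # neighbors chosen by point i so far
    reverse = [set() for _ in range(n)]  # points that chose i as a neighbor
    prev_orphans = None
    for r in range(n - 1):
        for i in range(n):
            j = order[i][r]              # the (r+1)-th nearest neighbor of i
            knn[i].add(j)
            reverse[j].add(i)
        orphans = sum(1 for i in range(n) if not reverse[i])
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    # Keep only mutual relationships: i chose j and j chose i.
    return [knn[i] & reverse[i] for i in range(n)]


def shared_natural_neighbors(nn, i, j):
    """Natural neighbors that points i and j have in common."""
    return nn[i] & nn[j]


if __name__ == "__main__":
    # Four points on a unit square plus one distant outlier (index 4).
    pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
    nn = natural_neighbors(pts)
    print(nn[4])                               # the outlier's natural neighbors
    print(shared_natural_neighbors(nn, 0, 1))  # neighbors shared by points 0 and 1
```

In this toy input the distant point ends up with no natural neighbors, which is the kind of property a neighbor-based sparsification step could exploit to limit the influence of abnormal samples. The brute-force distance sort is quadratic and is meant for illustration, not for large datasets.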