[1] CHEN Zhongshang, FENG Ji, YANG Degang, et al. Hybrid neighborhood graph-based hierarchical clustering algorithm for datasets with complex structures[J]. CAAI Transactions on Intelligent Systems, 2025, 20(3): 584-593. [doi:10.11992/tis.202407001]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
20
Issue:
No. 3, 2025
Pages:
584-593
Section:
Academic Papers: Machine Learning
Publication date:
2025-05-05
- Title:
-
Hybrid neighborhood graph-based hierarchical clustering algorithm for datasets with complex structures
- Author(s):
-
CHEN Zhongshang (陈仲尚), FENG Ji (冯骥), YANG Degang (杨德刚), CAI Fapeng (蔡发鹏)
-
College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
-
- Keywords:
-
cluster analysis; hybrid neighborhood graph; shared natural neighbors; improved natural neighborhood graph; shared natural neighborhood graph; subgraph similarity; complex dataset; data mining
- CLC number:
-
TP301
- DOI:
-
10.11992/tis.202407001
- Abstract:
-
Datasets with complex structures are those whose clusters differ in shape (spherical, non-spherical, or manifold), size, and density. The natural neighbor algorithm has limitations on datasets with blurred boundaries and varying densities, and its performance drops markedly when the data contain a large amount of noise. To address these problems, we propose a hybrid neighborhood graph-based hierarchical clustering algorithm for datasets with complex structures (HCHNG). The method introduces a shared natural neighborhood graph that uses neighbor relationships to sparsify the dataset, reducing the influence of noise samples on the clustering result. HCHNG then partitions the dataset into subgraphs and merges them, a strategy that strengthens its handling of variable-density data. We also define a new subgraph similarity measure that raises the similarity between subgraphs of the same class, and we improve the natural neighborhood graph so that it better identifies datasets with blurred boundaries. Comparative experiments on synthetic and real-world datasets with complex structures show that HCHNG effectively recognizes variable-density spherical datasets and also performs well on complex datasets containing a large amount of noise, making it more effective than existing methods on datasets with complex structures.
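The abstract names natural neighbors, a shared natural neighborhood graph, and an initial division into subgraphs, but this page does not give their formal definitions. The following is only a minimal Python sketch under stated assumptions, not the HCHNG algorithm itself. It assumes the commonly described parameter-free natural neighbor search (grow the neighborhood size r until the number of points without a reverse neighbor stops changing) and a hypothetical edge rule (two natural neighbors stay connected only if they share at least `min_shared` natural neighbors). All function names and the `min_shared` parameter are illustrative. The improved natural neighborhood graph, the subgraph similarity measure, and the merging step are not sketched, because the abstract does not specify them.

```python
import math


def neighbor_order(points):
    """For each point, list the indices of all other points, nearest first."""
    n = len(points)
    return [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(points[i], points[j]))
        for i in range(n)
    ]


def natural_neighbors(points):
    """Assumed natural neighbor search; returns (neighbor sets, final r)."""
    n = len(points)
    if n < 2:
        return [set() for _ in range(n)], 0
    order = neighbor_order(points)
    reverse_count = [0] * n      # how often each point was chosen as a neighbor
    prev_orphans = -1
    for r in range(1, n):
        for i in range(n):
            reverse_count[order[i][r - 1]] += 1   # i picks its r-th nearest point
        orphans = reverse_count.count(0)          # points nobody has picked yet
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    knn = [set(order[i][:r]) for i in range(n)]
    # Natural neighbors are mutual: j is among i's r nearest and vice versa.
    return [{j for j in knn[i] if i in knn[j]} for i in range(n)], r


def shared_neighbor_graph(nn, min_shared=1):
    """Hypothetical rule: keep edge (i, j) only if the two natural neighbors
    have at least min_shared natural neighbors in common."""
    return [{j for j in nn[i] if len(nn[i] & nn[j]) >= min_shared}
            for i in range(len(nn))]


def components(graph):
    """Label connected components; each one is a candidate initial subgraph."""
    labels = [-1] * len(graph)
    current = 0
    for start in range(len(graph)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if labels[v] == -1:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


if __name__ == "__main__":
    # Two small grid clusters plus one distant point acting as noise.
    cluster_a = [(x, y) for x in range(3) for y in range(3)]
    cluster_b = [(x + 10, y) for x in range(3) for y in range(3)]
    points = cluster_a + cluster_b + [(50, 50)]
    nn, r = natural_neighbors(points)
    print("r =", r)
    print("labels =", components(shared_neighbor_graph(nn)))
```

In this toy data the distant point is never selected by any other point, so it ends up with no natural neighbors and forms its own component, which illustrates how a neighbor-based graph can set noise samples aside before any merging takes place.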
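Memo
Received: 2024-07-01.
Funding: Science and Technology Research Program of the Chongqing Municipal Education Commission (KJZD-M202300502, KJQN201800539).
About the authors: CHEN Zhongshang, master's student; main research interest: data mining. E-mail: chenzhongshang@foxmail.com. FENG Ji, associate professor, Ph.D., vice dean of the College of Computer and Information Science; main research interests: data mining and artificial intelligence. He has led or participated in more than 10 projects, including National Natural Science Foundation of China and provincial or ministerial projects, and has published more than 10 academic papers. E-mail: jifeng@cqnu.edu.cn. YANG Degang, professor, Ph.D.; main research interests: intelligent algorithms, neural networks, and complex networks. He has led or participated in more than 20 projects, including National Natural Science Foundation of China and provincial or ministerial projects, and has published more than 50 academic papers. E-mail: yangdg@cqnu.edu.cn.
Corresponding author: FENG Ji. E-mail: jifeng@cqnu.edu.cn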