[1] CHEN Wei, LYU Li, XIAO Renbin, et al. Density peak clustering algorithm based on symmetric neighborhood and micro-cluster merging for mixed datasets[J]. CAAI Transactions on Intelligent Systems, 2025, 20(1): 172-184. [doi:10.11992/tis.202311005]
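CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 20
Issue: 2025, No. 1
Pages: 172-184
Section: Academic Papers (Foundations of Artificial Intelligence)
Publication date: 2025-01-05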
- Title:
Density peak clustering algorithm based on symmetric neighborhood and micro-cluster merging for mixed datasets
- Authors:
CHEN Wei (1,2), LYU Li (1,2), XIAO Renbin (3), TAN Dekun (1,2), PAN Zhengxiang (4)
1. School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, China;
2. Nanchang Key Laboratory of IoT Perception and Collaborative Computing for Smart City, Nanchang Institute of Technology, Nanchang 330099, China;
3. School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China;
4. School of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
- Keywords:
density peak clustering; uneven density distribution; manifold data; K-nearest neighbors; reverse nearest neighbors; symmetric neighborhood; similarity between micro-clusters; micro-cluster merging
- CLC number:
TP301
- DOI:
10.11992/tis.202311005
- Abstract:
Mixed data are datasets that exhibit both uneven density distribution and manifold structure. The local density definition of the density peak clustering (DPC) algorithm tends to overlook differences in sample sparsity between clusters in datasets with uneven density distribution, which leads to wrongly selected cluster centers. Its allocation strategy assigns samples by Euclidean distance, which is unsuitable for manifold datasets, where samples of the same cluster can lie far apart, so samples end up misallocated. To address these problems, this paper proposes a density peak clustering algorithm based on symmetric neighborhood and micro-cluster merging for mixed datasets (DPC-SNMM). The algorithm introduces the concept of a symmetric neighborhood and redefines local density by accumulating logarithmic reciprocals, which makes cluster centers markedly easier to identify. It also proposes a density-difference-based method for selecting the number of micro-clusters, which keeps that number within a reasonable range. In addition, it designs a similarity measure between micro-clusters that is used to merge them, which avoids the cascading errors produced during allocation. Experiments show that, compared with the baseline algorithms, the proposed algorithm achieves better clustering results on mixed datasets, UCI datasets, and image datasets.
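The abstract describes the density side of DPC-SNMM only in words. The sketch below illustrates the general idea under stated assumptions and is not the authors' implementation: it builds K-nearest-neighbour (KNN) and reverse-nearest-neighbour (RNN) sets, takes the symmetric neighbourhood as their union (an assumption; the paper's exact definition may differ), accumulates a hypothetical log-reciprocal kernel 1/(1 + ln(1 + d_ij)) over that neighbourhood as the local density ρ, and then computes the standard DPC quantities δ (distance to the nearest denser sample) and γ = ρδ used to rank candidate centres. All function names are hypothetical, and micro-cluster selection and merging are not shown.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix for a list of coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def knn(dist, k):
    """Indices of the k nearest neighbours of every sample, itself excluded."""
    n = len(dist)
    return [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
            for i in range(n)]


def reverse_knn(neighbours):
    """RNN(i): the samples that list i among their own k nearest neighbours."""
    rnn = [set() for _ in neighbours]
    for i, nbrs in enumerate(neighbours):
        for j in nbrs:
            rnn[j].add(i)
    return rnn


def symmetric_neighbourhood(neighbours):
    """ASSUMPTION: SN(i) = KNN(i) | RNN(i); the paper's exact definition may differ."""
    rnn = reverse_knn(neighbours)
    return [set(nbrs) | rnn[i] for i, nbrs in enumerate(neighbours)]


def local_density(dist, sn):
    """HYPOTHETICAL log-reciprocal kernel: rho_i = sum over SN(i) of 1 / (1 + ln(1 + d_ij))."""
    return [sum(1.0 / (1.0 + math.log1p(dist[i][j])) for j in sn[i])
            for i in range(len(dist))]


def dpc_delta(dist, rho):
    """Standard DPC delta: distance to the nearest sample of strictly higher density.
    A sample with no denser sample gets its largest distance to any sample."""
    n = len(dist)
    out = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        out.append(min(higher) if higher else max(dist[i]))
    return out


def decision_scores(points, k):
    """gamma_i = rho_i * delta_i; the largest scores mark candidate cluster centres."""
    dist = pairwise_distances(points)
    rho = local_density(dist, symmetric_neighbourhood(knn(dist, k)))
    delta = dpc_delta(dist, rho)
    return [r * d for r, d in zip(rho, delta)]


if __name__ == "__main__":
    # Two well-separated groups: a unit square with its centre, and a small triangle.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (10, 11), (11, 10)]
    gamma = decision_scores(pts, k=3)
    centres = sorted(range(len(pts)), key=lambda i: gamma[i], reverse=True)[:2]
    print(centres)
```

On this toy input the two highest γ values fall on one sample from each group, which is what the decision graph is meant to expose. The union-based neighbourhood lets a sample that is a neighbour of many others (including across groups) collect extra density; a mutual-neighbour (intersection) definition would behave differently, which is one reason the exact definition matters.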
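Memo
Received: 2023-11-05.
Funding: National Natural Science Foundation of China (62066030); Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ190958).
About the authors:
CHEN Wei, master's student; main research interest: big data mining. E-mail: chenwei9801@163.com.
LYU Li, professor, Ph.D.; main research interests: intelligent computing and computational intelligence, object tracking and detection, big data and artificial intelligence. She has led 2 National Natural Science Foundation of China projects and published more than 80 academic papers. E-mail: lvli623@163.com.
XIAO Renbin, professor, Ph.D.; main research interests: complex system modeling and analysis, swarm intelligence. He has led 11 National Natural Science Foundation of China projects, received 1 Natural Science Award from the Ministry of Education and 4 Hubei Provincial Natural Science and Science and Technology Progress Awards, published more than 300 academic papers, and authored more than 10 monographs and textbooks. E-mail: rbxiao@hust.edu.cn.
Corresponding author: LYU Li. E-mail: lvli623@163.com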
Last update: 2025-01-05