[1] ZHAO Jia, MA Qing, XIAO Renbin, et al. Density peaks clustering based on shared nearest neighbor for manifold datasets[J]. CAAI Transactions on Intelligent Systems, 2023, 18(4): 719-730. [doi:10.11992/tis.202209026]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 18
Issue: No. 4, 2023
Pages: 719-730
Section: Academic Papers, Machine Learning
Publication date: 2023-07-15
- Title:
-
Density peaks clustering based on shared nearest neighbor for manifold datasets
- Author(s):
-
ZHAO Jia1, MA Qing1, XIAO Renbin2, PAN Zhengxiang3, HAN Longzhe1
-
1. School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, China;
2. Institute of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China;
3. Institute of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
-
- Keywords:
-
density peaks; clustering analysis; manifold data; K-nearest neighbor; shared nearest neighbor; sample similarity; data mining; image processing
- CLC number:
-
TP301.6
- DOI:
-
10.11992/tis.202209026
- Abstract:
-
Manifold data consist of arc-shaped or ring-shaped clusters, in which the distances between samples of the same cluster vary widely. The density peaks clustering (DPC) algorithm cannot effectively identify the centers of manifold clusters and is prone to consecutive misallocation of samples when it allocates the remaining samples. To address these problems, this paper proposes a density peaks clustering algorithm based on shared nearest neighbors for manifold datasets (DPC-SNN). First, a sample similarity based on shared nearest neighbors is defined so that the similarity between samples of the same manifold cluster is as high as possible. Second, the local density is defined from this similarity without ignoring the density contribution of samples far from the cluster centers, which better separates the cluster centers of manifold clusters from the other samples. Finally, the remaining samples are allocated according to sample similarity, which avoids consecutive misallocation. Comparative experiments against DPC, FKNN-DPC, FNDPC, DPCSA, and IDPC-FA show that DPC-SNN effectively finds the cluster centers of manifold data and clusters them accurately, and that it also performs well on real-world and face datasets.
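The abstract describes the idea of DPC-SNN but does not give its formulas, so the following is only a minimal Python sketch under stated assumptions. It takes the similarity of two samples to be the fraction of k-nearest neighbors they share, and the local density of a sample to be the sum of its similarities to all other samples. The function names `knn_indices`, `snn_similarity`, and `local_density`, and the parameter `k`, are hypothetical; the paper's actual definitions may differ.

```python
import numpy as np


def knn_indices(X, k):
    """Return the indices of the k nearest neighbors of each sample (self excluded)."""
    # Pairwise Euclidean distances, shape (n, n).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # A sample must not count as its own neighbor.
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]


def snn_similarity(X, k):
    """Assumed similarity: |kNN(i) ∩ kNN(j)| / k, with zeros on the diagonal."""
    n = X.shape[0]
    nbrs = knn_indices(X, k)
    # member[i, p] = 1 if p is one of the k nearest neighbors of i.
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], nbrs] = 1
    # shared[i, j] counts the neighbors that i and j have in common.
    shared = member @ member.T
    sim = shared / k
    np.fill_diagonal(sim, 0.0)
    return sim


def local_density(sim):
    """Assumed local density: sum of similarities to every other sample."""
    return sim.sum(axis=1)


# Usage: two well-separated groups share no neighbors, so their
# cross-group similarity is zero.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(10.0, 0.1, (20, 2))])
sim = snn_similarity(X, k=5)
rho = local_density(sim)
```

Because this assumed density sums over all samples rather than only those within a cutoff distance, samples far from a cluster center can still contribute, which is the property the abstract highlights for manifold clusters.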
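Memo
Received: 2022-09-15.
Funding: National Natural Science Foundation of China (52069014, 61962036).
About the authors: ZHAO Jia, professor, Ph.D.; main research interests: intelligent computing and computational intelligence, pattern recognition, and big data mining; has led 2 National Natural Science Foundation of China projects, published more than 60 academic papers, and authored 1 monograph. MA Qing, master's student; main research interest: data mining. XIAO Renbin, professor and doctoral supervisor; main research interests: swarm intelligence, mass personalized customization, and complex systems and complexity science; has led and undertaken 11 National Natural Science Foundation of China projects; as first contributor, received 1 Natural Science Award of the Ministry of Education and 4 Hubei Province Natural Science and Science and Technology Progress Awards; has published more than 300 academic papers and more than 10 monographs and textbooks.
Corresponding author: ZHAO Jia. E-mail: zhaojia925@163.com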