[1] LYU Li, CHEN Wei, XIAO Renbin, et al. Density peak clustering algorithm based on weighted reverse nearest neighbor for uneven density datasets[J]. CAAI Transactions on Intelligent Systems, 2024, 19(1): 165-175. [doi:10.11992/tis.202212015]
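CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
19
Issue:
No. 1, 2024
Pages:
165-175
Column:
Academic Papers: Fundamentals of Artificial Intelligence
Publication date: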
2024-01-05
- Title:
-
Density peak clustering algorithm based on weighted reverse nearest neighbor for uneven density datasets
- Author(s):
-
LYU Li1,2, CHEN Wei1,2, XIAO Renbin3, HAN Longzhe1,2, TAN Dekun1,2
-
1. School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, China;
2. Nanchang Key Laboratory of IoT Perception and Collaborative Computing for Smart City, Nanchang Institute of Technology, Nanchang 330099, China;
3. School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
-
- Keywords:
-
density peak clustering; uneven density distribution; reverse nearest neighbor; shared reverse nearest neighbor; sample similarity; local density; distribution strategy; data mining
- CLC number:
-
TP301
- DOI:
-
10.11992/tis.202212015
- Document code:
-
2023-08-02
- Abstract:
-
For data with uneven density distribution, the density peak clustering (DPC) algorithm tends to overlook differences in sparsity between clusters, which leads to incorrectly selected cluster centers. Moreover, its allocation strategy tends to assign samples in sparse regions to dense regions, which degrades the clustering result. To address these problems, this paper proposes a density peak clustering algorithm based on weighted reverse nearest neighbors (DPC-WR) for datasets with uneven density distribution. First, a weight coefficient based on the sigmoid function is introduced into the local density formula to increase the weight of samples in sparse regions; combined with the concept of reverse nearest neighbors, the local density of samples is redefined, which effectively improves the recognition rate of cluster centers. Second, an improved sample similarity strategy is introduced that uses the reverse nearest neighbor and shared reverse nearest neighbor information between samples, so that samples in the same cluster have high similarity; this effectively alleviates the misallocation of samples in sparse regions. Comparative experiments on datasets with uneven density distribution, datasets with complex shapes, and UCI datasets show that DPC-WR achieves better clustering results than the IDPC-FA, FNDPC, FKNN-DPC, DPC, and DPCSA algorithms.
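The two ideas named in the abstract, a sigmoid-based weight that favors sparse samples and a similarity built from shared reverse nearest neighbors, can be illustrated with a minimal Python sketch. The abstract does not give the DPC-WR formulas, so everything below is an assumption made for illustration: the function names, the use of the standardized mean k-nearest-neighbor distance as the sigmoid input, the Gaussian-style kernel `exp(-d)` summed over reverse neighbors, and the shared-neighbor count as similarity are hypothetical choices, not the authors' definitions.

```python
import numpy as np

def knn_indices(X, k):
    """Pairwise Euclidean distances and each sample's k nearest neighbors."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    knn = np.argsort(dist, axis=1)[:, :k]
    return dist, knn

def reverse_neighbors(knn, n):
    """RNN(i): the samples j that list i among their k nearest neighbors."""
    rnn = [set() for _ in range(n)]
    for j in range(n):
        for i in knn[j]:
            rnn[int(i)].add(j)
    return rnn

def weighted_rnn_density(X, k):
    """Illustrative density: sigmoid weight times a kernel sum over RNN(i).

    Assumption (not from the paper): the sigmoid input is the standardized
    mean distance to the k nearest neighbors, so sparse samples get a weight
    closer to 1 and dense samples a weight closer to 0.
    """
    n = len(X)
    dist, knn = knn_indices(X, k)
    rnn = reverse_neighbors(knn, n)
    mean_knn = np.take_along_axis(dist, knn, axis=1).mean(axis=1)
    z = (mean_knn - mean_knn.mean()) / (mean_knn.std() + 1e-12)
    weight = 1.0 / (1.0 + np.exp(-z))
    rho = np.zeros(n)
    for i in range(n):
        for j in rnn[i]:
            rho[i] += np.exp(-dist[i, j])  # closer reverse neighbors count more
    return weight * rho, rnn, weight

def shared_rnn_similarity(rnn, i, j):
    """Illustrative similarity: number of reverse neighbors shared by i and j."""
    return len(rnn[i] & rnn[j])

# Toy data: one dense cluster and one sparse cluster (hypothetical example).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
               rng.normal(10.0, 1.0, size=(20, 2))])
rho, rnn, weight = weighted_rnn_density(X, k=5)
print("mean weight, dense cluster: ", weight[:50].mean())
print("mean weight, sparse cluster:", weight[50:].mean())
```

In this sketch the weight only rescales the density so that centers in sparse clusters are not drowned out by dense ones; how the published algorithm combines the weight, the reverse neighbors, and the allocation step is specified in the full paper rather than here.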
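Memo
Received: 2022-12-13.
Funding: National Natural Science Foundation of China (62066030); Key Research and Development Program of Jiangxi Province (20192BBE50076, 20203BBGL73225); Science and Technology Project of the Education Department of Jiangxi Province (GJJ190958).
About the authors: LYU Li, professor, Ph.D.; main research interests include intelligent computing and computational intelligence, target tracking and detection, and big data and artificial intelligence; has led 2 National Natural Science Foundation of China projects and published more than 80 academic papers. E-mail: lvli623@163.com; CHEN Wei, master's student; main research interest is data mining. E-mail: chenwei9801@163.com; XIAO Renbin, professor, doctoral supervisor; main research interests include swarm intelligence, large-scale personalized customization, and complex systems and complexity science; has led 11 National Natural Science Foundation of China projects, received as principal investigator 1 Natural Science Award of the Ministry of Education and 4 Natural Science and Science and Technology Progress Awards of Hubei Province, and published more than 300 academic papers and more than 10 academic monographs and textbooks. E-mail: rbxiao@hust.edu.cn
Corresponding author: LYU Li. E-mail: lvli623@163.com
Last Update: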
1900-01-01