[1] YANG Zhiyong, JIANG Feng, YU Xu, et al. Mixed data clustering initialization method using outlier detection technology[J]. CAAI Transactions on Intelligent Systems, 2023, 18(1): 56-65. [doi:10.11992/tis.202203031]
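CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
18
Issue:
2023, No. 1
Pages:
56-65
Column:
Academic Papers: Machine Perception and Pattern Recognition
Publication date:
2023-01-05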
- Title:
-
Mixed data clustering initialization method using outlier detection technology
- Author(s):
-
YANG Zhiyong, JIANG Feng, YU Xu, DU Junwei
-
School of Information Science & Technology, Qingdao University of Science and Technology, Qingdao 266100, China
- Keywords:
-
initialization of clustering; mixed-type data; outlier detection; neighborhood rough set; granular neighborhood entropy; distance outlier factor; weighted density; weighted distance
- CLC number:
-
TP391
- DOI:
-
10.11992/tis.202203031
- Abstract:
-
In recent years, the clustering of mixed-type data has received wide attention. The K-prototype algorithm is an effective method for clustering mixed-type data, but it usually initializes the cluster centers by random selection, a strategy that cannot guarantee the quality of the clustering results in many practical applications. To address this problem, we select initial centers for the K-prototype algorithm using outlier detection and propose a new initialization algorithm for mixed-type data clustering, IKP-ODD (initialization of K-prototype clustering based on outlier detection and density). Given a candidate object, IKP-ODD decides whether it is an initial center by computing its distance outlier factor, its weighted density, and its weighted distances to the existing initial centers. Using the distance outlier factor together with the weighted density prevents outliers from being selected as initial centers. When computing the weighted density of an object and the weighted distance between objects, we use the granular neighborhood entropy from neighborhood rough sets to measure the significance of each attribute and assign each attribute a weight according to its significance, which reflects the differences among attributes. Experiments on several UCI datasets show that IKP-ODD handles the initialization of K-prototype clustering better than existing initialization methods.
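The selection idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' IKP-ODD: the abstract gives no formulas, so the outlier score (mean weighted distance to all other objects), the density (fraction of objects within a radius), the selection rule (density times distance to the nearest chosen center), and the hand-supplied attribute weights are all simplifying assumptions standing in for the distance outlier factor, the weighted density, and the granular-neighborhood-entropy weights. The names `weighted_distance`, `select_initial_centers`, `radius`, and `outlier_quantile` are hypothetical.

```python
from typing import List, Sequence, Tuple

# An object is a pair: (numeric attribute values, categorical attribute values).
Obj = Tuple[Sequence[float], Sequence[str]]


def weighted_distance(a: Obj, b: Obj,
                      w_num: Sequence[float], w_cat: Sequence[float]) -> float:
    """Weighted mixed-type distance: squared difference on numeric
    attributes, 0/1 mismatch on categorical attributes."""
    num = sum(w * (x - y) ** 2 for w, x, y in zip(w_num, a[0], b[0]))
    cat = sum(w * (x != y) for w, x, y in zip(w_cat, a[1], b[1]))
    return num + cat


def select_initial_centers(data: List[Obj], k: int,
                           w_num: Sequence[float], w_cat: Sequence[float],
                           radius: float,
                           outlier_quantile: float = 0.9) -> List[int]:
    """Return indices of k initial centers, skipping likely outliers."""
    n = len(data)
    dist = [[weighted_distance(data[i], data[j], w_num, w_cat)
             for j in range(n)] for i in range(n)]

    # Assumed stand-in for the distance outlier factor:
    # mean weighted distance to every other object.
    outlier = [sum(row) / (n - 1) for row in dist]

    # Assumed stand-in for the weighted density:
    # fraction of objects (including itself) within `radius`.
    density = [sum(d <= radius for d in row) / n for row in dist]

    # Exclude objects whose outlier score exceeds the chosen quantile.
    cutoff = sorted(outlier)[int(outlier_quantile * (n - 1))]
    candidates = [i for i in range(n) if outlier[i] <= cutoff]
    if k > len(candidates):
        raise ValueError("not enough non-outlier candidates for k centers")

    # First center: the densest candidate.
    centers = [max(candidates, key=lambda i: density[i])]

    # Later centers: dense candidates far from all centers chosen so far.
    while len(centers) < k:
        remaining = [i for i in candidates if i not in centers]
        nxt = max(remaining,
                  key=lambda i: density[i] * min(dist[i][c] for c in centers))
        centers.append(nxt)
    return centers


if __name__ == "__main__":
    toy = [([0.0], ["a"]), ([0.1], ["a"]), ([0.2], ["a"]),
           ([5.0], ["b"]), ([5.1], ["b"]), ([5.2], ["b"]),
           ([20.0], ["c"])]  # the last object is an isolated outlier
    print(select_initial_centers(toy, k=2, w_num=[1.0], w_cat=[1.0], radius=1.0))
```

On the toy data the isolated object is filtered out before selection, and the two chosen centers fall in different dense groups. In the paper the attribute weights come from granular neighborhood entropy; here they are passed in by hand because that computation is not specified in the abstract.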
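Memo
Received: 2022-03-17.
Funding: National Natural Science Foundation of China (61973180, 61671261); Natural Science Foundation of Shandong Province (ZR2021MF092, ZR2022MF326).
About the authors: YANG Zhiyong, master's student; research interests: machine learning and data mining. JIANG Feng, professor; research interests: artificial intelligence, rough set theory, and network security; has completed 3 projects funded by the National Natural Science Foundation of China, the Natural Science Foundation of Shandong Province, and other programs, and has published more than 30 academic papers. YU Xu, associate professor; research interests: recommender systems, transfer learning, and crowdsourcing service computing; has completed more than 10 projects funded by the National Natural Science Foundation of China, the Natural Science Foundation of Shandong Province, and other programs, and has published more than 30 academic papers.
Corresponding author: JIANG Feng. E-mail: jiangfeng@qust.edu.cn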
Last Update:
1900-01-01