[1] YANG Zhiyong, JIANG Feng, YU Xu, et al. Mixed data clustering initialization method using outlier detection technology[J]. CAAI Transactions on Intelligent Systems, 2023, 18(1): 56-65. [doi:10.11992/tis.202203031]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 18
Issue: 2023, No. 1
Page numbers: 56-65
Column: Academic Papers - Machine Perception and Pattern Recognition
Publication date: 2023-01-05
- Title: Mixed data clustering initialization method using outlier detection technology
- Author(s): YANG Zhiyong; JIANG Feng; YU Xu; DU Junwei
  School of Information Science & Technology, Qingdao University of Science and Technology, Qingdao 266100, China
- Keywords: initialization of clustering; mixed-type data; outlier detection; neighborhood rough set; granular neighborhood entropy; distance outlier factor; weighted density; weighted distance
- CLC: TP391
- DOI: 10.11992/tis.202203031
- Abstract:
In recent years, the clustering of mixed-type data has received wide attention. The K-prototype algorithm is an effective method for clustering mixed-type data, but it usually initializes cluster centers by random selection, which makes it difficult to guarantee the quality of clustering results in many practical applications. To address this problem, we select initial centers for the K-prototype algorithm using outlier detection and present a new initialization algorithm for mixed-type data clustering: Initialization of K-prototype Clustering Based on Outlier Detection and Density (IKP-ODD). Given a candidate object, IKP-ODD decides whether it becomes an initial center by calculating its distance outlier factor, its weighted density, and its weighted distances from the existing initial centers. The distance outlier factor and the weighted density prevent outliers from being selected as initial centers. When calculating the weighted densities of objects and the weighted distances between objects, we use the granular neighborhood entropy from neighborhood rough sets to measure the significance of each attribute and assign each attribute a weight according to its significance, which reflects the differences between attributes. Experiments on several UCI datasets show that IKP-ODD outperforms existing initialization methods for K-prototype clustering.
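The selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the IKP-ODD algorithm itself: the abstract gives no formulas, so the outlier score (mean distance to the nearest neighbors), the density (one minus the mean weighted distance), and the selection rule (density times distance to the nearest chosen center) are all stand-in assumptions. The attribute weights are passed in directly rather than derived from granular neighborhood entropy, numeric attributes are assumed to be scaled to [0, 1], and the weights are assumed to sum to 1. The names `weighted_distance`, `select_initial_centers`, `n_neighbors`, and `n_outliers` are hypothetical.

```python
from typing import List, Sequence


def weighted_distance(a: Sequence, b: Sequence, n_num: int,
                      weights: Sequence[float]) -> float:
    """Weighted mixed distance (assumed form): absolute difference on the
    first n_num numeric attributes, 0/1 mismatch on the categorical ones."""
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        diff = abs(x - y) if j < n_num else float(x != y)
        total += weights[j] * diff
    return total


def select_initial_centers(data: List[Sequence], n_num: int,
                           weights: Sequence[float], k: int,
                           n_neighbors: int = 3,
                           n_outliers: int = 1) -> List[int]:
    """Return indices of k initial centers, skipping likely outliers."""
    n = len(data)
    dist = [[weighted_distance(data[i], data[j], n_num, weights)
             for j in range(n)] for i in range(n)]

    # Stand-in outlier score: mean distance to the nearest other objects.
    outlier_score = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:n_neighbors]
        outlier_score.append(sum(nearest) / len(nearest))

    # Stand-in weighted density: objects close to many others score higher.
    density = [1.0 - sum(dist[i]) / (n - 1) for i in range(n)]

    # Remove the n_outliers highest-scoring objects from the candidates.
    ranked = sorted(range(n), key=lambda i: outlier_score[i], reverse=True)
    excluded = set(ranked[:n_outliers])
    candidates = [i for i in range(n) if i not in excluded]
    if k > len(candidates):
        raise ValueError("k exceeds the number of non-outlier candidates")

    # First center: the densest candidate. Later centers: dense candidates
    # that are also far from every center chosen so far.
    centers = [max(candidates, key=lambda i: density[i])]
    while len(centers) < k:
        remaining = [i for i in candidates if i not in centers]
        centers.append(max(
            remaining,
            key=lambda i: density[i] * min(dist[i][c] for c in centers)))
    return centers


# Toy data: one numeric attribute in [0, 1] followed by one categorical one.
data = [
    (0.10, "a"), (0.12, "a"), (0.11, "a"), (0.13, "a"),
    (0.90, "b"), (0.88, "b"), (0.91, "b"), (0.89, "b"),
    (0.50, "c"),  # isolated object that should not become a center
]
centers = select_initial_centers(data, n_num=1, weights=[0.5, 0.5], k=2)
print([data[i] for i in centers])
```

On this toy data the isolated object is excluded before selection, and the two chosen centers come from different groups. The returned indices could then serve as starting prototypes for a K-prototype implementation in place of random initialization.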