[1] ZHAO Yanwei, ZHU Fen, GUI Fangzhi, et al. Improved k-means algorithm based on extension distance[J]. CAAI Transactions on Intelligent Systems, 2020, 15(2): 344-351. [doi:10.11992/tis.201811020]
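CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
15
Issue:
2020, No. 2
Pages:
344-351
Column:
Academic Papers: Artificial Intelligence Fundamentals
Publication date:
2020-03-05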
- Title:
-
Improved k-means algorithm based on extension distance
- Author(s):
-
ZHAO Yanwei1, ZHU Fen1, GUI Fangzhi1, REN Shedong2, XIE Zhiwei1, XU Chen1
-
1. Key Lab of Special Purpose Equipment and Advanced Manufacturing Technology, Ministry of Education & Zhejiang Province, Zhejiang University of Technology, Hangzhou 310014, China;
2. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310014, China
- Keywords:
-
extension distance; k-means clustering algorithm; scaling factor; initial cluster center; intensity; alienation
- CLC number:
-
TP181
- DOI:
-
10.11992/tis.201811020
- Abstract:
-
Existing clustering algorithms that optimize the initial cluster centers tend to place the first initial center in a sparse region at the data boundary, which leads to unbalanced clustering results. To address this, an improved k-means algorithm that selects the initial cluster centers based on extension distance is proposed. First, the classical distance of the samples is mapped onto an extension interval, and the extension left-side and right-side distances are obtained using the extension side-distance calculation method. Then, the concept of average extension side distance is introduced: the average extension left-side distance and the average extension right-side distance serve as quantitative indicators of sample density (intensity) and cluster-center alienation, respectively. On this basis, a selection criterion for the initial cluster centers is given. Compared with the traditional k-means algorithm, the improved algorithm selects more evenly distributed initial cluster centers and produces better clusters; in particular, it achieves higher clustering accuracy and better balance on high-dimensional data.
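For readers unfamiliar with the term, the basic extension distance used in extenics between a point x and an interval <a, b> is commonly defined as ρ(x, <a, b>) = |x - (a + b)/2| - (b - a)/2. It is negative inside the interval, zero at the endpoints, and positive outside. The sketch below illustrates only this basic definition. The function name `extension_distance` is hypothetical, and the paper's left-side and right-side distances, its mapping of classical distances onto the extension interval, and its center-selection criterion are not specified in this abstract, so they are not reproduced here.

```python
def extension_distance(x: float, a: float, b: float) -> float:
    """Basic extension distance from point x to the interval <a, b>.

    Assumption: this is the standard extenics definition, not the
    side-distance variant that the paper builds on.
    """
    if a > b:
        raise ValueError("interval requires a <= b")
    midpoint = (a + b) / 2
    half_width = (b - a) / 2
    # Negative inside the interval, zero on its boundary, positive outside.
    return abs(x - midpoint) - half_width


if __name__ == "__main__":
    for point in (4.0, 5.0, 6.0, 8.0):
        print(point, extension_distance(point, 2.0, 6.0))
```

Unlike the classical point-to-interval distance, which is zero everywhere inside the interval, this quantity still distinguishes interior points by how far they lie from the boundary.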