[1] BIAN Zekang, WANG Shitong. Robust FCM clustering algorithm based on hybrid-distance learning[J]. CAAI Transactions on Intelligent Systems, 2017, (04): 450-458. [doi:10.11992/tis.201607019]

Robust FCM clustering algorithm based on hybrid-distance learning

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Issue:
2017, No. 04
Pages:
450-458
Column:
Publication date:
2017-08-25

Article Info

Title:
Robust FCM clustering algorithm based on hybrid-distance learning
Author(s):
BIAN Zekang, WANG Shitong
School of Digital Media, Jiangnan University, Wuxi 214122, China
Keywords:
distance metric; FCM clustering algorithm; pairwise constraints; side information; hybrid distance; semi-supervised; GIFP-FCM; robustness
CLC number:
TP181
DOI:
10.11992/tis.201607019
Abstract:
The distance metric has a decisive influence on the results of the fuzzy C-means (FCM) clustering algorithm. In some practical applications, the data to be clustered come with a certain amount of side information in the form of labeled pairwise constraints. To make full use of this side information, we first propose a hybrid-distance learning method that uses it to learn a distance metric for the data set. We then propose a robust fuzzy C-means clustering algorithm based on hybrid-distance learning (HR-FCM), which is a semi-supervised clustering algorithm. HR-FCM retains the robustness and other properties of GIFP-FCM (generalized FCM algorithm with improved fuzzy partitions) and achieves better clustering performance because it uses a more suitable distance metric. Experimental results confirm the effectiveness of the proposed algorithm.
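To illustrate why the choice of distance matters in FCM, the following is a minimal sketch of the standard FCM membership update with a pluggable dissimilarity function. It is not the HR-FCM or GIFP-FCM algorithm from the paper. The helper names (`hybrid_distance`, `fcm_memberships`), the two component distances, and the fixed weights are assumptions chosen for illustration; in the paper the combination is learned from pairwise constraints, which this sketch does not attempt.

```python
def euclidean_sq(x, y):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def manhattan(x, y):
    """Manhattan (L1) distance between two points."""
    return sum(abs(a - b) for a, b in zip(x, y))


def hybrid_distance(x, y, weights, components):
    """Weighted sum of candidate distances.

    Assumption: the weights are non-negative and given in advance.
    The paper learns the metric from side information instead.
    """
    return sum(w * d(x, y) for w, d in zip(weights, components))


def fcm_memberships(points, centers, dist, m=2.0, eps=1e-12):
    """Standard FCM membership update for the objective sum(u**m * dist).

    For that objective, u[j][i] is proportional to dist(x_j, c_i) ** (-1 / (m - 1)),
    normalized so that each point's memberships sum to 1.
    """
    power = 1.0 / (m - 1.0)
    memberships = []
    for x in points:
        # Clamp at eps so a point lying exactly on a center does not divide by zero.
        d = [max(dist(x, c), eps) for c in centers]
        inv = [(1.0 / dj) ** power for dj in d]
        total = sum(inv)
        memberships.append([v / total for v in inv])
    return memberships


# Hypothetical toy data: two points near the first center, one on the second.
points = [(0.0, 0.0), (0.2, 0.1), (4.0, 4.0)]
centers = [(0.0, 0.0), (4.0, 4.0)]
weights = [0.5, 0.5]
components = [euclidean_sq, manhattan]


def dist(x, y):
    return hybrid_distance(x, y, weights, components)


U = fcm_memberships(points, centers, dist)
```

Changing `weights` or `components` changes the memberships without touching the update rule, which is the sense in which a better-suited metric can improve the clustering.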

Memo

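Received: 2016-07-23.
Funding: National Natural Science Foundation of China (61272210).
About the authors: BIAN Zekang, male, born in 1993, is a master's student whose main research interests are artificial intelligence and pattern recognition. WANG Shitong, male, born in 1964, is a professor and doctoral supervisor whose main research interests are artificial intelligence and pattern recognition. He has published nearly one hundred academic papers, more than 50 of which are indexed by SCI or EI.
Corresponding author: BIAN Zekang, E-mail: bianzekang@163.com.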
Last Update: 2017-08-25