LI Tao, WANG Shitong. Incremental fuzzy (c+p)-means clustering for large data[J]. CAAI Transactions on Intelligent Systems, 2016, 11(2): 188-199. [doi:10.11992/tis.201507013]

Incremental fuzzy (c+p)-means clustering for large data

Editorial Office of CAAI Transactions on Intelligent Systems [ISSN: 1673-4785; CN: 23-1538/TP]

Volume: 11
Issue: 2 (2016)
Pages: 188-199
Publication date: 2016-04-25

Article information

Title: Incremental fuzzy (c+p)-means clustering for large data
Authors: LI Tao, WANG Shitong
Affiliation: School of Digital Media, Jiangnan University, Wuxi 214122, China
Keywords: incremental fuzzy clustering; FCPM; IFCM(c+p); balance factor; large data
CLC number: TP391.4
DOI: 10.11992/tis.201507013
Abstract:
The fuzzy (c+p)-means (FCPM) algorithm has been applied successfully to fuzzy system modeling. However, it performs poorly on large data clustering tasks in which the cluster centers of one class are known. To overcome this drawback, this paper proposes an incremental fuzzy clustering algorithm for large data, called incremental fuzzy (c+p)-means clustering (IFCM(c+p)). The design draws on the single-pass fuzzy c-means (SPFCM) and online fuzzy c-means (OFCM) clustering algorithms.

The algorithm works block by block:
1. FCPM clusters each data block.
2. The cluster centers of that block, together with some of the sample points near them, are added to the next block and clustered with it.
3. A balance factor is introduced to improve clustering performance.

Unlike SPFCM, OFCM and rseFCM, IFCM(c+p) is not sensitive to the initial cluster centers. Experiments show that IFCM(c+p) achieves better clustering performance than SPFCM and rseFCM without a large increase in running time. It is therefore especially suitable for large data clustering tasks in which the cluster centers of one class are known.

References:

[1] BEZDEK J C, EHRLICH R, FULL W. FCM: the fuzzy c-means clustering algorithm[J]. Computers & Geosciences, 1984, 10(2): 191-203.
[2] CAN F, DROCHAK N D II. Incremental clustering for dynamic document databases[C]//Proceedings of the 1990 Symposium on Applied Computing. Fayetteville, AR, USA, 1990: 61-67.
[3] KAUFMAN L, ROUSSEEUW P J. Finding groups in data: an introduction to cluster analysis[M]. New York: John Wiley & Sons, 2009: 830-832.
[4] GUHA S, RASTOGI R, SHIM K. CURE: an efficient clustering algorithm for large databases[J]. Information systems, 2001, 26(1): 35-58.
[5] CAN F. Incremental clustering for dynamic information processing[J]. ACM transactions on information systems, 1993, 11(2): 143-164.
[6] CAN F, FOX E A, SNAVELY C D, et al. Incremental clustering for very large document databases: Initial MARIAN experience[J]. Information sciences, 1995, 84(1/2): 101-114.
[7] ZHANG Tian, RAMAKRISHNAN R, LIVNY M. BIRCH: An efficient data clustering method for very large databases[C]//Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data. New York, USA, 1996: 103-114.
[8] NG R T, HAN Jiawei. CLARANS: A method for clustering objects for spatial data mining[J]. IEEE transactions on knowledge and data engineering, 2002, 14(5): 1003-1016.
[9] SHANKER B U, PAL N R. FFCM: An effective approach for large data sets[C]//Proceedings of the 3rd International Conference on Fuzzy Logic, Neural Nets and Soft Computing. Iizuka, Japan, 1994: 331-332.
[10] CHENG Taiwai, GOLDGOF D B, HALL L O. Fast clustering with application to fuzzy rule generation[C]//Proceedings of 1995 IEEE International Conference on Fuzzy Systems (International Joint Conference of the Fourth IEEE International Conference on Fuzzy Systems and the Second International Fuzzy Engineering Symposium). Yokohama, Japan, 1995: 2289-2295.
[11] KOLEN J F, HUTCHESON T. Reducing the time complexity of the fuzzy c-means algorithm[J]. IEEE transactions on fuzzy systems, 2002, 10(2): 263-267.
[12] KOTHARI D, NARAYANAN S T, DEVI K K. Extended fuzzy c-means with random sampling techniques for clustering large data[J]. International journal of innovative research in advanced engineering (IJIRAE), 2014, 1(1): 1-4.
[13] HORE P, HALL L O, GOLDGOF D B. Single pass fuzzy c means[C]//Proceedings of IEEE International Fuzzy Systems Conference. London, UK, 2007: 1-7.
[14] HORE P, HALL L O, GOLDGOF D B, et al. Online fuzzy c means[C]//Proceedings of Annual Meeting of the North American Fuzzy Information Processing Society. New York, USA, 2008: 1-5.
[15] HAVENS T, BEZDEK J, LECKIE C, et al. Fuzzy c-means algorithms for very large data[J]. IEEE transactions on fuzzy systems, 2012, 20(6): 1130-1146.
[16] WANG Yangtao, CHEN Lihui, MEI Jianping. Incremental fuzzy clustering with multiple medoids for large data[J]. IEEE transactions on fuzzy systems, 2014, 22(6): 1557-1568.
[17] BÖHM C, PLANT C, SHAO J, et al. Clustering by synchronization[C]//Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, USA, 2010: 583-592.
[18] YING Wenhao, XU Min, WANG Shitong, et al. Fast adaptive clustering by synchronization on large scale datasets[J]. Journal of computer research and development, 2014, 51(4): 707-720.
[19] LESKI J M. Fuzzy (c+p)-means clustering and its application to a fuzzy rule-based classifier: towards good generalization and good interpretability[J]. IEEE transactions on fuzzy systems, 2014, 23(4): 802-812.
[20] LIU Jun, MOHAMMED J, CARTER J, et al. Distance-based clustering of CGH data[J]. Bioinformatics, 2006, 22(16): 1971-1978.
[21] DENG Zhaohong, CHOI K S, CHUNG Fulai, et al. Enhanced soft subspace clustering integrating within-cluster and between-cluster information[J]. Pattern recognition, 2010, 43(3): 767-781.
[22] RAND W M. Objective criteria for the evaluation of clustering methods[J]. Journal of the American statistical association, 1971, 66(336): 846-850.

Memo:
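Received: 2015-07-06; Revised:
Foundation item: National Natural Science Foundation of China (61272210).
About the authors: LI Tao, male, born in 1990, is a master's student. His research interests include artificial intelligence and pattern recognition, fuzzy clustering algorithms, and incremental learning.
WANG Shitong, male, born in 1964, is a professor and doctoral supervisor. He is an executive council member of the China Discrete Mathematics Society and of the China Machine Learning Society. His research interests include artificial intelligence/pattern recognition and image processing and its applications. He has published nearly 100 academic papers, more than 50 of which are indexed by SCI and EI.
Corresponding author: LI Tao. E-mail: chasingdream119@163.com.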