[1] CHEN Aiguo, WANG Shitong. A maximum entropy-based knowledge transfer fuzzy clustering algorithm[J]. CAAI Transactions on Intelligent Systems, 2017, 12(01): 95-103. [doi:10.11992/tis.201602003]

A maximum entropy-based knowledge transfer fuzzy clustering algorithm

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
12
Issue:
2017, No. 01
Pages:
95-103
Publication date:
2017-02-25

Article Info

Title:
A maximum entropy-based knowledge transfer fuzzy clustering algorithm
Author(s):
CHEN Aiguo (1, 2), WANG Shitong (1)
1. School of Digital Media, Jiangnan University, Wuxi 214122, China;
2. Department of Computing, Hong Kong Polytechnic University, Kowloon 999077, China
Keywords:
knowledge transfer; maximum entropy; clustering algorithms; maximum entropy clustering; fuzzy clustering
CLC number:
TP274
DOI:
10.11992/tis.201602003
Abstract:
Traditional clustering algorithms perform poorly when the sample data are insufficient or contaminated by noise. To address this, a maximum entropy-based knowledge transfer fuzzy clustering algorithm (MEKTFCA) is proposed. Building on the classical maximum entropy clustering algorithm, it incorporates two kinds of knowledge summarized from the source domain, namely historical cluster centers and historical degrees of membership, into the objective function used to cluster the insufficient or noisy target-domain data, and this guidance improves clustering performance. Experiments on transfer scenarios constructed from a set of synthetic datasets and two real-world datasets demonstrate the effectiveness of the proposed algorithm.
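The idea of guiding target-domain clustering with source-domain centers can be illustrated with a minimal sketch. The paper's actual objective function is not given on this page, so everything below is an assumption made for illustration: the objective J = sum_ij u_ij * ||x_j - v_i||^2 + gamma * sum_ij u_ij * ln(u_ij) + lam * sum_i ||v_i - s_i||^2, the function name transfer_mec, and the parameters gamma and lam are all hypothetical. The sketch transfers only the historical centers s_i and omits the historical-membership term that the abstract also mentions.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def transfer_mec(points, source_centers, gamma=1.0, lam=1.0, iters=50):
    """Maximum entropy clustering with a center-transfer penalty (illustrative).

    points         -- target-domain samples, each a sequence of floats
    source_centers -- cluster centers learned on the source domain (assumed given)
    gamma          -- entropy weight; larger values give fuzzier memberships
    lam            -- hypothetical transfer weight pulling centers toward the source
    """
    dim = len(points[0])
    centers = [list(c) for c in source_centers]  # start from the source knowledge
    memberships = []
    for _ in range(iters):
        # Membership update: u_ij is a softmax over clusters of -d_ij^2 / gamma.
        memberships = []
        for p in points:
            d = [sq_dist(p, c) for c in centers]
            d_min = min(d)  # shift the exponent for numerical stability
            w = [math.exp(-(di - d_min) / gamma) for di in d]
            total = sum(w)
            memberships.append([wi / total for wi in w])
        # Center update: membership-weighted mean, shrunk toward the source center.
        for i, src in enumerate(source_centers):
            mass = sum(u[i] for u in memberships)
            centers[i] = [
                (sum(u[i] * p[k] for u, p in zip(memberships, points)) + lam * src[k])
                / (mass + lam)
                for k in range(dim)
            ]
    # memberships come from the last membership update, before the final center update
    return centers, memberships
```

With lam = 0 the updates reduce to classical maximum entropy clustering initialized at the source centers; as lam grows, the target centers stay closer to the source centers, which is the behavior one would want when the target data are scarce or noisy.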


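Memo:
Received: 2016-02-04.
Funding: National Natural Science Foundation of China (61272210); Jiangsu Province Outstanding Youth Fund (BK20140001); Natural Science Foundation of Jiangsu Province (BK20130155).
About the authors: CHEN Aiguo, male, born in 1975, is a PhD candidate whose research interests are pattern recognition and machine learning. WANG Shitong, male, born in 1964, is a professor and doctoral supervisor, and an executive council member of both the Chinese discrete mathematics society and the Chinese machine learning society. His research interests are artificial intelligence, pattern recognition, and bioinformatics. He has published nearly one hundred academic papers, more than 50 of which are indexed by SCI or EI.
Corresponding author: CHEN Aiguo. E-mail: agchen@jiangnan.edu.cn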