[1] WANG Yue, YANG Yan, WANG Hongjun. An improved transfer fuzzy clustering with few labels[J]. CAAI Transactions on Intelligent Systems, 2016, 11(3): 310-317. [doi:10.11992/tis.201603046]

An improved transfer fuzzy clustering with few labels

Editorial Office of CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
11
Issue:
No. 3, 2016
Pages:
310-317
Publication date:
2016-06-25

Article Info

Title:
An improved transfer fuzzy clustering with few labels
Author(s):
WANG Yue, YANG Yan, WANG Hongjun
School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China
Keywords:
clustering; transfer learning; semi-supervised; possibilistic C-means; fuzzy C-means
CLC number:
TP301
DOI:
10.11992/tis.201603046
Abstract:
Traditional clustering algorithms have difficulty exploiting existing historical information, and their results are unsatisfactory, particularly when the data are contaminated. Semi-supervised clustering is commonly used when part of the data is labeled. For the case in which the source data carry a small number of labels, this paper proposes a semi-supervised fuzzy possibilistic C-means algorithm (SS-FPCM). Within a transfer learning framework, the algorithm is then modified to address the negative transfer problem, yielding a transfer semi-supervised fuzzy possibilistic C-means algorithm (TSS-FPCM). Finally, to make full use of the source data, "representative points" are used in place of the source-data class information and incorporated into the algorithm for a second transfer, giving an improved transfer semi-supervised fuzzy possibilistic C-means algorithm (ITSS-FPCM). Experiments show that all three algorithms use the source data effectively to improve clustering performance. SS-FPCM and TSS-FPCM exploit the small amount of labeled source data, while ITSS-FPCM combines the labeled data with the representative points and obtains good clustering results when data information is scarce or the data are contaminated.
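For readers unfamiliar with the fuzzy C-means building block named in the keywords, the following is a minimal sketch of standard fuzzy C-means (alternating membership and center updates) on 1-D data. It is not the SS-FPCM, TSS-FPCM, or ITSS-FPCM algorithm of this paper, whose update formulas are not given in the abstract. The function name `fuzzy_c_means`, the fuzzifier `m=2.0`, the initial centers, and the toy data are all illustrative assumptions.

```python
def fuzzy_c_means(points, centers, m=2.0, iters=100, tol=1e-9):
    """Standard fuzzy C-means on 1-D data (illustrative sketch only).

    points  : list of floats
    centers : initial cluster centers (assumed, chosen by the caller)
    m       : fuzzifier > 1 (assumed default 2.0)
    Returns (centers, memberships); memberships[i][k] is the degree to
    which points[i] belongs to cluster k, computed in the final
    iteration, and each row sums to 1.
    """
    centers = list(centers)
    exponent = 1.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        memberships = []
        for x in points:
            # Squared distances, floored to avoid division by zero.
            d2 = [max((x - c) ** 2, 1e-12) for c in centers]
            inv = [(1.0 / d) ** exponent for d in d2]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for k in range(len(centers)):
            weights = [row[k] ** m for row in memberships]
            new_centers.append(
                sum(w * x for w, x in zip(weights, points)) / sum(weights)
            )
        # Stop once no center moves by more than tol.
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


# Toy data: two well-separated groups around 1.0 and 5.0 (assumed).
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, u = fuzzy_c_means(data, centers=[0.0, 6.0])
```

Unlike hard clustering, every point receives a graded membership in every cluster. Possibilistic and semi-supervised variants such as those proposed in the paper change how these memberships are constrained and how labels enter the objective, which this sketch does not attempt to reproduce.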



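Memo

Received: 2016-03-19.
Funding: National Natural Science Foundation of China (61170111, 61572407, 61134002); Sichuan Province Science and Technology Support Program (2014SZ0207).
About the authors: WANG Yue, male, born in 1990, is a master's student whose main research interests are data mining and computational intelligence. YANG Yan, female, born in 1964, is a professor and doctoral supervisor whose main research interests are computational intelligence, data mining, and ensemble learning. She has led three National Natural Science Foundation of China projects and one National Science and Technology Support Program project, and has published more than 130 academic papers. WANG Hongjun, male, born in 1977, is an associate researcher whose main research interests are machine learning, deep learning, and semi-supervised learning. He has led and completed one National Natural Science Foundation of China Youth Fund project, led two National Natural Science Foundation of China projects, and published more than 30 academic papers.
Corresponding author: YANG Yan. E-mail: yyang@swjtu.edu.cn.
Last Update: 1900-01-01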