[1] WANG Yue, YANG Yan, WANG Hongjun. An improved transfer fuzzy clustering with few labels[J]. CAAI Transactions on Intelligent Systems, 2016, 11(3): 310-317. [doi:10.11992/tis.201603046]
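CAAI Transactions on Intelligent Systems, Editorial Office [ISSN 1673-4785/CN 23-1538/TP] Volume:
11
Issue:
No. 3, 2016
Pages:
310-317
Section:
Academic Papers: Machine Learning
Publication date:
2016-06-25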
- Title:
-
An improved transfer fuzzy clustering with few labels
- Author(s):
-
WANG Yue (王跃), YANG Yan (杨燕), WANG Hongjun (王红军)
-
School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, Sichuan, China
-
- Keywords:
-
clustering; transfer learning; semi-supervised; possibilistic C-means; fuzzy C-means
- CLC number:
-
TP301
- DOI:
-
10.11992/tis.201603046
- Abstract:
-
Traditional clustering algorithms have difficulty exploiting existing historical information, and their results are unsatisfactory, especially when the data are contaminated. Semi-supervised clustering is commonly used when part of the data is labeled. For the case in which the source data carry a small number of labels, this paper proposes a semi-supervised fuzzy possibilistic C-means clustering algorithm (SS-FPCM). Within a transfer learning framework, the algorithm is then modified to address the negative transfer problem, yielding a transfer semi-supervised algorithm that guards against negative transfer (TSS-FPCM). Finally, to draw more fully on the source data, "representative points" are used in place of the source-data class information and incorporated into the algorithm for a second round of transfer, giving an improved transfer semi-supervised algorithm (ITSS-FPCM). Experiments show that all three algorithms can use the source data effectively to improve clustering performance. SS-FPCM and TSS-FPCM exploit the small amount of labeled source data, whereas ITSS-FPCM combines the labeled data with the "representative points" and obtains good clustering results when data information is scarce or the data are contaminated.
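For readers unfamiliar with the fuzzy C-means baseline named in the keywords, the following is a minimal sketch of the standard algorithm only. It is not SS-FPCM, TSS-FPCM, or ITSS-FPCM, whose objective functions are not given in this record. The function name `fuzzy_c_means` and its parameters (`m`, `n_iter`, `tol`, `seed`) are illustrative assumptions, and the sketch assumes no cluster ends up with zero total membership.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means on a list of equal-length numeric tuples.

    Returns (centers, memberships); memberships[i][k] is the degree to
    which points[i] belongs to cluster k, and each row sums to 1.
    This is the textbook baseline, not the paper's transfer variants.
    """
    rng = random.Random(seed)
    dim = len(points[0])

    # Start from a random membership matrix with rows normalized to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(len(points))]
            total = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ))

        # Membership update from squared distances to the new centers.
        new_u = []
        for p in points:
            d2 = [sum((p[d] - c[d]) ** 2 for d in range(dim)) for c in centers]
            if min(d2) == 0.0:
                # Point coincides with a center: full membership there.
                hit = d2.index(0.0)
                row = [1.0 if k == hit else 0.0 for k in range(n_clusters)]
            else:
                inv = [dk ** (-1.0 / (m - 1.0)) for dk in d2]
                total = sum(inv)
                row = [v / total for v in inv]
            new_u.append(row)

        # Stop once no membership changes by more than tol.
        change = max(
            abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb)
        )
        u = new_u
        if change < tol:
            break
    return centers, u


# Usage on two small, well-separated groups of 2-D points (made-up data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, memberships = fuzzy_c_means(points, n_clusters=2)
hard_labels = [row.index(max(row)) for row in memberships]
```

Because each point keeps a graded membership in every cluster, a semi-supervised or transfer variant has a natural place to inject label or source-domain information; how the paper does so is described only in the full text.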
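Memo
Received: 2016-03-19; revised: not listed.
Funding: National Natural Science Foundation of China (61170111, 61572407, 61134002); Sichuan Province Science and Technology Support Program (2014SZ0207).
About the authors: WANG Yue, male, born in 1990, is a master's student whose main research interests are data mining and computational intelligence. YANG Yan, female, born in 1964, is a professor and doctoral supervisor whose main research interests are computational intelligence, data mining, and ensemble learning. She has led three National Natural Science Foundation of China projects and one National Science and Technology Support Program project, and has published more than 130 academic papers. WANG Hongjun, male, born in 1977, is an associate researcher whose main research interests are machine learning, deep learning, and semi-supervised learning. He has led and completed one National Natural Science Foundation of China Young Scientists Fund project, has led two National Natural Science Foundation of China projects, and has published more than 30 academic papers.
Corresponding author: YANG Yan. E-mail: yyang@swjtu.edu.cn.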