MIN Fan, WANG Hongjie, LIU Fulun, et al. SUCE: semi-supervised binary classification based on clustering ensemble[J]. CAAI Transactions on Intelligent Systems, 2018, 13(06): 974-980. [doi:10.11992/tis.201711027]

SUCE: semi-supervised binary classification based on clustering ensemble

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785; CN: 23-1538/TP]

Volume:
13
Issue:
2018, No. 06
Pages:
974-980
Publication date:
2018-10-25

Article Information

Title:
SUCE: semi-supervised binary classification based on clustering ensemble
Author(s):
MIN Fan, WANG Hongjie, LIU Fulun, WANG Xuan
School of Computer Science, Southwest Petroleum University, Chengdu 610500, Sichuan, China
Keywords:
ensemble learning; clustering; clustering ensemble; semi-supervised; binary classification
CLC number:
TP181
DOI:
10.11992/tis.201711027
Abstract:
Semi-supervised learning and ensemble learning are important methods in machine learning. Semi-supervised learning exploits unlabeled samples, while ensemble learning combines multiple weak learners; both aim to improve classification accuracy. For nominal (symbolic) data, this paper proposes SUCE (Semi-sUpervised classification through Clustering and Ensemble learning), a semi-supervised classification method that combines clustering with ensemble learning. First, a large number of weak learners are generated by running multiple clustering algorithms under different parameter settings. Second, the weak learners are evaluated and selected using the available class label information. Third, the ensemble of selected weak learners pre-classifies the test set, and samples classified with high confidence are moved into the training set. Finally, the remaining samples are classified from the extended training set by base algorithms such as ID3, Naïve Bayes, kNN, C4.5, OneR, and Logistic. Experimental results on UCI datasets show that, when training samples are scarce, SUCE steadily improves the accuracy of most of the base algorithms.
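The four steps of the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: plain k-modes with different `k` values and random seeds stands in for the paper's multiple clustering algorithms; each clustering becomes a weak learner by giving every cluster the majority label of its labeled members; a learner is kept when its accuracy on the labeled samples (resubstitution accuracy) reaches a hypothetical threshold `min_acc`; an unlabeled sample is pseudo-labeled when the share of selected learners voting for its winning label reaches a hypothetical threshold `min_conf`; and a 1-nearest-neighbour classifier with Hamming distance stands in for base algorithms such as ID3 or Naïve Bayes. All function names are hypothetical.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two nominal samples differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(data, k, seed, iters=10):
    """Plain k-modes clustering for nominal data; returns a cluster id per sample."""
    rng = random.Random(seed)
    modes = rng.sample(data, k)  # initial modes drawn from the data
    assign = []
    for _ in range(iters):
        # Assign each sample to its nearest mode (ties go to the lowest cluster id).
        assign = [min(range(k), key=lambda c: hamming(x, modes[c])) for x in data]
        for c in range(k):
            members = [x for x, a in zip(data, assign) if a == c]
            if members:
                # New mode: the most frequent value of each attribute in the cluster.
                modes[c] = tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
    return assign


def clustering_to_learner(assign, labels):
    """Turn a clustering into a weak learner: each cluster predicts the majority
    label of its labeled members (labels[i] is 0/1, or None if unlabeled).
    Returns (per-sample predictions, accuracy on the labeled samples)."""
    votes = {}
    for a, y in zip(assign, labels):
        if y is not None:
            votes.setdefault(a, Counter())[y] += 1
    cluster_label = {a: cnt.most_common(1)[0][0] for a, cnt in votes.items()}
    preds = [cluster_label.get(a) for a in assign]  # None: cluster has no labeled member
    scored = [(p, y) for p, y in zip(preds, labels) if y is not None]
    return preds, sum(p == y for p, y in scored) / len(scored)


def suce_sketch(data, labels, ks=(2, 3, 4), seeds=range(5), min_acc=0.7, min_conf=0.8):
    """Generate, select and vote weak learners; return the labels extended with
    high-confidence pseudo-labels (still None where confidence is too low)."""
    learners = []
    for k in ks:
        for seed in seeds:
            preds, acc = clustering_to_learner(k_modes(data, k, seed), labels)
            if acc >= min_acc:  # selection: keep learners consistent with known labels
                learners.append(preds)
    extended = list(labels)
    for i, y in enumerate(labels):
        if y is not None or not learners:
            continue
        votes = Counter(p[i] for p in learners if p[i] is not None)
        if votes:
            label, count = votes.most_common(1)[0]
            # Confidence = share of all selected learners voting for the winning label.
            if count / len(learners) >= min_conf:
                extended[i] = label
    return extended


def nn_classify(data, labels):
    """Stand-in base classifier: 1-nearest neighbour (Hamming distance) trained on
    the labeled entries; fills in every entry that is still None."""
    train = [(x, y) for x, y in zip(data, labels) if y is not None]
    return [y if y is not None else min(train, key=lambda t: hamming(x, t[0]))[1]
            for x, y in zip(data, labels)]


if __name__ == "__main__":
    # Toy nominal data: two separated groups, two labeled samples in each.
    data = [("a", "a", "a")] * 4 + [("b", "b", "b")] * 4
    labels = [0, 0, None, None, 1, 1, None, None]
    extended = suce_sketch(data, labels)
    print(extended)
    print(nn_classify(data, extended))
```

Scoring each learner on the same labeled samples that assigned its cluster labels is optimistic; with more labeled data, a held-out split or cross-validation would give a fairer selection criterion. Pseudo-labeling only above `min_conf` mirrors the abstract's idea of moving only high-confidence samples into the training set, so that the base classifier is not trained on many wrong labels.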

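Memo:
Received: 2017-11-21.
Funding: National Natural Science Foundation of China (61379089).
About the authors:
MIN Fan, male, born in 1973, professor and doctoral supervisor. His main research interests are granular computing, cost-sensitive learning, and recommender systems. He has led one National Natural Science Foundation of China project and published more than 100 academic papers, over 30 of them indexed by SCI.
WANG Hongjie, male, born in 1992, master's student. His main research interests are granular computing and cost-sensitive learning. He has published 7 academic papers, 1 of them indexed by EI.
LIU Fulun, male, born in 1993, master's student. His main research interests are cost-sensitive learning and rough sets. He has published 5 academic papers, 2 of them indexed by SCI and 1 by EI.
Corresponding author: MIN Fan. E-mail: minfanphd@163.com
Last update: 2018-12-25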