[1] CHENG Yang, WANG Shitong. A multiple alternative clusterings mining algorithm using locality preserving projections[J]. CAAI Transactions on Intelligent Systems, 2016, (5):600-607. [doi:10.11992/tis.201508022]

A multiple alternative clusterings mining algorithm using locality preserving projections

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Issue:
2016, No. 5
Pages:
600-607
Column:
Publication date:
2016-11-01

Article Info

Title:
A multiple alternative clusterings mining algorithm using locality preserving projections
Author(s):
CHENG Yang (程旸), WANG Shitong (王士同)
School of Digital Media, Jiangnan University, Wuxi 214122, Jiangsu, China
Keywords:
alternative clustering; unsupervised learning; manifold learning; multiple clusterings; eigendecomposition
CLC number:
TP18
DOI:
10.11992/tis.201508022
Abstract:
Most clustering algorithms return only a single clustering of the input data. Because real data are usually complex and can be viewed from different perspectives, a reasonable clustering need not be unique. To address this, we propose a new algorithm, RLPP, for discovering multiple alternative clusterings. The objective function of RLPP accounts for both clustering quality and dissimilarity; using a subspace manifold learning technique, it repeatedly derives new subspaces in which mutually different clusterings are generated. RLPP applies to both linear and nonlinear datasets. Experiments show that RLPP successfully discovers a variety of alternative clusterings and performs comparably to or better than existing algorithms.
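As background for the abstract, the following is a minimal sketch of standard locality preserving projections (the method of reference [3]), written with NumPy. It is not the RLPP algorithm itself: the abstract does not give RLPP's objective function, so nothing here models the quality/dissimilarity trade-off. The function name `lpp`, its parameters (`n_neighbors`, heat-kernel width `t`, ridge term `reg`), and the random test data are all assumptions made for illustration.

```python
import numpy as np


def lpp(X, n_components=2, n_neighbors=5, t=1.0, reg=1e-6):
    """Standard LPP sketch (hypothetical helper, not the paper's RLPP).

    X: (n_samples, n_features) array.
    Returns an (n_features, n_components) projection matrix A;
    the low-dimensional embedding is X @ A.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    # Pairwise squared Euclidean distances (clipped at 0 for round-off).
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(dist2, np.inf)  # a point is never its own neighbour

    # Symmetric k-nearest-neighbour graph with heat-kernel weights.
    W = np.zeros((n, n))
    nbrs = np.argsort(dist2, axis=1)[:, :n_neighbors]
    for i in range(n):
        W[i, nbrs[i]] = np.exp(-dist2[i, nbrs[i]] / t)
    W = np.maximum(W, W.T)

    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W                   # graph Laplacian

    # Generalized eigenproblem: X^T L X a = lambda X^T D X a.
    A_mat = X.T @ L @ X
    B_mat = X.T @ D @ X + reg * np.eye(d)  # small ridge keeps B positive definite

    # Reduce to a standard symmetric problem using B = C C^T (Cholesky).
    C = np.linalg.cholesky(B_mat)
    C_inv = np.linalg.inv(C)
    M = C_inv @ A_mat @ C_inv.T
    M = (M + M.T) / 2.0                    # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order

    # Eigenvectors of the smallest eigenvalues preserve locality best.
    return C_inv.T @ eigvecs[:, :n_components]


# Usage on synthetic data (assumed, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
A = lpp(X, n_components=2)
Y = X @ A
print(Y.shape)  # (60, 2)
```

A clustering algorithm such as k-means could then be run on `Y`. How RLPP derives further subspaces so that later clusterings differ from earlier ones is described in the full paper, not in this sketch.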

References:

[1] DANG Xuanhong, BAILEY J. Generating multiple alternative clusterings via globally optimal subspaces[J]. Data mining and knowledge discovery, 2014, 28(3):569-592.
[2] GRETTON A, BOUSQUET O, SMOLA A, et al. Measuring statistical dependence with Hilbert-Schmidt norms[M]//JAIN S, SIMON H U, TOMITA E. Algorithmic Learning Theory. Berlin Heidelberg:Springer, 2005:63-77.
[3] HE Xiaofei, NIYOGI P. Locality preserving projections[C]//Advances in Neural Information Processing Systems. Vancouver, Canada, 2003, 16:153-160.
[4] BAE E, BAILEY J. COALA:a novel approach for the extraction of an alternate clustering of high quality and high dissimilarity[C]//Proceedings of the 6th International Conference on Data Mining. Hong Kong, China, 2006:53-62.
[5] GONDEK D, HOFMANN T. Non-redundant data clustering[J]. Knowledge and information systems, 2007, 12(1):1-24.
[6] JAIN P, MEKA R, DHILLON I S. Simultaneous unsupervised learning of disparate clusterings[J]. Statistical analysis and data mining:the ASA data science journal, 2008, 1(3):195-210.
[7] DANG Xuanhong, BAILEY J. Generation of alternative clusterings using the CAMI approach[C]//Proceedings of the SIAM International Conference on Data Mining, SDM 2010. Columbus, Ohio, USA, 2010, 10:118-129.
[8] DANG Xuanhong, BAILEY J. A hierarchical information theoretic technique for the discovery of non linear alternative clusterings[C]//Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, DC, USA, 2010:573-582.
[9] VINH N X, EPPS J. MinCEntropy:a novel information theoretic approach for the generation of alternative clusterings[C]//Proceedings of the IEEE International Conference on Data Mining. Sydney, Australia, 2010:521-530.
[10] COVER T M, THOMAS J A. Elements of information theory[M]. Chichester:John Wiley & Sons, 2012.
[11] KAPUR J N. Measures of information and their applications[M]. New York:Wiley-Interscience, 1994.
[12] PRINCIPE J C, XU D, FISHER J. Information theoretic learning[M]//HAYKIN S. Unsupervised Adaptive Filtering. New York:Wiley, 2000, 1:265-319.
[13] PARZEN E. On estimation of a probability density function and mode[J]. The annals of mathematical statistics, 1962, 33(3):1065-1076.
[14] CUI Ying, FERN X Z, DY J G. Non-redundant multi-view clustering via orthogonalization[C]//Proceedings of the 7th IEEE International Conference on Data Mining. Omaha, Nebraska, USA, 2007:133-142.
[15] DAVIDSON I, QI Zijie. Finding alternative clusterings using constraints[C]//Proceedings of the 8th IEEE International Conference on Data Mining. Pisa, Italy, 2008:773-778.
[16] QI Zijie, DAVIDSON I. A principled and flexible framework for finding alternative clusterings[C]//Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Paris, France, 2009:717-726.
[17] DASGUPTA S, NG V. Mining clustering dimensions[C]//Proceedings of the 27th International Conference on Machine Learning. Haifa, Israel, 2010:263-270.
[18] NIU Donglin, DY J G, JORDAN M I. Multiple non-redundant spectral clustering views[C]//Proceedings of the 27th International Conference on Machine Learning. Haifa, Israel, 2010:831-838.

Memo:
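Received: 2015-08-26.
Funding: National Natural Science Foundation of China (61272210).
About the authors: CHENG Yang, male, born in 1991, is a master's student whose main research interests are artificial intelligence and pattern recognition, and data mining. WANG Shitong, male, born in 1964, is a professor and doctoral supervisor, and an executive council member of both the Chinese Society of Discrete Mathematics and the Chinese Society of Machine Learning. His main research interests are artificial intelligence, pattern recognition, and image processing. He has published nearly one hundred academic papers, more than 50 of which are indexed by SCI or EI.
Corresponding author: CHENG Yang. E-mail: szhchengyang@163.com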