[1]李建元,周脚根,关佶红,等.谱图聚类算法研究进展[J].智能系统学报,2011,6(05):405-414.
 LI Jianyuan,ZHOU Jiaogen,GUAN Jihong,et al.A survey of clustering algorithms based on spectra of graphs[J].CAAI Transactions on Intelligent Systems,2011,6(05):405-414.
点击复制

谱图聚类算法研究进展(/HTML)
分享到:

《智能系统学报》[ISSN:1673-4785/CN:23-1538/TP]

卷:
第6卷
期数:
2011年05期
页码:
405-414
栏目:
出版日期:
2011-10-30

文章信息/Info

Title:
A survey of clustering algorithms based on spectra of graphs
文章编号:
1673-4785(2011)05-0405-10
作者:
李建元1周脚根2关佶红1周水庚3
1.同济大学 计算机科学与技术系,上海 201804;
2.上海市农业科学院 数字农业与工程技术研究中心,上海 201106;
3.复旦大学 上海市智能信息处理重点实验室,上海200433
Author(s):
LI Jianyuan1 ZHOU Jiaogen2 GUAN Jihong1 ZHOU Shuigeng3
1.Department of Computer Science & Technology, Tongji University, Shanghai 201804, China;
2.Center of Information Technology in Agriculture, Shanghai Academy of Agricultural Sciences, Shanghai 201106, China;
3.Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China
关键词:
谱图聚类图割目标函数谱宽松方法相似图构建半监督学习
Keywords:
spectral clustering graphcut objectives method of spectral relaxation construction of similarity graphs semisupervised learning
分类号:
TP301.6
文献标志码:
A
摘要:
近10多年来,关于谱图聚类的研究成果非常丰富,为了总结和理清这些工作之间的脉络关系,揭示最新的研究趋势,回顾和比较了典型的图割目标函数,以及这些目标函数的谱宽松解决方法,总结了谱聚类算法的本质.另外,讨论了谱图聚类的几个关键问题:相似图的构建方法、复杂性与扩充性、簇数估计、半监督谱学习等.最后,展望了谱图聚类算法的主要研究趋势,如探寻其理论解释,构建更贴切的相似图,通过学习筛选特征,应用实例化等.
Abstract:
Over the past decade, a huge amount of research has covered the clustering algorithms that are based on the spectra of graphs. It is essential to analyze the relationships among those works so as to reveal the research tendencies. In this paper, the typical works on topics ranging from cost functions to spectral relaxation solutions were investigated and compared in an effort to clearly reveal the essence of these algorithms. Furthermore, the focus was concentrated on several crucial technical issues, including the construction of similarity graphs, the estimation of the clusters’ number, the complexity and scalability, and semisupervised spectral learning. Finally, some open issues were highlighted for future studies, e.〖KG-*1/3〗g.〖KG-*1/3〗, finding more theoretical interpretations of spectral clustering, constructing better similarity graphs, selecting features via learning, and the instantiations of concrete fields.

参考文献/References:

[1]LLOYD S P. Least squares quantization in PCM[J]. IEEE Transactions on Information Theory, 1982, 28(2): 129137. 
[2]DEMPSTER A P, LAIRD N M, RUBIN D B. Maximum likelihood from incomplete data via the EM algorithm[J]. Journal of the Royal Statistical Society—Series B: Statistical Methodology, 1977, 39(1): 138.
[3]NG A Y, JORDAN M I, WEISS Y. On spectral clustering: analysis and an algorithm[C]//PIETTERICH T G, BECKER S, GHAHRAMANI Z. Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2001: 849856.
[4]RAHIMI A, RECHT B. Clustering with normalized cuts is clustering with a hyperplane[C]//Statistical Learning in Computer Vision Workshop in ECCV 2004. Prague, Czech Republic, 2004: 112.
[5]SHI J B, MALIK J. Normalized cuts and image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888905.
[6]MALIK J, BELONGIE S, LEUNG T, et al. Contour and texture analysis for image segmentation[J]. International Journal of Computer Vision, 2001, 43(1): 727.
[7]ZHANG X R, JIAO L C, LIU F. Spectral clustering ensemble applied to SAR image segmentation[J]. IEEE Transactions on Geoscience and Remote Sensing, 2008, 46(7): 21262136.
[8]陶文兵,金海.一种新的基于图谱理论的图像阈值分割方法[J].计算机学报, 2007, 30(1): 110119.
 TAO Wenbing, JIN Hai. A new image thresholding method based on graph spectral theory[J]. Journal of Computers, 2007, 30(1): 110119.
[9]ALPERT C J, KAHNG A B. Multiway partitioning via geometric embeddings, orderings and dynamic programming[J]. IEEE Transactions on ComputerAaided Design of Integrated Circuits and Systems, 1995, 14(11): 13421358.
[10]DRIESSCHE R V, ROOSE D. An improved spectral bisection algorithm and its application to dynamic load balancing[J]. Parallel Computing, 1995, 21(1): 2948.
[11]HENDRICKSON B, LELAND R. An improved spectral graph partitioning algorithm for mapping parallel computations[J]. SIAM Journal on Scientific Computing, 1995, 16(2): 452459.
[12]CRISTIANINI N, SHAWETAYLOR J, KANDOLA J. Spectral kernel methods for clustering[C]//Proceedings of the Neural Information Processing Systems. Vancouver, Canada, 2001: 649655.
[13]KLUGER Y, BASRI R, CHANG J T, et al. Spectral biclustering of microarray data: coclustering genes and conditions[J]. Genome Research, 2003, 13(4): 703716.
[14]KULIS B, BASU S, DHILLON I S, et al. Semisupervised graph clustering: a kernel approach[C]//International Conference on Machine Learning. New York, USA: ACM Press, 2005: 457464.
[15]PACCANARO A, CHENNUBHOTLA C, CASBON J A. Spectral clustering of protein sequences[J]. Nucleic Acids Research, 2006, 34(5): 15711580.
[16]DHILLON I S. Coclustering documents and words using bipartite spectral graph partitioning[C]//Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, USA: ACM Press, 2001: 269274.
[17]谢永康,周雅倩,黄萱菁.一种基于谱聚类的共指消解方法[J].中文信息学报, 2009, 23(3): 1016.
XIE Yongkang, ZHOU Yaqian, HUANG Xuanjing. A spectral clustering based coreference resolution method[J]. Journal of Chinese Information Processing, 2009, 23(3): 1016.
[18]ALPERT C J, YAO S Z. Spectral partitioning: the more eigenvectors, the better[C]//Proceedings of the 32nd Annual ACM/IEEE Design Automation Conference. New York, USA: ACM, 1995: 195200.
[19]ZELNIKMANOR L, PERONA P. Selftuning spectral clustering[C]//Neural Information Processing Systems. Vancouver, Canada, 2004, 2: 16011608.
[20]NING Huazhong, XU Wei, CHI Yun, et al. Incremental spectral clustering with application to monitoring of evolving blog communities[C]//Proceedings of the SIAM International Conference on Data Mining. Minneapolis, USA, 2007: 261272.
[21]ALZATE C, SUYKENS J A K. Multiway spectral clustering with outofsample extensions through weighted kernel PCA[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(2): 335347.
[22]YU S X, SHI J B. Grouping with bias[C]//The Fifteenth Annual Conference on Neural Information Processing Systems. Vancouver, Canada, 2001: 13271334.
[23]KANNAN R, VEMPALA S, VETTA A. On clusterings: good, bad, and spectral[J]. Journal of the ACM, 2004, 51(3): 497515.
[24]SONG Yangqiu, CHEN Wenyen, BAI Hongjie, et al. Parallel spectral clustering[C]//European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Antwerp, Belgium, 2008: 374389.
[25]VERMA D, MEILA M. A comparison of spectral clustering algorithms, Technical Report UWCSE030501[R]. Seattle, USA: Department of CSE, University of Washington, 2003.
[26]LUXBURG U, BELKIN M, BOUSQUET O. Consistency of spectral clustering[J]. Annals of Statistics, 2008, 36(2): 555586.
[27]FILIPPONE M, CAMASTRA F, MASULLI F, et al. A survey of kernel and spectral methods for clustering[J]. Pattern Recognition, 2008, 41(1): 176190.
[28]蔡晓妍,戴冠中,杨黎斌.谱聚类算法综述[J].计算机科学, 2008, 35(7): 1417.
CAI Xiaoyan, DAI Guanzhong, YANG Libin. Survey on spectral clustering algorithms[J]. Computer Science, 2008, 35(7): 1417.
[29]DIESTEL R. Graph theory[M]. 4th ed. Heidelberg, Germany: SpringerVerlag, 2010.
[30]FIEDLER M. Algebraic connectivity of graphs[J]. Czechoslovak Mathematical Journal, 1973, 23(98): 298305.
[31]DONATH W E, HOFFMAN A J. Lower bounds for the partitioning of graphs[J]. IBM Journal of Research and Development, 1973, 17(5): 420425.
[32]BAMES E R. An algorithm for partitioning the nodes of a graph[J]. SIAM Journal on Algebraic and Discrete Methods, 1982, 17(5): 541550.
[33]DONATH W E. Logic partitioning[M]//PREAS B T, LORENZETTI M J. Physical Design Automation of VLSI Systems. [S.l.]: Benjamin/Cummings Pub Co, 1988: 6586.
[34]BIGGS N L. Algebraic graph theory[M]. Cambridge, USA: Cambridge University Press, 1974.
[35]BROUWER A E, HAEMERS W H. Spectra of graphs[EB/OL]. [20101005]. http://homepages.cwi.nl/~aeb/math/ipm.pdf.
[36]CHUNG F. Spectral graph theory[EB/OL]. [2010-10-05]. http://www.ams.org/mathscinetgetitem?mr=1421568.
[37]MOHAR B. The Laplacian spectrum of graphs[M]//ALAVI Y, CHARTRAND G, OELLERMANN O R, et al. Graph Theory, Combinatorics, and Applications. [S.l.]: Wiley, 1991, 2: 871898.
[38]MOHAR B. Some applications of Laplace eigenvalues of graphs[J]. Graph Symmetry: Algebraic Methods and Applications, 1997, 497(22): 227275.
[39]LUXBURG U. A tutorial on spectral clustering[J]. Statistics and Computing, 2007, 17(4): 395416.
[40]WEI Y C, CHENG C K. Toward efficient hierarchical designs by ratio cut partitioning[C]//IEEE International Conference on CAD. New York, USA, 1989: 298301.
[41]BARNARD S, POTHEN A, SIMON H.A spectral algorithm for envelope reduction of sparse matrices[J]. Numerical Linear Algebra with Applications, 1995, 2(4): 317334.
[42]GUATTERY S, MILLER G L. On the quality of spectral separators[J]. SIAM Journal on Matrix Analysis and Applications, 1998, 19(3): 701719.
[43]WEISS Y. Segmentation using eigenvectors: a unifying view[C]//Proceedings of the Seventh IEEE International Conference on Computer Vision. Washington, DC, USA: IEEE Computer Society, 1999: 975982.
[44]HIGHAM D, KIBBLE M. A unified view of spectral clustering[EB/OL]. [2010-10-05]. http://meyer.math.ncsu.edu/Meyer/Courses/Selee591RPresentation.pdf, 2007.
[45]MEILA M, SHI J B. Learning segmentation by random walks[C]//LEEN T K, DIETTERICH T G, TRESP V. Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2001: 873879.
[46]NEWMAN M E J. Finding community structure in networks using the eigenvectors of matrices[J]. Physical Review E, 2006, 74(3): 036104.
[47]NEWMAN M E J. Modularity and community structure in networks[J]. Proceedings of the National Academy of Sciences of the United States, 2006, 103(23): 85778582.
[48]NEWMAN M E J. Analysis of weighted networks[J]. Physical Review E, 2004, 70(5): 056131.
[49]LEICHT E A, NEWMAN M E J. Community structure in directed networks[J]. Physical Review Letters, 2008, 100(11): 118703.
[50]ZAHN C T. Graphtheoretic methods for detecting and describing gestalt clusters[J]. IEEE Transactions on Computers, 1971, 20(1): 6886.
[51]URQUHART R. Graph theoretical clustering based on limited neighborhood sets[J]. Pattern Recognition, 1982, 15(3): 173187.
[52]WAGNER D, WAGNER F. Between mincut and graph bisection[J]. Lecture Notes in Computer Science, 1993, 711: 744750.
[53]WU Z, LEAHY R. An optimal graph theoretic approach to data clustering: theory and its application to image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1993, 15(11): 11011113.
[54]CHAN P K, SCHLAG M D F, ZIEN J Y. Spectral kway ratiocut partitioning and clustering[J]. IEEE Transactions on ComputerAided Design of Integrated Circuits and Systems, 1994, 13(9): 10881096.
[55]WEI Y C, CHENG C K. A twolevel twoway partitioning algorithm[C]//IEEE International Conference on CAD. Santa Clara, USA, 1990: 516519.
[56]HAGEN L, ANDREW B K. New spectral methods for ratio cut partitioning and clustering[J]. IEEE Transactions on ComputerAided Design of Intergrated Circuits and Systems, 1992, 11(9): 10741085.
[57]YU S, SHI J B. Multiclass spectral clustering[C]//Proceedings of the Ninth IEEE International Conference on Computer Vision. Nice, France, 2003, 2: 313319.
[58]DING C, HE X, ZHA H, et al. A minmax cut algorithm for graph partitioning and data clustering[C]//Proceedings of the 2001 IEEE International Conference on Data Mining. Washington, DC, USA: IEEE Computer Society, 2001: 107114.
[59]SARKAR S, SOUNDARARAJAN P. Supervised learning of large perceptual organization: graph spectral partitioning and learning automata[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(5): 504525.
[60]ZHA Hongyuan, HE Xiaofeng, DING C H Q, et al. Spectral relaxation for kmeans clustering[C]//DIETTERICH T G, BECKER S, GHAHRAMANI Z. Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2002, 14: 10571064.
[61]WU Z, LEAHY R. Tissue classification in MR images using hierarchical segmentation[C]//1990 IEEE Nuclear Science Symposium Conference Record, Including Sessions on Nuclear Power Systems and Medical Imaging Conference. Piscataway, USA: IEEE Service Center, 1990: 14101414.
[62]FORD L R, FULKERSON D R. Flows in networks[M]. Princeton, USA: Princeton University Press, 1962.
[63]DHILLON I S, KULIS Y B. A unified view of kernel kmeans, spectral clustering and graph partitioning, UTCS Technical Report #TR0425[R]. Austin, USA: Department of Computer Science, The University of Texas at Austin, 2005.
[64]DHILLON I S, GUAN Y, KULIS B. Kernel kmeans: spectral clustering and normalized cuts[C]//ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, USA: ACM Press, 2004: 551556.
[65]JEBARA T, WANG J, CHANG S. Graph construction and bmatching for semisupervised learning[C]//Proceedings of the 26th Annual International Conference on Machine Learning. New York, USA: ACM, 2009: 441448.
[66]DAITCH S I, KELNER J A, SPIELMAN D A. Fitting a graph to vector data[C]// Proceedings of the 26th Annual International Conference on Machine Learning. New York, USA: ACM, 2009: 201208.
[67]XUE F, KUMAR P R. The number of neighbors needed for connectivity of wireless networks[J]. Wireless Networks, 2004, 10(2): 169181.
[68]BALISTER P, BOLLOBAS B, SARKAR A, et al. Connectivity of random knearestneighbour graphs[J]. Advances in Applied Probability, 2005, 37(1): 124. 
[69]BRITO M R, CHFIVEZ E L, QUIROZ A J, et al. Connectivity of the mutual knearestneighbor graph in clustering and outlier detection[J]. Statistics & Probability Letters, 1997, 35(1): 3342. 
[70]POLITO M, PERONA P. Grouping and dimensionality reduction by locally linear embedding[C]// DIETTERICH T G, BECKER S, GHAHRAMANI Z. Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2001: 12551262.
[71]田铮,李小斌,句彦伟.谱聚类的扰动分析[J].中国科学E辑:信息科学, 2007, 37(4): 527543.
 TIAN Zheng, LI Xiaobin, JU Yanwei. Perturbation analysis of spectral clustering[J]. Science in China Series E: Technological Sciences, 2007, 37(4): 527543.
[72]HERNANDEZ V, ROMAN J, TOMAS A, et al. A survey of software for sparse eigenvalue problems[EB/OL]. [20101005]. http://www.grycap.upv.es/slepc/documentation/reports/str6.pdf.
[73]FOWLKES C, BELONGIE S, CHUNG F, et al. Spectral grouping using the Nystrom method[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(2): 214225. 
[74]DHILLON I S, GUAN Y, KULIS B. Weighted graph cuts without eigenvectors: a multilevel approach[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(11): 19441957.
[75]YAN D H, HUANG L, JORDAN M I. Fast approximate spectral clustering[C]//ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, USA: ACM Press, 2009: 907915.
[76]MASCHHOFF K J, SORENSEN D C. A portable implementation of ARPACK for distributed memory parallel architectures[EB/OL]. [20101005]. http://www.caam.rice.edu/_kristyn/parpack home.html.
[77]BACH F R, JORDAN M I. Learning spectral clustering, Technical Report No.UCB/CSD031249[R]. Berkeley, USA: Computer Science Division, University of California, 2003.
[78]王玲,薄列峰,焦李成.密度敏感的半监督谱聚类[J].软件学报, 2007, 18 (10): 24122422.
WANG Ling, BO Liefeng, JIAO Licheng. Densitysensitive semisupervised spectral clustering[J]. Journal of Software, 2007, 18(10): 24122422. 
[79]KAMVAR S D, KLEIN D, MANNING C D. Spectral learning[C]//Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence. Acapulco, Mexico, 2003: 561566.
[80]FISCHER I, POLAND I. New methods for spectral clustering, Technical Report No.IDSIA1204[R]. Manno, Switzerland: Dalle Molle Institute for Artificial Intelligence, 2004.
[81]ROWEIS S, SAUL L. Nonlinear dimensionality reduction by locally linear embedding[J]. Science, 2000, 290(5550): 23232326.
[82] CHENG B, YANG J, YAN S, et al. Learning with L1graph for image analysis[J]. IEEE Transactions on Image Processing, 2010, 19(4): 858866.

备注/Memo

备注/Memo:
收稿日期:2010-10-12.
基金项目:国家自然科学基金资助项目(60873040).
通信作者:李建元.E-mail:lijy79@gmail.com.
作者简介:
李建元,男,1979年生,讲师,博士研究生,CCF及ACM学生会员,主要研究方向为数据挖掘、机器学习、遥感与GIS等.
周脚根,男,1978年生,副研究员,博士,主要研究方向为空间数据挖掘和空间统计等.
关佶红,女,1969年生,教授,博士生导师,博士,主要研究方向为分布计算、数据库、数据挖掘、生物信息学等. 
周水庚,男,1966年生,教授,博士生导师,博士,主要研究方向为网络数据管理与搜索、海量数据挖掘与学习、生物信息学等.
更新日期/Last Update: 2011-11-16