[1]张美琴,白亮,王俊斌.基于加权聚类集成的标签传播算法[J].智能系统学报,2018,13(06):994-998.[doi:10.11992/tis.201806011]
 ZHANG Meiqin,BAI Liang,WANG Junbin.Label propagation algorithm based on weighted clustering ensemble[J].CAAI Transactions on Intelligent Systems,2018,13(06):994-998.[doi:10.11992/tis.201806011]
点击复制

基于加权聚类集成的标签传播算法(/HTML)
分享到:

《智能系统学报》[ISSN:1673-4785/CN:23-1538/TP]

卷:
第13卷
期数:
2018年06期
页码:
994-998
栏目:
出版日期:
2018-10-25

文章信息/Info

Title:
Label propagation algorithm based on weighted clustering ensemble
作者:
张美琴1 白亮2 王俊斌1
1. 山西大学 计算机与信息技术学院, 山西 太原 030006;
2. 山西大学 计算智能与中文信息处理教育部重点实验室, 山西 太原 030006
Author(s):
ZHANG Meiqin1 BAI Liang2 WANG Junbin1
1. College of Computer Science and Technology, Shanxi University, Taiyuan 030006, China;
2. Key Laboratory of Symbol Computation and Knowledge Engineering(Shanxi University), Ministry of Education, Taiyuan 030006, China
关键词:
数据挖掘网络数据社区发现标签传播算法聚类集成基聚类模块度加权度量
Keywords:
data miningnetwork datacommunity detectionlabel propagation algorithmclustering ensemblebase clusteringmodularity measureweighted measure
分类号:
TP391
DOI:
10.11992/tis.201806011
摘要:
标签传播算法(LPA)是一种高效地处理大规模网络的社区发现算法,由于其近乎线性的时间复杂度而受到广泛关注。然而,该算法每个节点的标签依赖于其邻居节点,其迭代速度和聚类有效性对标签信息的更新顺序非常敏感,影响了社区发现结果的准确性和稳定性。基于该问题,提出了一种基于加权聚类集成的标签传播算法。该算法利用多次标签传播算法的结果作为基聚类集,并用模块度评估每个基聚类的重要性,使其作为节点相似性度量的权值形成加权相似性矩阵,最后通过层次聚类得出最终的社区划分结果。在实验分析中,该算法和其他5个具有代表性的标签传播算法的改进算法在真实数据集上进行了比较,展示了新算法能有效地提高标签传播算法的社区发现精度。
Abstract:
Label propagation algorithm (LPA) is one of the high-efficiency community detection algorithms for processing large-scale network data. It has attracted much attention because of its nearly linear time complexity with the number of nodes. However, in the algorithm, the label of each node depends on the labels of its neighbor nodes, which makes the iteration speed and clustering performance of the algorithm very sensitive to the order of label information update; this influences the accuracy and stability of the community detection result. To solve this problem, a new LPA is proposed based on weighted clustering ensemble. The new algorithm runs the LPAs many times to obtain several partition results, which can be regarded as a base clustering set. Furthermore, the modularity measure is used to evaluate the importance of each clustering. Based on the evaluation results, a weighted similarity measure is defined between nodes to obtain a weighted similarity matrix of pairwise nodes. Finally, hierarchical clustering on the similarity matrix is used to obtain a final community division result. In the experimental analysis, the new algorithm is compared with several other improved LPAs on five real representative network datasets. The experimental results show that the new algorithm is more effective for improving the community detection accuracy.

参考文献/References:

[1] SARKAR P, MOORE A W. Dynamic social network analysis using latent space models[J]. ACM sigkdd explorations newsletter, 2005, 7(2):31-40.
[2] ZARE H, SHOOSHTARI P, GUPTA A, et al. Data reduction for spectral clustering to analyze high throughput flow cytometry data[J]. BMC bioinformatics, 2010, 11:403.
[3] SHI Jianbo, MALIK J. Normalized cuts and image segmentation[J]. IEEE transactions on pattern analysis and machine intelligence, 2000, 22(8):888-905.
[4] DJIDJEV H N, ONUS M. Scalable and accurate graph clustering and community structure detection[J]. IEEE transactions on parallel and distributed systems, 2013, 24(5):1022-1029.
[5] DUCH J, ARENAS A. Community detection in complex networks using extremal optimization[J]. Physical review E, 2005, 72(2):027104.
[6] NEWMAN M E J. Fast algorithm for detecting community structure in networks[J]. Physical review E, 2004, 69(6):066133.
[7] GIRVAN M, NEWMAN M E J. Community structure in social and biological networks[J]. Proceedings of the national academy of sciences of the United States of America, 2002, 99(12):7821-7826.
[8] NEWMAN M E J. Modularity and community structure in networks[J]. Proceedings of the national academy of sciences of the United States of America, 2006, 103(23):8577-8582.
[9] TANG Lei, WANG Xufei, LIU Huan. Community detection via heterogeneous interaction analysis[J]. Data mining and knowledge discovery, 2012, 25(1):1-33.
[10] RAGHAVAN U N, ALBERT R, KUMARA S. Near linear time algorithm to detect community structures in large-scale networks[J]. Physical review E, 2007, 76(3):036106.
[11] BARBER M J, CLARK J W. Detecting network communities by propagating labels under constraints[J]. Physical review E, 2009, 80(2):026129.
[12] LIU X, MURATA T. Advanced modularity-specialized label propagation algorithm for detecting communities in networks[J]. Physica A:statistical mechanics and its applications, 2010, 389(7):1493-1500.
[13] XIE Jierui, SZYMANSKI B K. Community detection using a neighborhood strength driven label propagation algorithm[C]//Proceedings of 2011 IEEE Network Science Workshop. West Point, NY, USA:IEEE, 2011:188-195.
[14] HE Miao, LENG Mingwei, LI Fan, et al. A node importance based label propagation approach for community detection[M]//SUN Fuchun, LI Tianrui, LI Hongbo. Knowledge Engineering and Management:Proceedings of the Seventh International Conference on Intelligent Systems and Knowledge Engineering, Beijing, China, Dec 2012(ISKE 2012). Berlin Heidelberg:Springer, 2014:249-257.
[15] SUN Heli, LIU Jiao, HUANG Jianbin, et al. CenLP:a centrality-based label propagation algorithm for community detection in networks[J]. Physica A:statistical mechanics and its applications, 2015, 436:767-780.
[16] LI Wei, HUANG Ce, WANG Miao, et al. Stepping community detection algorithm based on label propagation and similarity[J]. Physica A:statistical mechanics and its applications, 2017, 472:145-155.
[17] ?UBELJ L, BAJEC M. Unfolding communities in large complex networks:combining defensive and offensive label propagation for core extraction[J]. Physical review E, 2011, 83(3):036103
[18] BLONDEL V D, GUILLAUME J L, LAMBIOTTE R, et al. Fast unfolding of communities in large networks[J]. Journal of statistical mechanics:theory and experiment, 2008(10):10008.
[19] ELHADARY R S, TOLBA A S, ELSHARKAWY M A, et al. An efficient and robust combined clustering technique for mining in large spatial databases[C]//Proceedings of 2007 International Conference on Computer Engineering and Systems. Cairo, Egypt, 2007:439-445.
[20] FRED A L N, JAIN A K. Data clustering using evidence accumulation[C]//Proceedings of the 16th International Conference on Pattern Recognition. Quebec City, Quebec, Canada, 2002:276-280.
[21] YANG Yan, KAMEL M S. An aggregated clustering approach using multi-ant colonies algorithms[J]. Pattern recognition, 2006, 39(7):1278-1289.

相似文献/References:

[1]张继福,张素兰,胡立华.约束概念格及其构造方法[J].智能系统学报,2006,1(02):31.
 ZHANG Ji-fu,ZHANG Su-lan,HU Li-hua.Constrained concept lattice and its construction method[J].CAAI Transactions on Intelligent Systems,2006,1(06):31.
[2]王国胤,张清华,胡 军.粒计算研究综述[J].智能系统学报,2007,2(06):8.
 WANG Guo-yin,ZHANG Qing-hua,HU Jun.An overview of granular computing[J].CAAI Transactions on Intelligent Systems,2007,2(06):8.
[3]何清.物联网与数据挖掘云服务[J].智能系统学报,2012,7(03):189.
 HE Qing.The Internet of things and the data mining cloud service[J].CAAI Transactions on Intelligent Systems,2012,7(06):189.
[4]李海林,郭韧,万校基.基于特征矩阵的多元时间序列最小距离度量方法[J].智能系统学报,2015,10(03):442.[doi:10.3969/j.issn.1673-4785.201405047]
 LI Hailin,GUO Ren,WAN Xiaoji.A minimum distance measurement method for amultivariate time series based on the feature matrix[J].CAAI Transactions on Intelligent Systems,2015,10(06):442.[doi:10.3969/j.issn.1673-4785.201405047]
[5]申彦,朱玉全.CMP上基于数据集划分的K-means多核优化算法[J].智能系统学报,2015,10(04):607.[doi:10.3969/j.issn.1673-4785.201411036]
 SHEN Yan,ZHU Yuquan.An optimized algorithm of K-means based on data set partition on CMP systems[J].CAAI Transactions on Intelligent Systems,2015,10(06):607.[doi:10.3969/j.issn.1673-4785.201411036]
[6]汤建国,汪江桦,韩莉英,等.基于覆盖粗糙集的语言动力系统[J].智能系统学报,2014,9(02):229.[doi:10.3969/j.issn.1673-4785.201307018]
 TANG Jianguo,WANG Jianghua,HAN Liying,et al.Linguistic dynamic systems based on covering-based rough sets[J].CAAI Transactions on Intelligent Systems,2014,9(06):229.[doi:10.3969/j.issn.1673-4785.201307018]
[7]石磊,杜军平,周亦鹏,等.在线社交网络挖掘与搜索技术研究[J].智能系统学报,2016,11(6):777.[doi:10.11992/tis.201612007]
 SHI Lei,DU Junping,ZHOU Yipeng,et al.A survey on online social network mining and search[J].CAAI Transactions on Intelligent Systems,2016,11(06):777.[doi:10.11992/tis.201612007]
[8]淦文燕,刘冲.一种改进的搜索密度峰值的聚类算法[J].智能系统学报,2017,12(02):229.[doi:10.11992/tis.201512036]
 GAN Wenyan,LIU Chong.An improved clustering algorithm that searches and finds density peaks[J].CAAI Transactions on Intelligent Systems,2017,12(06):229.[doi:10.11992/tis.201512036]
[9]翟俊海,刘博,张素芳.基于粗糙集相对分类信息熵和粒子群优化的特征选择方法[J].智能系统学报,2017,12(03):397.[doi:10.11992/tis.201705004]
 ZHAI Junhai,LIU Bo,ZHANG Sufang.A feature selection approach based on rough set relative classification information entropy and particle swarm optimization[J].CAAI Transactions on Intelligent Systems,2017,12(06):397.[doi:10.11992/tis.201705004]
[10]何明,常盟盟,刘郭洋,等.基于SQL-on-Hadoop查询引擎的日志挖掘及其应用[J].智能系统学报,2017,12(05):717.[doi:10.11992/tis.201706016]
 HE Ming,CHANG Mengmeng,LIU Guoyang,et al.Log mining and application based on sql-on-hadoop query engine[J].CAAI Transactions on Intelligent Systems,2017,12(06):717.[doi:10.11992/tis.201706016]

备注/Memo

备注/Memo:
收稿日期:2018-06-04。
基金项目:国家自然科学基金项目(61773247).
作者简介:张美琴,女,1992年生,硕士研究生,主要研究方向为社区检测;白亮,男,1982年生,副教授,博士,主要研究方向为数据挖掘、机器学习;王俊斌,男,1994年生,硕士研究生,主要研究方向为数据挖掘。
通讯作者:张美琴.E-mail:landian.zhang@qq.com
更新日期/Last Update: 2018-12-25