ZHANG Meiqin, BAI Liang, WANG Junbin. Label propagation algorithm based on weighted clustering ensemble[J]. CAAI Transactions on Intelligent Systems, 2018, 13(6): 994-998. [doi:10.11992/tis.201806011]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 13
Issue: 2018, No. 6
Pages: 994-998
Section: Academic Papers: Machine Learning
Published: 2018-10-25
- Title: Label propagation algorithm based on weighted clustering ensemble
- Author(s): ZHANG Meiqin(1), BAI Liang(2), WANG Junbin(1)
1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China;
2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
- Keywords: data mining; network data; community detection; label propagation algorithm; clustering ensemble; base clustering; modularity measure; weighted measure
- CLC number: TP391
- DOI: 10.11992/tis.201806011
- Abstract:
The label propagation algorithm (LPA) is an efficient community detection algorithm for large-scale networks and has attracted wide attention because of its nearly linear time complexity. However, because the label of each node depends on the labels of its neighbors, both the convergence speed and the clustering quality of LPA are highly sensitive to the order in which labels are updated, which degrades the accuracy and stability of the detected communities. To address this problem, a label propagation algorithm based on weighted clustering ensemble is proposed. The algorithm runs LPA multiple times and treats the resulting partitions as a set of base clusterings. The modularity of each base clustering is used to evaluate its importance, and these importance values serve as weights in a node-similarity measure, yielding a weighted similarity matrix over all node pairs. Finally, hierarchical clustering on this matrix produces the final community partition. In the experiments, the new algorithm is compared with five other representative improved LPAs on real-world network datasets, and the results show that it effectively improves the community detection accuracy of LPA.
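The pipeline described in the abstract (repeated LPA runs, modularity-weighted node similarity, hierarchical clustering) can be sketched as below. This is a minimal, standard-library-only sketch under stated assumptions, not the authors' implementation. Each base clustering comes from asynchronous LPA with a random update order and random tie-breaking. Negative modularity values are clipped to zero before they are used as weights. The similarity of two nodes is assumed to be the weighted fraction of base clusterings that place them in the same community. The final step is average-linkage agglomerative clustering, stopped at a hypothetical similarity threshold. All function names and the parameters `runs`, `threshold` and `max_iter` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def label_propagation(adj, seed, max_iter=100):
    """One asynchronous LPA run (assumed variant): nodes start with unique labels,
    are visited in a random order, and adopt a most frequent neighbor label."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}
    nodes = sorted(adj)
    for _ in range(max_iter):
        changed = False
        rng.shuffle(nodes)  # the update order is what makes plain LPA unstable
        for v in nodes:
            if not adj[v]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[v] not in candidates:  # keep the current label on ties
                labels[v] = rng.choice(candidates)
                changed = True
        if not changed:
            break
    return labels


def modularity(adj, labels):
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2] of an undirected graph."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    if two_m == 0:
        return 0.0
    degree = defaultdict(int)
    internal = defaultdict(int)  # each intra-community edge is counted twice
    for v, nbrs in adj.items():
        degree[labels[v]] += len(nbrs)
        internal[labels[v]] += sum(1 for u in nbrs if labels[u] == labels[v])
    return sum(internal[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)


def weighted_similarity(nodes, partitions, weights):
    """Weighted co-association (assumed form): share of the total weight whose
    base clustering puts a and b together. Weights must not sum to zero."""
    total = sum(weights)
    sim = {}
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            agree = sum(w for p, w in zip(partitions, weights) if p[a] == p[b])
            sim[(a, b)] = agree / total
    return sim


def average_linkage(nodes, sim, threshold):
    """Merge the two clusters with the highest average pairwise similarity
    until that similarity drops below the threshold."""
    def pair(a, b):
        return sim[(a, b)] if (a, b) in sim else sim[(b, a)]

    clusters = [[v] for v in nodes]
    while len(clusters) > 1:
        best, bi, bj = -1.0, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = sum(pair(a, b) for a in clusters[i] for b in clusters[j])
                s /= len(clusters[i]) * len(clusters[j])
                if s > best:
                    best, bi, bj = s, i, j
        if best < threshold:
            break
        clusters[bi].extend(clusters.pop(bj))  # bj > bi, so index bi stays valid
    return sorted(sorted(c) for c in clusters)


def weighted_ensemble_lpa(adj, runs=10, threshold=0.5):
    """Base clusterings from repeated LPA, modularity weights, hierarchical merge."""
    nodes = sorted(adj)
    partitions = [label_propagation(adj, seed) for seed in range(runs)]
    weights = [max(modularity(adj, p), 0.0) for p in partitions]
    if sum(weights) == 0:
        weights = [1.0] * runs  # fall back to an unweighted ensemble
    sim = weighted_similarity(nodes, partitions, weights)
    return average_linkage(nodes, sim, threshold)


if __name__ == "__main__":
    # Two triangles joined by the single edge 2-3.
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    print(weighted_ensemble_lpa(graph))
```

The threshold decides how many communities survive the final merge. The abstract does not say how the dendrogram is cut, so that choice is left as a parameter in this sketch.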
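Memo
Received: 2018-06-04.
Foundation item: National Natural Science Foundation of China (61773247).
About the authors: ZHANG Meiqin, female, born in 1992, master's student; her main research interest is community detection. BAI Liang, male, born in 1982, associate professor, Ph.D.; his main research interests are data mining and machine learning. WANG Junbin, male, born in 1994, master's student; his main research interest is data mining.
Corresponding author: ZHANG Meiqin. E-mail: landian.zhang@qq.com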
Last update: 2018-12-25