[1]赵丹枫,吕闫妍,张文博,等.不确定图上的极大团枚举及高效验证算法[J].智能系统学报,2024,19(6):1539-1551.[doi:10.11992/tis.202304058]
ZHAO Danfeng,LYU Yanyan,ZHANG Wenbo,et al.Maximum clique enumeration and verification algorithm on α uncertain graphs[J].CAAI Transactions on Intelligent Systems,2024,19(6):1539-1551.[doi:10.11992/tis.202304058]
点击复制
《智能系统学报》[ISSN 1673-4785/CN 23-1538/TP] 卷:
19
期数:
2024年第6期
页码:
1539-1551
栏目:
学术论文—人工智能基础
出版日期:
2024-12-05
- Title:
-
Maximum clique enumeration and verification algorithm on α uncertain graphs
- 作者:
-
赵丹枫1, 吕闫妍1, 张文博1, 黄冬梅2, 高峰3
-
1. 上海海洋大学 信息学院, 上海 201306;
2. 上海电力大学 电子与信息工程学院, 上海 200090;
3. 广东美的制冷有限公司, 广东 佛山 528311
- Author(s):
-
ZHAO Danfeng1, LYU Yanyan1, ZHANG Wenbo1, HUANG Dongmei2, GAO Feng3
-
1. College of Information, Shanghai Ocean University, Shanghai 201306, China;
2. College of Electronics and Information Engineering, Shanghai University of Electric Power, Shanghai 200090, China;
3. Guangdong Midea Refrigeration Co. LTD., Foshan 528
-
- 关键词:
-
不确定图; 极大团; 数据挖掘; 枚举算法; 验证算法; 子图划分; 倒排表; 映射表
- Keywords:
-
uncertain graph; maximal clique; data mining; enumeration algorithm; verification algorithm; subgraph division; inversion list; mapping table
- 分类号:
-
TP311
- DOI:
-
10.11992/tis.202304058
- 摘要:
-
现有的不确定图中极大团枚举方法“子图划分—枚举—验证”,在处理大规模图时,整体效率不高;当挖掘出的伪极大团数量较多时,验证速率明显下降。因此,提出高效枚举及验证算法(multiple inversion list enumerate uncertain maximal cliques,MILEUMC)。在子图划分和枚举前,定义并构造概率阈值(α)不确定图,通过缩小图的规模,提高枚举效率;在“验证”时,提出基于多重倒排表的验证方法,分为去重复和去包含关系2个阶段去除伪极大团,以不同索引构建各个阶段的多重倒排表,根据极大团的属性完成验证,同时动态更新相应的倒排表和映射表,以减小工作量,提高时间效率。最后在多个真实的数据集上比较,结果验证了MILEUMC算法的高效性。该算法更适用于在较为稀疏的图上寻找联系更紧密的极大团。
- Abstract:
-
The existing maximal clique enumeration method for uncertain graphs, which uses “subgraph division–enumeration–verification,” is not efficient for large-scale graphs. As the number of pseudo-maximal cliques increases, the verification speed notably decreases. Therefore, the multi-inversion list enumerates uncertain maximal cliques (MILEUMC) algorithm is proposed to address the aforementioned problem. This algorithm improves efficiency by defining and constructing the α uncertain graph before subgraph division and enumeration, which reduces the size of the graph and enhances enumeration efficiency. In the “verification” phase, the algorithm introduces a novel method based on multiple inverted lists. The method involves two stages: the removal of inclusion relations (deduplication) and the removal of pseudo-maximum cliques. During each stage, multiple posting lists with different indexes are built, and verification is completed in accordance with the attributes of maximal cliques. Simultaneously, the corresponding inversion lists and mapping tables are dynamically updated, thereby reducing the workload and saving time. Compared to multiple real data sets, experimental results verify the efficiency of the MILEUMC algorithm. Furthermore, the algorithm is more suitable for identifying maximal cliques in sparser graphs, where connections between nodes are closer.
备注/Memo
收稿日期:2023-4-28。
基金项目:国家自然科学基金青年项目(42106190,62102243).
作者简介:赵丹枫,副教授,博士,主要研究方向为图计算与海洋大数据分析,主持国家自然科学基金青年项目,参与国家自然科学基金面上项目、省部级项目等近10项,多次获上海市科学技术奖二等奖、浦东新区科学技术奖一等奖等,发表学术论文20余篇。 E-mail:dfzhao@shou.edu.cn;吕闫妍,硕士研究生,主要研究方向为图数据挖掘。E-mail:504096670@qq.com;高峰,企业导师,博士,主要研究方向为知识图谱、图神经网络、自然语言处理。主导及参与多项行业及团体标准,发表数篇CCF-A类论文,EMNLP2022 审稿专家。E-mail:gaofeng14@midea.com。
通讯作者:高峰. E-mail:gaofeng14@midea.com
更新日期/Last Update:
2024-11-05