[1] YU Shiying, YUAN Xuemei, LU Haitao, et al. Similar code detection algorithm based on sequence clustering[J]. CAAI Transactions on Intelligent Systems, 2013, 8(1): 52-57. [doi:10.3969/j.issn.1673-4785.201209054]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
8
Issue:
No. 1, 2013
Pages:
52-57
Section:
Academic Papers: Machine Learning
Publication date:
2013-03-25
- Title:
-
Similar code detection algorithm based on sequence clustering
- Article ID:
-
1673-4785(2013)01-0052-06
- Author(s):
-
YU Shiying (于世英) 1,2, YUAN Xuemei (袁雪梅) 1, LU Haitao (卢海涛) 1, REN Jiadong (任家东) 1, LI Shuo (李硕) 1
-
1. College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China;
2. Hebei Provincial S&T Management Information Center, Shijiazhuang 050021, Hebei, China
-
- Keywords:
-
sequence clustering; weighted edit distance; similar code detection
- CLC number:
-
TP311.131
- DOI:
-
10.3969/j.issn.1673-4785.201209054
- Document code:
-
A
- Abstract:
-
To improve the efficiency of detecting similarity between source programs, this paper proposes a similar code detection algorithm based on sequence clustering. The algorithm first partitions the source code into segments according to its structure and then applies partial code transformations to each segment. The resulting symbol sequences are clustered using weighted edit distance as the similarity measure, which yields groups of similar code fragments and thereby detects similar functionality across source programs. Experiments on a set of real and synthetic programs confirm the effectiveness and scalability of the algorithm.
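
The abstract names two building blocks, a weighted edit distance and a clustering step over symbol sequences, without giving their details. The following is a minimal Python sketch of how such a pipeline might look, not the authors' implementation. The token weights, the `ID`/`NUM` normalization of identifiers and literals, the length normalization, and the greedy leader-style clustering in `leader_cluster` are all assumptions made for illustration; the paper's actual weights and clustering procedure may differ.

```python
from typing import List, Sequence

# Hypothetical token weights: edits to control-flow keywords cost more than
# edits to normalized identifiers ("ID"). These values are illustrative only.
KEYWORDS = {"for", "while", "if", "else", "return"}


def weight(token: str) -> float:
    if token in KEYWORDS:
        return 2.0
    if token == "ID":
        return 0.5
    return 1.0


def weighted_edit_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance where each insert/delete costs the token's weight and a
    substitution costs the larger of the two token weights (an assumption)."""
    # prev[j] holds the cost of turning the processed prefix of a into b[:j].
    prev = [0.0]
    for y in b:
        prev.append(prev[-1] + weight(y))
    for x in a:
        cur = [prev[0] + weight(x)]
        for j, y in enumerate(b, start=1):
            substitute = prev[j - 1] + (0.0 if x == y else max(weight(x), weight(y)))
            delete = prev[j] + weight(x)
            insert = cur[j - 1] + weight(y)
            cur.append(min(substitute, delete, insert))
        prev = cur
    return prev[-1]


def normalized_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Scale the distance by the heavier sequence so segment length matters less."""
    scale = max(sum(map(weight, a)), sum(map(weight, b)))
    return weighted_edit_distance(a, b) / scale if scale else 0.0


def leader_cluster(seqs: Sequence[Sequence[str]], threshold: float) -> List[List[int]]:
    """Greedy clustering: a sequence joins the first cluster whose first member
    is within `threshold`; otherwise it starts a new cluster."""
    clusters: List[List[int]] = []
    for i, seq in enumerate(seqs):
        for members in clusters:
            if normalized_distance(seqs[members[0]], seq) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Three already-transformed code segments (token sequences).
segments = [
    ["for", "ID", "=", "NUM", "ID", "<", "ID", "ID", "++"],
    ["for", "ID", "=", "NUM", "ID", "<=", "ID", "ID", "++"],
    ["return", "ID"],
]
print(leader_cluster(segments, threshold=0.2))
```

In this sketch the two loop headers differ by a single operator, so they fall into the same cluster, while the `return` statement starts its own. The greedy pass depends on input order and is only a stand-in for whatever sequence clustering method the paper uses.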
Last Update:
2013-04-12