[1] YU Shiying, YUAN Xuemei, LU Haitao, et al. Similar code detection algorithm based on sequence clustering[J]. CAAI Transactions on Intelligent Systems, 2013, 8(01): 52-57. [doi:10.3969/j.issn.1673-4785.201209054]

Similar code detection algorithm based on sequence clustering

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 8
Issue:
2013, No. 01
Pages:
52-57
Column:
Publication date:
2013-03-25

Article Info

Title:
Similar code detection algorithm based on sequence clustering
Article ID:
1673-4785(2013)01-0052-06
Author(s):
YU Shiying 1,2; YUAN Xuemei 1; LU Haitao 1; REN Jiadong 1; LI Shuo 1
1. College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China;
2. Hebei Provincial S&T Management Information Center, Shijiazhuang 050021, China
Keywords:
sequence clustering; weighted edit distance; similar code detection
CLC number:
TP311.131
DOI:
10.3969/j.issn.1673-4785.201209054
Document code:
A
Abstract:
To improve the efficiency of detecting similarity between the source code of programs, this paper proposes a similar code detection algorithm based on sequence clustering. The algorithm first partitions the source code into segments according to its own structure, then applies a partial code transformation to each segment, and finally clusters the resulting symbol sequences using a weighted edit distance as the similarity measure. The clusters yield similar code fragments, which makes it possible to detect similar functionality across source programs. Experiments on a number of real and synthetic programs verify the effectiveness and scalability of the algorithm.
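
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weights, the transformation rules, or the clustering procedure, so the placeholder tokens `ID` and `NUM`, the costs in `sub_cost`, the length normalization, the `threshold` value, and the simple leader-style clustering are all assumptions chosen for illustration. The segments are assumed to be already extracted and transformed into token sequences.

```python
from typing import List, Sequence


def sub_cost(a: str, b: str) -> float:
    """Substitution weight (hypothetical values, not taken from the paper)."""
    if a == b:
        return 0.0
    # Assumption: swapping an identifier placeholder for a literal
    # placeholder is a smaller change than swapping keywords or operators.
    if a in {"ID", "NUM"} and b in {"ID", "NUM"}:
        return 0.5
    return 1.0


def weighted_edit_distance(s: Sequence[str], t: Sequence[str],
                           ins_del_cost: float = 1.0) -> float:
    """Dynamic-programming edit distance with weighted substitutions."""
    prev = [j * ins_del_cost for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [i * ins_del_cost]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + ins_del_cost,       # delete a
                           cur[j - 1] + ins_del_cost,    # insert b
                           prev[j - 1] + sub_cost(a, b)))  # substitute
        prev = cur
    return prev[-1]


def normalized_distance(s: Sequence[str], t: Sequence[str]) -> float:
    """Distance divided by the longer length (an assumed normalization)."""
    longest = max(len(s), len(t))
    return weighted_edit_distance(s, t) / longest if longest else 0.0


def cluster_sequences(seqs: Sequence[Sequence[str]],
                      threshold: float = 0.3) -> List[List[int]]:
    """Leader-style clustering: a sequence joins the first cluster whose
    leader (first member) is within `threshold`, else starts a new one."""
    clusters: List[List[int]] = []
    for idx, seq in enumerate(seqs):
        for members in clusters:
            if normalized_distance(seq, seqs[members[0]]) <= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


# Three already-transformed segments (hypothetical token sequences).
segments = [
    ["for", "ID", "=", "NUM", "ID", "<", "ID", "ID", "++"],
    ["for", "ID", "=", "ID", "ID", "<", "NUM", "ID", "++"],
    ["if", "ID", "==", "NUM", "return", "ID"],
]

print(cluster_sequences(segments))  # [[0, 1], [2]]
```

The two loop headers differ only in which operands are identifiers or literals, so they end up in one cluster, while the conditional forms its own. Comparing each sequence only against cluster leaders keeps the number of distance computations low, which is one plausible way to approach the efficiency goal stated in the abstract.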


Memo:
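Received: 2012-09-25.
Published online: 2013-01-25.
Funding: National Natural Science Foundation of China (61170190).
Corresponding author: YU Shiying.
E-mail: wangqianysu@163.com.
About the authors:
YU Shiying, female, born in 1973, engineer. Her main research interest is data mining.
YUAN Xuemei, female, born in 1989, master's student. Her main research interest is data mining.
LU Haitao, female, born in 1975, lecturer. Her main research interests are data mining and virtual reality.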
Last Update: 2013-04-12