
[1]YU Shiying,YUAN Xuemei,LU Haitao,et al.Similar code detection algorithm based on sequence clustering[J].CAAI Transactions on Intelligent Systems,2013,8(1):52-57.[doi:10.3969/j.issn.1673-4785.201209054]


Similar code detection algorithm based on sequence clustering


CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 8 Issue: 1 (2013) Pages: 52-57 Column: Academic Papers: Machine Learning Publication date: 2013-03-25

Title: Similar code detection algorithm based on sequence clustering

Author(s): YU Shiying¹˒²; YUAN Xuemei¹; LU Haitao¹; REN Jiadong¹; LI Shuo¹; 1. College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China; 2. Hebei Provincial S&T Management Information Center, Shijiazhuang 050021, China

Keywords: sequence clustering; weighted edit distance; similar code detection

CLC: TP311.131

DOI: 10.3969/j.issn.1673-4785.201209054

Abstract: To improve the efficiency of detecting similarity between the source code of programs, this paper proposes a similar code detection algorithm based on sequence clustering. First, the algorithm partitions the source code into multiple segments according to its structure. Second, part of the code in each segment is transformed into sequences, and the sequences are clustered using the weighted edit distance as the similarity measure. The resulting clusters yield similar code fragments, which achieves the goal of detecting similar functions among multiple code units of the source program. Experimental results on a series of real and synthetic programs demonstrate the validity and scalability of the algorithm.
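The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the operation weights, the code transformation rules, or the clustering procedure, so the cost parameters (`ins_cost`, `del_cost`, `sub_cost`), the length normalization, the simple leader-based clustering, and the token sequences below are all assumptions chosen for illustration.

```python
from typing import Callable, Hashable, List, Optional, Sequence

Token = Hashable


def weighted_edit_distance(
    a: Sequence[Token],
    b: Sequence[Token],
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
    sub_cost: Optional[Callable[[Token, Token], float]] = None,
) -> float:
    """Edit distance with configurable (assumed) operation weights."""
    if sub_cost is None:
        # Default assumption: a mismatch costs 1, a match costs 0.
        sub_cost = lambda x, y: 0.0 if x == y else 1.0
    # prev[j] holds the cost of turning the processed prefix of a into b[:j].
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * del_cost]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,          # delete x from a
                curr[j - 1] + ins_cost,      # insert y into a
                prev[j - 1] + sub_cost(x, y),  # substitute x with y
            ))
        prev = curr
    return prev[-1]


def cluster_sequences(
    seqs: Sequence[Sequence[Token]], threshold: float
) -> List[List[int]]:
    """Leader clustering: a sequence joins the first cluster whose leader
    is within `threshold` (length-normalized distance), otherwise it
    starts a new cluster. Returns clusters as lists of indices."""
    clusters: List[List[int]] = []
    for idx, seq in enumerate(seqs):
        for cluster in clusters:
            leader = seqs[cluster[0]]
            norm = max(len(seq), len(leader)) or 1
            if weighted_edit_distance(seq, leader) / norm <= threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


# Hypothetical token sequences standing in for transformed code segments.
segments = [
    ["for", "id", "=", "num", "if", "call"],
    ["for", "id", "=", "num", "if", "return"],
    ["while", "id", "call"],
]
clusters = cluster_sequences(segments, threshold=0.3)
```

Segments that land in the same cluster are candidates for similar code fragments. Passing a custom `sub_cost` (for example, one that penalizes mismatched keywords more than mismatched identifiers) is one way to weight the distance, though the weighting scheme actually used in the paper is not stated in the abstract.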



Last Update: 2013-04-12
