[1] YU Shiying, YUAN Xuemei, LU Haitao, et al. Similar code detection algorithm based on sequence clustering[J]. CAAI Transactions on Intelligent Systems, 2013, 8(1): 52-57. [doi:10.3969/j.issn.1673-4785.201209054]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 8
Issue: 2013, No. 1
Pages: 52-57
Column: Academic Papers: Machine Learning
Publication date: 2013-03-25
- Title: Similar code detection algorithm based on sequence clustering
- Author(s): YU Shiying (1, 2); YUAN Xuemei (1); LU Haitao (1); REN Jiadong (1); LI Shuo (1)
- Affiliations: 1. College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China; 2. Hebei Provincial S&T Management Information Center, Shijiazhuang 050021, China
- Keywords: sequence clustering; weighted edit distance; similar code detection
- CLC: TP311.131
- DOI: 10.3969/j.issn.1673-4785.201209054
- Abstract: To improve the efficiency of detecting similar code across source programs, this paper proposes a similar code detection algorithm based on sequence clustering. First, the algorithm partitions the source code into multiple segments according to its structure. Second, part of the code in each segment is transformed, and the resulting sequences are clustered using weighted edit distance as the similarity measure. Finally, similar code fragments are extracted, which identifies functionally similar code among multiple pieces of the source program. Experimental results on a series of real and synthetic programs demonstrate the validity and scalability of the algorithm.
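The abstract names the core operations (structural segmentation, transformation, weighted edit distance, sequence clustering) but not their exact definitions. The following is a minimal sketch under stated assumptions, not the authors' implementation. It starts from fragments that are assumed to be already segmented. It also maps identifiers and numeric literals to the placeholders `ID` and `NUM`, which is one plausible reading of "transformed". On top of that it uses a hypothetical token-weighting scheme (`token_weight`) inside a standard weighted edit distance, and groups fragments with a simple leader-style clustering pass using a hypothetical normalized-distance threshold of 0.2. The keyword set, weights, threshold, and all function names are illustrative assumptions.

```python
import re
from typing import Dict, List

# Hypothetical keyword set for a C-like language (an assumption, not from the paper).
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float", "void"}

# Identifiers, integer literals, or any single non-space character (operators, brackets).
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(code: str) -> List[str]:
    """Tokenize a fragment; map identifiers to ID and integer literals to NUM."""
    out = []
    for tok in TOKEN_RE.findall(code):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok.isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out


def token_weight(tok: str) -> float:
    """Hypothetical weights: keywords cost most, placeholders least."""
    if tok in KEYWORDS:
        return 2.0
    if tok in ("ID", "NUM"):
        return 0.5
    return 1.0


def weighted_edit_distance(a: List[str], b: List[str]) -> float:
    """Edit distance where inserting/deleting a token costs its weight and
    substituting x for y costs max(weight(x), weight(y))."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + token_weight(a[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + token_weight(b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            x, y = a[i - 1], b[j - 1]
            sub = 0.0 if x == y else max(token_weight(x), token_weight(y))
            d[i][j] = min(d[i - 1][j] + token_weight(x),  # delete x
                          d[i][j - 1] + token_weight(y),  # insert y
                          d[i - 1][j - 1] + sub)          # match or substitute
    return d[m][n]


def normalized_distance(a: List[str], b: List[str]) -> float:
    """Scale to [0, 1] by the cost of deleting all of a and inserting all of b."""
    total = sum(map(token_weight, a)) + sum(map(token_weight, b))
    return 0.0 if total == 0 else weighted_edit_distance(a, b) / total


def cluster(fragments: Dict[str, str], threshold: float = 0.2) -> List[List[str]]:
    """Leader clustering: join the first cluster whose representative is within
    the threshold, otherwise start a new cluster with this fragment as leader."""
    clusters: List[List[str]] = []
    leaders: List[List[str]] = []
    for name, code in fragments.items():
        seq = normalize(code)
        for idx, leader in enumerate(leaders):
            if normalized_distance(seq, leader) <= threshold:
                clusters[idx].append(name)
                break
        else:
            clusters.append([name])
            leaders.append(seq)
    return clusters


fragments = {
    "f1": "for (i = 0; i < n; i++) s = s + a[i];",
    "f2": "for (k = 0; k < m; k++) t = t + b[k];",
    "f3": "if (x > 0) return x; else return 0;",
}
print(cluster(fragments))  # [['f1', 'f2'], ['f3']]
```

Because `f1` and `f2` differ only in variable names, they normalize to the same sequence and land in one cluster, while `f3` forms its own. Leader clustering is order-dependent and was chosen here only for brevity; a hierarchical or other clustering method could be swapped in without changing the distance function.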