[1]LI Hailin,ZOU Jinchuan.Text similarity measure method based on classified dictionary[J].CAAI Transactions on Intelligent Systems,2017,12(4):556-562.[doi:10.11992/tis.201608010]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
12
Issue:
2017, No. 4
Pages:
556-562
Column:
Academic Papers: Natural Language Processing and Understanding
Publication date:
2017-08-25
- Title: Text similarity measure method based on classified dictionary
- Author(s): LI Hailin(1); ZOU Jinchuan(2)
  1. Department of Information Systems, Huaqiao University, Quanzhou 362021, China
  2. Research Center of Applied Statistics and Big Data, Huaqiao University, Xiamen 361021, China
- Keywords: data mining; semantic analysis; classified dictionary; keywords extraction; encoder; similarity measure; clustering; classification
- CLC: TP301
- DOI: 10.11992/tis.201608010
- Abstract:
Existing text-similarity measures based on semantic knowledge rules suffer from high time complexity. This paper proposes a text-similarity measure based on the Classified Dictionary. First, texts are segmented using the Chinese Lexical Analysis System. Keywords are then extracted with the term frequency-inverse document frequency (tf*idf) method and encoded by traversing the dictionary. The similarity of the original texts is determined from the coding similarity of their keywords. Two baselines are used for comparison: a similarity measure based on semantic knowledge rules and one based on statistics. The similarity results are evaluated using traditional clustering algorithms and k-nearest neighbors classification. Numerical results show that the proposed method obtains relatively good results in the clustering and classification experiments and has better time efficiency than other semantic-analysis measures.
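
The pipeline in the abstract (keyword extraction by tf*idf, dictionary coding, coding similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the prefix-based `code_similarity`, the symmetric best-match average in `text_similarity`, the unsmoothed idf, and the toy `codes` table are all assumptions made for illustration. Input documents are assumed to be already segmented into word lists, and the dictionary is assumed to assign each word a hierarchical category code in which a longer shared prefix means closer meaning.

```python
import math
from collections import Counter


def top_keywords(docs, k=3):
    """Return the k highest tf*idf terms of each pre-segmented document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        tf = Counter(doc)
        # Plain, unsmoothed tf*idf (an assumption; variants exist).
        scores = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result


def code_similarity(a, b):
    """Hypothetical measure: share of leading characters two codes have in common."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b))


def text_similarity(kw_a, kw_b, codes):
    """Hypothetical aggregation: symmetric average of best keyword-code matches."""
    def one_way(src, dst):
        # Keywords missing from the dictionary are skipped.
        src_codes = [codes[w] for w in src if w in codes]
        dst_codes = [codes[w] for w in dst if w in codes]
        if not src_codes or not dst_codes:
            return 0.0
        best = [max(code_similarity(s, d) for d in dst_codes) for s in src_codes]
        return sum(best) / len(best)

    return (one_way(kw_a, kw_b) + one_way(kw_b, kw_a)) / 2


# Toy category codes, invented for this example only.
codes = {"cat": "Aa01", "dog": "Aa02", "car": "Bb01"}
docs = [["cat", "cat", "pet"], ["car", "road", "pet"]]
keywords = top_keywords(docs, k=1)
similarity = text_similarity(keywords[0], keywords[1], codes)
```

Because the comparison works on short codes for a handful of keywords per text, its cost depends on the number of keywords rather than on the full text length, which is in keeping with the time-efficiency claim in the abstract.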