[1]MA Jialin,ZHANG Yongjun,WANG Zhijian.Multi-topic extraction algorithm based on concept clusters[J].CAAI Transactions on Intelligent Systems,2015,10(2):261-266.[doi:10.3969/j.issn.1673-4785.201405066]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
10
Issue:
2015, No. 2
Page number:
261-266
Column:
Academic Papers: Machine Learning
Publication date:
2015-04-25
- Title:
-
Multi-topic extraction algorithm based on concept clusters
- Author(s):
-
MA Jialin (1, 2); ZHANG Yongjun (1, 2); WANG Zhijian (1)
-
1. College of Computer and Information, Hohai University, Nanjing 211100, China;
2. School of Computer Engineering, Huaiyin Institute of Technology, Huaian 223003, China
-
- Keywords:
-
semantic; sparsity; context; knowledge base; concept clusters; multi-topic extraction; K-means; MEABCC
- CLC:
-
TP18
- DOI:
-
10.3969/j.issn.1673-4785.201405066
- Abstract:
-
Multi-topic documents are common in the real world, and multi-topic extraction is widely used in information retrieval, library science, and intelligence. Traditional topic extraction algorithms usually extract a single topic for the whole text; they lack semantic information and suffer from high-dimensional, sparse vector representations. This paper represents text with concept vectors built from the cnki.net knowledge base, merges synonyms and disambiguates polysemous words according to the semantics of concepts and their context, and computes semantic similarity from the semantic relations among concepts. On this basis, a multi-topic extraction algorithm based on concept clusters (MEABCC) is proposed. MEABCC obtains multiple topics by clustering concepts. The K-means algorithm used for concept clustering is improved by presetting "default seeds", which mitigates the fluctuating time and space overhead and the unstable results caused by the sensitivity of traditional K-means to its initial centers. Experiments show that MEABCC achieves good accuracy, recall, and F1 values.
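The seeding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' MEABCC implementation: the function names `cosine` and `seeded_kmeans`, the toy two-dimensional concept vectors, and the use of cosine similarity in place of the paper's semantic similarity are all assumptions made for illustration. The only point shown is that fixing the initial centers ahead of time makes the clustering deterministic.

```python
import math

def cosine(u, v):
    """Cosine similarity; an assumed stand-in for the paper's semantic similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def seeded_kmeans(vectors, seeds, max_iter=100):
    """K-means whose initial centers are preset seeds instead of random picks."""
    centers = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assign each concept vector to the most similar center.
        new_labels = [
            max(range(len(centers)), key=lambda k: cosine(v, centers[k]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Recompute each center as the mean of its members.
        for k in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical concept vectors for four concepts (two sports, two finance).
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
# Hypothetical preset seeds, one per expected topic.
seeds = [[1.0, 0.0], [0.0, 1.0]]

labels, centers = seeded_kmeans(vectors, seeds)
print(labels)
```

Because the seeds are fixed, repeated runs on the same input yield the same clusters, which is the stability property the abstract attributes to the "default seed" preset. Each resulting cluster of concepts would then be read as one topic of the document.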