[1] LIU Su-qin, CHAI Song. K-means dynamic web topic detection method based on named entities[J]. CAAI Transactions on Intelligent Systems, 2010, 5(2): 122-126.
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
5
Issue:
2010, No. 2
Page number:
122-126
Column:
Academic Papers: Natural Language Processing and Understanding
Publication date:
2010-04-25
- Title:
-
K-means dynamic web topic detection method based on named entities
- Author(s):
-
LIU Su-qin (1); CHAI Song (1, 2)
-
1. College of Computer & Communication Engineering, China University of Petroleum, Qingdao 266555, China;
2. Automation Workstation, Military District, Shandong Province, Ji’nan 250013, China
-
- Keywords:
-
named entity; web topics; dynamic detection; K-means clustering method; self-similarity; topic vector
- CLC:
-
TP18
- DOI:
-
-
- Abstract:
-
Current text representation models are poorly suited to web topic detection, and the traditional K-means clustering algorithm has drawbacks, among them a cluster count K that must be fixed in advance. The authors developed a dynamic K-means detection algorithm for web topics based on named entities. The new method modifies the representation model used in traditional topic detection: each text is represented by a combination of named entities and text features, and each named entity is weighted by its contribution to the representation. An adaptive technique lets the number of clusters K self-converge, so the optimized K-means algorithm detects web topics dynamically by selecting K as it runs. Experimental results indicate that the new method detects and distinguishes between similar topics effectively, significantly improving topic detection performance.
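The abstract does not give the weighting formula or the adaptive rule for K, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes that named entities are weighted by a fixed multiplier (`entity_boost`) on term frequency, and that K grows whenever a document's best cosine similarity to every existing centroid falls below a cutoff (`threshold`). All function names, parameters, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def topic_vector(tokens, entities, entity_boost=3.0):
    """Term-frequency vector; tokens that are named entities get extra weight.

    The fixed multiplier is an assumption; the paper's weighting is not given.
    """
    counts = Counter(tokens)
    return {t: c * (entity_boost if t in entities else 1.0)
            for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def dynamic_kmeans(vectors, threshold=0.3, max_iter=20):
    """K-means-style loop in which K is not fixed in advance.

    Assumed rule: a document whose best similarity to all centroids is
    below `threshold` seeds a new cluster; empty clusters are dropped.
    """
    if not vectors:
        return [], []
    centroids = [dict(vectors[0])]
    labels = [None] * len(vectors)
    for _ in range(max_iter):
        new_labels = []
        for v in vectors:
            sims = [cosine(v, c) for c in centroids]
            best = max(range(len(sims)), key=sims.__getitem__)
            if sims[best] < threshold:
                centroids.append(dict(v))  # open a new topic
                best = len(centroids) - 1
            new_labels.append(best)
        # Recompute centroids from members and renumber surviving clusters.
        members = {}
        for v, lab in zip(vectors, new_labels):
            members.setdefault(lab, []).append(v)
        order = sorted(members)
        remap = {old: new for new, old in enumerate(order)}
        centroids = [centroid(members[old]) for old in order]
        new_labels = [remap[lab] for lab in new_labels]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Toy data: two similar "earthquake" topics that differ mainly by place name.
docs = [
    (["earthquake", "haiti", "rescue"], {"haiti"}),
    (["earthquake", "haiti", "aid"], {"haiti"}),
    (["earthquake", "chile", "rescue"], {"chile"}),
    (["earthquake", "chile", "tsunami"], {"chile"}),
]
boosted = [topic_vector(tokens, ents) for tokens, ents in docs]
plain = [topic_vector(tokens, ents, entity_boost=1.0) for tokens, ents in docs]
boosted_labels, _ = dynamic_kmeans(boosted)
plain_labels, _ = dynamic_kmeans(plain)
```

On this toy input the entity-weighted vectors separate the two place-specific topics, while the unweighted vectors collapse into a single cluster, which illustrates why emphasizing named entities can help distinguish similar topics. The boost and threshold values here are arbitrary and would need tuning on real data.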