[1] SHEN Gaofeng, GU Shumin. Chinese Web page feature extraction by optimizing comprehensive heuristics based on GA[J]. CAAI Transactions on Intelligent Systems, 2014, 9(4): 474-479. [doi:10.3969/j.issn.1673-4785.201305044]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 9
Issue: 2014, No. 4
Page number: 474-479
Column: Academic Papers, Intelligent Systems
Publication date: 2014-08-25
- Title: Chinese Web page feature extraction by optimizing comprehensive heuristics based on GA
- Author(s): SHEN Gaofeng (1); GU Shumin (2)
  1. School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
  2. Department of Basic Subjects, College of Information & Business, Zhongyuan University of Technology, Zhengzhou 450007, China
- Keywords: feature extraction; GA; text classification; text clustering; word frequency; word correlation
- CLC: TP391.1
- DOI: 10.3969/j.issn.1673-4785.201305044
- Abstract:
Feature extraction underlies technologies such as information retrieval, text classification, text clustering, and automatic summarization. Traditional feature extraction methods have difficulty evaluating candidate feature words comprehensively and effectively. To address this shortcoming, this paper proposes a method for extracting Chinese web page features by optimizing comprehensive heuristics with a genetic algorithm (GA). The method combines four heuristics, namely word frequency, word correlation, part of speech (POS), and position, to evaluate candidate features, and uses the GA to optimize the weight of each heuristic. Experimental results on different test sets show that the proposed method effectively avoids the deviations of traditional extraction methods and obtains more representative features, and therefore it has some practical value.
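
The abstract gives no scoring formula or GA configuration, so what follows is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that each candidate word carries four heuristic values already normalized to [0, 1], that they are combined as a weighted sum, and that fitness is the share of hand-labeled feature words recovered among the top-k candidates. The names `score`, `top_k`, `fitness`, and `optimize_weights`, the selection and crossover scheme, and all parameter defaults are hypothetical.

```python
import random

# The four heuristics named in the abstract; the dictionary keys are assumed names.
HEURISTICS = ("freq", "corr", "pos", "position")


def score(features, weights):
    """Weighted sum of the four heuristic values for one candidate word."""
    return sum(w * features[h] for w, h in zip(weights, HEURISTICS))


def top_k(candidates, weights, k):
    """Return the k candidate words with the highest combined score."""
    ranked = sorted(candidates, key=lambda word: score(candidates[word], weights),
                    reverse=True)
    return set(ranked[:k])


def fitness(weights, documents, k):
    """Mean fraction of labeled feature words recovered in the top k (assumed metric)."""
    total = 0.0
    for candidates, gold in documents:
        total += len(top_k(candidates, weights, k) & gold) / len(gold)
    return total / len(documents)


def normalize(weights):
    """Rescale weights to sum to 1; fall back to uniform weights if all are zero."""
    s = sum(weights)
    if s <= 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / s for w in weights]


def optimize_weights(documents, k=2, pop_size=20, generations=30,
                     mutation_rate=0.2, seed=0):
    """Toy GA: truncation selection, one-point crossover, Gaussian mutation."""
    rng = random.Random(seed)
    n = len(HEURISTICS)
    population = [normalize([rng.random() for _ in range(n)])
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, documents, k), reverse=True)
        parents = population[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:    # perturb one weight, clamp at zero
                i = rng.randrange(n)
                child[i] = max(0.0, child[i] + rng.gauss(0, 0.1))
            children.append(normalize(child))
        population = parents + children
    return max(population, key=lambda w: fitness(w, documents, k))
```

Here `documents` would be a list of `(candidates, gold)` pairs, where `candidates` maps each word to its four heuristic values and `gold` is the set of labeled feature words for that page. Keeping the weights normalized to sum to 1 is a design choice of this sketch that makes weight vectors comparable across generations.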