[1] ZHANG Zhi-fei, MIAO Duo-qian. Feature selection for text categorization based on rough set[J]. CAAI Transactions on Intelligent Systems, 2009, 4(5): 453-457. [doi:10.3969/j.issn.1673-4785.2009.05.011]
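CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
4
Issue:
No. 5, 2009
Pages:
453-457
Column:
Academic Papers: Natural Language Processing and Understanding
Publication date:
2009-10-25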
- Title:
-
Feature selection for text categorization based on rough set
- Article ID:
-
1673-4785(2009)05-0453-05
- Author(s):
-
ZHANG Zhi-fei 1,2, MIAO Duo-qian 1,2
-
1. Department of Computer Science and Technology, Tongji University, Shanghai 201804, China; 2. Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 201804, China
-
- Keywords:
-
text categorization; rough set; feature selection; quick reduction
- Classification number:
-
TP391
- DOI:
-
10.3969/j.issn.1673-4785.2009.05.011
- Document code:
-
A
- Abstract:
-
Text categorization assigns documents of unknown class to one or more predefined categories based on their content, and it is an important component of many content-based information management tasks. The main difficulty is the high dimensionality of the feature space, and feature selection is commonly used to reduce it. Combining attribute reduction with the characteristics of text categorization, this paper proposes a rough-set-based feature selection algorithm, the improved quick reduction (IQR) algorithm. Experimental results show that the algorithm is effective: it reduces the dimensionality of the feature space while maintaining high accuracy.
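The abstract does not spell out the steps of IQR, so the following is only a sketch of the classic greedy quick reduction (QuickReduct) idea from rough set theory that such algorithms build on, not the authors' improved algorithm. It assumes a toy table in which each row is a document, each column is a binary term-presence feature, and each document has one class label. The function names `partition`, `dependency`, and `quick_reduct` and the sample data are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: fraction of rows whose equivalence class
    (under attrs) contains a single label, i.e. the positive region."""
    consistent = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            consistent += len(block)
    return consistent / len(rows)


def quick_reduct(rows, labels):
    """Greedily add the attribute that yields the highest dependency
    until the subset matches the dependency of the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    best = dependency(rows, labels, reduct)
    while best < target:
        candidates = [a for a in all_attrs if a not in reduct]
        # Ties are broken in favour of the lowest attribute index.
        choice = max(candidates,
                     key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(choice)
        best = dependency(rows, labels, reduct)
    return reduct


# Toy data (hypothetical): 4 documents, 3 binary term features.
rows = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (0, 0, 1)]
labels = ["sports", "sports", "politics", "politics"]
selected = quick_reduct(rows, labels)
```

In this toy table the first term alone separates the two classes, so the greedy search stops after selecting a single feature. On real text collections the feature space has many thousands of terms, which is the cost that a quick reduction variant aims to keep manageable.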
Last Update:
2009-12-29