[1] ZHANG Zhi-fei, MIAO Duo-qian. Feature selection for text categorization based on rough set[J]. CAAI Transactions on Intelligent Systems, 2009, (05): 453-457. [doi:10.3969/j.issn.1673-4785.2009.05.011]

Feature selection for text categorization based on rough set

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Issue:
2009, No. 05
Pages:
453-457
Column:
Publication date:
2009-10-25

Article Info

Title:
Feature selection for text categorization based on rough set
Article ID:
1673-4785(2009)05-0453-05
Author(s):
ZHANG Zhi-fei 1,2, MIAO Duo-qian 1,2
1. Department of Computer Science and Technology, Tongji University, Shanghai 201804, China; 2. The Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 201804, China
Keywords:
text categorization; rough set; feature selection; quick reduction
CLC number:
TP391
DOI:
10.3969/j.issn.1673-4785.2009.05.011
Document code:
A
Abstract:
Text categorization is the process of assigning a previously unseen text to one or more predefined categories according to its content, and it is an important component of many content-based information management tasks. The main difficulty is the high dimensionality of the feature space, and feature selection is commonly used as an important means of dimensionality reduction. By combining attribute reduction with the characteristics of text categorization, a rough-set-based feature selection algorithm, the improved quick reduction (IQR) algorithm, is proposed. Experiments show that the algorithm is effective: it reduces the dimensionality of the feature space while maintaining high accuracy.
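For illustration only: the abstract does not describe the details of the improved quick reduction algorithm, so the sketch below shows the classical QuickReduct heuristic from rough set theory, not the authors' improved variant. It greedily adds the attribute (here, a binary term-presence feature) that most increases the dependency degree of the category on the selected attributes. The function names `dependency` and `quick_reduct` and the toy documents are hypothetical.

```python
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Rough-set dependency degree: the fraction of objects whose
    equivalence class under `attrs` contains a single decision label."""
    if not rows:
        return 0.0
    seen = defaultdict(set)   # attribute-value tuple -> labels observed
    sizes = defaultdict(int)  # attribute-value tuple -> number of objects
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        seen[key].add(label)
        sizes[key] += 1
    positive = sum(sizes[k] for k, lab in seen.items() if len(lab) == 1)
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Classical greedy QuickReduct (assumed baseline, not the paper's IQR):
    add the attribute that raises the dependency degree most, until the
    degree equals that of the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    current = dependency(rows, labels, reduct)
    while current < target:
        best_attr, best_gamma = None, current
        for a in all_attrs:
            if a in reduct:
                continue
            gamma = dependency(rows, labels, reduct + [a])
            if gamma > best_gamma:
                best_attr, best_gamma = a, gamma
        if best_attr is None:
            # No single attribute improves the degree; the greedy search stalls.
            break
        reduct.append(best_attr)
        current = best_gamma
    return reduct


# Toy corpus: columns are hypothetical terms ("match", "stock", "the").
rows = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
labels = ["sports", "sports", "finance", "finance", "other", "other"]

selected = quick_reduct(rows, labels)
print(selected)  # prints [0, 1]
```

In this toy example the third term occurs in every category, so it is never selected, while the first two terms together determine the category of every document.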

Memo

About the authors:
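ZHANG Zhi-fei, male, born in 1986, master's student. His main research interests are text mining and intelligent information processing.

MIAO Duo-qian, male, born in 1964, professor and doctoral supervisor. He is a member of the Artificial Intelligence and Pattern Recognition Technical Committee of the China Computer Federation, a council member of the Chinese Association for Artificial Intelligence, and a member of the Theory and Artificial Intelligence Technical Committee of the Shanghai Computer Society. His main research interests include rough set theory, granular computing, principal curves, web intelligence, and data mining. He has led and completed a number of national and provincial or ministerial natural science foundation projects and key science and technology projects, and has participated in one sub-project of the "973" Program and two projects of the "863" Program. He has received a third-class Science and Technology Progress Award from the State Education Commission, a second-class Science and Technology Progress Award from Shanxi Province, and a first-class Science and Technology Progress Award from the Ministry of Education. He has published more than 120 academic papers, more than 50 of which are indexed by SCI and EI, and has published 3 academic monographs.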
Last Update: 2009-12-29