[1] ZHAI Junhai, LIU Bo, ZHANG Sufang. A feature selection approach based on rough set relative classification information entropy and particle swarm optimization[J]. CAAI Transactions on Intelligent Systems, 2017, 12(3): 397-404. [doi:10.11992/tis.201705004]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
12
Issue:
2017, No. 3
Pages:
397-404
Column:
Academic Papers - Knowledge Engineering
Publication date:
2017-06-25
- Title:
-
A feature selection approach based on rough set relative classification information entropy and particle swarm optimization
- Author(s):
-
ZHAI Junhai (翟俊海; affiliations 1, 2), LIU Bo (刘博; affiliation 3), ZHANG Sufang (张素芳; affiliation 4)
-
1. Key Lab of Machine Learning and Computational Intelligence, Hebei University, Baoding 071002, China;
2. College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua 321004, China;
3. College of Computer Science and Technology, Hebei University, Baoding 071002, China;
4. Hebei Branch of Meteorological Cadres Training Institute, China Meteorological Administration, Baoding 071000, China
- Keywords:
-
data mining; feature selection; data preprocessing; rough set; decision table; particle swarm optimization; information entropy; fitness function
- CLC number:
-
TP181
- DOI:
-
10.11992/tis.201705004
- Abstract:
-
Feature selection, an important preprocessing step in data mining, selects a subset of features from the original feature set according to a given criterion. By removing irrelevant and redundant features, it reduces the computational complexity of the learning algorithm and improves its performance. For feature selection on discrete-valued data, this paper proposes an approach that combines rough set relative classification information entropy with particle swarm optimization: the particle swarm optimization algorithm searches for the optimal feature subset, and relative classification information entropy serves as the fitness function that measures the significance of a candidate subset. The proposed approach was compared experimentally with other feature selection methods based on evolutionary algorithms. The experimental results indicate that the proposed approach outperforms the genetic algorithm-based methods.
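The search described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions, because the abstract does not define relative classification information entropy. The sketch substitutes the Shannon conditional entropy H(D | B) of the decision attribute D given the equivalence classes induced by a feature subset B. It uses a standard binary particle swarm with a sigmoid transfer function, which is one common variant and not necessarily the authors' formulation. The names `conditional_entropy` and `binary_pso`, the size penalty `alpha`, and all parameter defaults are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def conditional_entropy(rows, labels, subset):
    """H(D | B): entropy of the decision labels inside the equivalence
    classes induced by the feature indices in `subset`.
    Assumption: a stand-in for the paper's fitness measure."""
    blocks = defaultdict(list)
    for row, label in zip(rows, labels):
        blocks[tuple(row[i] for i in subset)].append(label)
    n = len(rows)
    h = 0.0
    for members in blocks.values():
        for count in Counter(members).values():
            h -= (count / n) * math.log2(count / len(members))
    return h


def binary_pso(rows, labels, n_particles=20, n_iter=50,
               w=0.7, c1=1.5, c2=1.5, alpha=0.01, seed=0):
    """Search for a feature subset that minimises H(D | B) plus a small
    size penalty (hypothetical choice). Returns (indices, cost)."""
    rng = random.Random(seed)
    m = len(rows[0])

    def cost(bits):
        subset = [i for i, b in enumerate(bits) if b]
        return conditional_entropy(rows, labels, subset) + alpha * len(subset) / m

    # Each particle is a bit vector: bit j == 1 means feature j is selected.
    pos = [[rng.randint(0, 1) for _ in range(m)] for _ in range(n_particles)]
    vel = [[0.0] * m for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iter):
        for k in range(n_particles):
            for j in range(m):
                v = (w * vel[k][j]
                     + c1 * rng.random() * (pbest[k][j] - pos[k][j])
                     + c2 * rng.random() * (gbest[j] - pos[k][j]))
                vel[k][j] = max(-6.0, min(6.0, v))  # clamp the velocity
                # The sigmoid turns the velocity into the probability of a 1 bit.
                prob = 1.0 / (1.0 + math.exp(-vel[k][j]))
                pos[k][j] = 1 if rng.random() < prob else 0
            c = cost(pos[k])
            if c < pbest_cost[k]:
                pbest[k], pbest_cost[k] = pos[k][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[k][:], c
    return [i for i, b in enumerate(gbest) if b], gbest_cost


if __name__ == "__main__":
    # Toy decision table: the label copies feature 0; features 1-3 are redundant.
    data = [[(i >> j) & 1 for j in range(4)] for i in range(16)]
    decision = [row[0] for row in data]
    print(binary_pso(data, decision))
```

In this sketch a lower cost is better: a subset whose equivalence classes are pure with respect to the decision attribute has zero conditional entropy, and the penalty term favours smaller subsets among equally informative ones.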
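Memo
Received: 2017-05-07.
Funding: National Natural Science Foundation of China (71371063); Natural Science Foundation of Hebei Province (F2017201026); Top Priority Discipline of Computer Science and Technology of Zhejiang Province (Zhejiang Normal University).
About the authors: ZHAI Junhai, male, born in 1964, professor, member of the Rough Set and Soft Computing Technical Committee of the Chinese Association for Artificial Intelligence; his main research interest is machine learning. In recent years he has led or participated in more than 10 projects at or above the provincial and ministerial level, received one third-class Natural Science Award of Hebei Province, published 4 monographs, and authored more than 70 papers. LIU Bo, male, born in 1989, master's student; his main research interest is machine learning. ZHANG Sufang, female, born in 1966, associate professor; her main research interest is machine learning.
Corresponding author: ZHAI Junhai. E-mail: mczjh@126.com.
Last Update:
2017-06-25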