[1] JI Ruohan (冀若含), DONG Hongbin (董红斌). Feature selection using forest optimization algorithm based on duplication analysis[J]. CAAI Transactions on Intelligent Systems (智能系统学报), 2022, 17(6): 1113-1122. [doi:10.11992/tis.202111060]
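Journal: CAAI Transactions on Intelligent Systems (智能系统学报) [ISSN 1673-4785/CN 23-1538/TP]
Volume: 17
Issue: No. 6, 2022
Pages: 1113-1122
Section: Academic Papers: Machine Learning
Publication date: 2022-11-05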
- Title: Feature selection using forest optimization algorithm based on duplication analysis
- Author(s): JI Ruohan (冀若含), DONG Hongbin (董红斌)
- Affiliation: School of Computer Science and Technology, Harbin Engineering University, Harbin 150001, Heilongjiang, China
- Keywords: feature selection; evolutionary algorithm; duplication analysis; information entropy; information gain; restart mechanism; forest optimization algorithm; dimensionality reduction
- CLC number: TP301
- DOI: 10.11992/tis.202111060
- Document code: 2022-08-24
- Abstract:
The forest optimization algorithm is an evolutionary algorithm inspired by the way trees in a forest sow seeds. It searches the feature space effectively and is easy to implement. However, there is still room to improve the convergence speed and optimization ability of the forest as a whole, and the algorithm adapts poorly to high-dimensional datasets. To address these problems, this paper proposes feature selection using forest optimization algorithm based on duplication analysis (DAFSFOA). The algorithm introduces an adaptive initialization strategy based on information gain, a forest duplication analysis mechanism, a forest restart mechanism, a candidate optimal tree generation strategy, and a fitness function that takes into account both the number of selected features and the classification accuracy. Experimental results show that DAFSFOA achieves the highest classification accuracy on most datasets. In addition, on the high-dimensional dataset SRBCT, DAFSFOA substantially improves on feature selection using forest optimization algorithm (FSFOA) in both dimensionality reduction rate and classification accuracy. DAFSFOA explores the feature space more effectively than FSFOA and adapts to datasets of different dimensionalities.
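The abstract names several components but gives no formulas, so the Python sketch below only illustrates what such components might look like; it is not the DAFSFOA implementation. Information gain uses the standard definition IG(Y; X) = H(Y) - Σ_v P(X = v) H(Y | X = v) for a discrete feature. Everything else is an assumption made for illustration: a tree is modeled as a binary feature mask, and the helper names (`ig_biased_tree`, `fitness`, `duplication_rate`, `maybe_restart`), the weight `alpha=0.99`, and the restart `threshold=0.5` are all hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def ig_biased_tree(gains, rng):
    """Hypothetical IG-biased initialization: select feature i with probability
    gains[i] / max(gains), so the most informative feature is always selected."""
    top = max(gains) or 1.0  # avoid dividing by zero when every gain is 0
    return [1 if rng.random() < g / top else 0 for g in gains]


def fitness(accuracy, n_selected, n_total, alpha=0.99):
    """Hypothetical weighted fitness: mostly accuracy, plus a small reward for
    selecting fewer features. alpha is an assumed weight, not from the paper."""
    return alpha * accuracy + (1 - alpha) * (1 - n_selected / n_total)


def duplication_rate(forest):
    """Fraction of trees (binary feature masks) that repeat an earlier tree."""
    if not forest:
        return 0.0
    unique = {tuple(tree) for tree in forest}
    return 1 - len(unique) / len(forest)


def maybe_restart(forest, scores, gains, threshold=0.5, rng=None):
    """Hypothetical restart: if too many trees are duplicates, keep the
    best-scoring tree and regenerate the rest; otherwise return the forest as-is."""
    if duplication_rate(forest) <= threshold:
        return forest
    rng = rng or random.Random()
    best = forest[max(range(len(forest)), key=lambda i: scores[i])]
    return [list(best)] + [ig_biased_tree(gains, rng) for _ in range(len(forest) - 1)]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    # A feature that perfectly separates the classes vs. one that is uninformative.
    gains = [information_gain(f, labels) for f in ([0, 0, 1, 1], [0, 1, 0, 1])]
    print("information gains:", gains)
    forest = [[1, 0], [1, 0], [1, 0], [0, 1]]
    print("duplication rate:", duplication_rate(forest))
```

In this sketch the restart always carries the best-scoring tree forward, so a restart cannot lose the best solution found so far. The threshold and weight are illustrative values and would need tuning on real data.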