[1] ZHOU Hongbiao, QIAO Junfei. Feature selection method based on high dimensional k-nearest neighbors mutual information[J]. CAAI Transactions on Intelligent Systems, 2017, (05): 595-600. [doi:10.11992/tis.201609020]

Feature selection method based on high dimensional k-nearest neighbors mutual information

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Issue: 2017, No. 05
Pages: 595-600
Publication date: 2017-10-25

Article Info

Title:
Feature selection method based on high dimensional k-nearest neighbors mutual information
Author(s):
ZHOU Hongbiao 1,2,3; QIAO Junfei 1,2
1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China;
2. Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China;
3. Faculty of Automation, Huaiyin Institute of Technology, Huai’an 223003, China
Keywords:
feature selection; mutual information; k-nearest neighbor; high-dimensional mutual information; multilayer perceptron
CLC number:
TP183
DOI:
10.11992/tis.201609020
Abstract:
Feature selection plays an important role in modeling and forecasting multivariate series. This paper proposes a feature selection method based on data-driven high-dimensional k-nearest-neighbor mutual information. The method first extends the data-driven k-nearest-neighbor estimator to the estimation of mutual information between high-dimensional feature variables. A forward accumulation strategy then produces an optimal ranking of all features, and irrelevant features are removed according to a preset number of irrelevant features. Next, a backward cross strategy identifies and removes redundant features, leaving an optimal subset of strongly relevant features. Simulation experiments with a multilayer perceptron neural network prediction model on the Friedman data, the Housing data, and real effluent total phosphorus prediction data from a wastewater treatment process verify the effectiveness of the proposed method.
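
The abstract only summarizes the procedure, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes the mutual information is estimated with the Kraskov-Stögbauer-Grassberger k-nearest-neighbor estimator (first variant), which accepts feature blocks of any dimension. The forward ranking greedily adds the feature that maximizes the joint mutual information with the target, and the backward pass drops a feature when removing it does not lower the estimate. The function names (`knn_mi`, `forward_rank`, `drop_redundant`), the tolerance `tol`, the redundancy test itself, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.special import digamma


def _cheb(a):
    """Pairwise max-norm distances between rows of a; self-distances set to inf."""
    d = np.abs(a[:, None, :] - a[None, :, :]).max(axis=2)
    np.fill_diagonal(d, np.inf)
    return d


def knn_mi(x, y, k=3):
    """KSG (variant 1) estimate of I(X; Y) in nats; x and y may be multi-column."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    dx, dy = _cheb(x), _cheb(y)
    dz = np.maximum(dx, dy)                # max-norm distance in the joint space
    eps = np.sort(dz, axis=1)[:, k - 1]    # distance to the k-th joint neighbour
    nx = (dx < eps[:, None]).sum(axis=1)   # marginal neighbours strictly inside eps
    ny = (dy < eps[:, None]).sum(axis=1)
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))


def forward_rank(X, y, k=3):
    """Greedy ranking: each step adds the feature maximizing I(selected + f; y)."""
    remaining = list(range(X.shape[1]))
    order = []
    while remaining:
        scores = [knn_mi(X[:, order + [f]], y, k) for f in remaining]
        best = remaining[int(np.argmax(scores))]
        order.append(best)
        remaining.remove(best)
    return order


def drop_redundant(X, y, selected, k=3, tol=0.0):
    """Backward pass (assumed rule): drop f if the joint MI falls by at most tol without it."""
    kept = list(selected)
    for f in list(reversed(kept)):
        rest = [g for g in kept if g != f]
        if rest and knn_mi(X[:, rest], y, k) >= knn_mi(X[:, kept], y, k) - tol:
            kept = rest
    return kept


# Synthetic illustration: features 0 and 1 drive y, feature 2 nearly copies
# feature 0 (redundant), features 3 and 4 are irrelevant.
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 5))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=300)
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.1 * rng.normal(size=300)

order = forward_rank(X, y)                   # full ranking of all features
candidates = order[:-2]                      # preset number of irrelevant features: 2
subset = drop_redundant(X, y, candidates)    # remove redundant features
print(order, subset)
```

The pairwise distance matrices make this sketch quadratic in the number of samples, which is acceptable for a few hundred rows; a tree-based neighbor search would be the usual choice for larger data sets.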



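Memo

Received: 2016-09-21.
Funding: Key Program of the National Natural Science Foundation of China (61533002); National Science Fund for Distinguished Young Scholars (61225016).
About the authors: ZHOU Hongbiao, male, born in 1980, lecturer and doctoral candidate. His main research interest is the analysis and design of neural networks. He has published more than ten papers, six of which are indexed by EI. QIAO Junfei, male, born in 1968, professor and doctoral supervisor, recipient of the National Science Fund for Distinguished Young Scholars, Changjiang Scholar Distinguished Professor of the Ministry of Education, and member of the Ministry of Education New Century Excellent Talents program. His main research interest is intelligent optimization control of wastewater treatment processes. He has received one first prize of the Ministry of Education Science and Technology Progress Award and one third prize of the Beijing Science Progress Award, has published nearly 100 papers, of which 18 are indexed by SCI and 60 by EI, and holds 20 invention patents.
Corresponding author: QIAO Junfei. E-mail: hyitzhb@163.com
Last Update: 2017-10-25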