[1] HU Minjie, LIN Yaojin, YANG Honghe, et al. Spectral feature selection based on feature correlation[J]. CAAI Transactions on Intelligent Systems, 2017, (04): 519-525. [doi:10.11992/tis.201609008]

Spectral feature selection based on feature correlation

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Issue:
2017, No. 04
Pages:
519-525
Column:
Publication date:
2017-08-25

Article Info

Title:
Spectral feature selection based on feature correlation
Author(s):
HU Minjie, LIN Yaojin, YANG Honghe, ZHENG Liping, FU Wei
School of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000, China
Keywords:
feature selection; spectral feature selection; spectral graph theory; feature relevance; discernibility; search strategy; Laplacian score; classification performance
Classification number:
TP18
DOI:
10.11992/tis.201609008
Abstract:
Traditional spectral feature selection algorithms consider only the importance of individual features. This paper introduces the statistical correlation between features into traditional spectral analysis and constructs a spectral feature selection model based on feature correlation. The algorithm first uses the Laplacian Score to identify the single most central feature and takes it as the initial selected feature. It then defines a new objective function that measures the discernibility of a feature group and applies a forward greedy search strategy to evaluate the candidate features one by one, adding the candidate that minimizes the objective function to the selected features. The algorithm thus accounts for both feature importance and the correlations between features. Experiments with two different classifiers on eight UCI datasets show that the algorithm improves the classification performance of the selected feature subset and needs fewer features to reach high classification accuracy.
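
The two-stage procedure described in the abstract (seed with the best Laplacian Score feature, then grow the subset by forward greedy search) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract does not give the feature-group discernibility objective, so the sketch substitutes a hypothetical stand-in: the candidate's Laplacian Score plus `alpha` times its mean absolute Pearson correlation with the features already selected. The function names `laplacian_scores` and `greedy_select`, and the parameters `k`, `t`, and `alpha`, are likewise assumptions made for this example.

```python
import numpy as np


def laplacian_scores(X, k=5, t=1.0):
    """Laplacian Score of every column of X (rows are samples).

    A smaller score means the feature better preserves the local
    structure of the k-nearest-neighbour graph. Requires k < n_samples.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(dist2, np.inf)  # a sample is never its own neighbour

    # Symmetric k-NN graph with heat-kernel weights exp(-d^2 / t).
    S = np.zeros((n, n))
    order = np.argsort(dist2, axis=1)
    for i in range(n):
        nbrs = order[i, :k]
        S[i, nbrs] = np.exp(-dist2[i, nbrs] / t)
    S = np.maximum(S, S.T)

    d = S.sum(axis=1)      # degree vector, the diagonal of D
    L = np.diag(d) - S     # graph Laplacian L = D - S

    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        f = f - (f @ d) / d.sum()   # remove the degree-weighted mean
        denom = f @ (d * f)         # f^T D f
        scores[r] = (f @ L @ f) / denom if denom > 1e-12 else np.inf
    return scores


def greedy_select(X, n_select, k=5, t=1.0, alpha=1.0):
    """Forward greedy selection seeded by the best Laplacian Score.

    ASSUMPTION: the objective below (score + alpha * mean |correlation|
    with the selected set) is a hypothetical stand-in, not the paper's
    discernibility objective.
    """
    scores = laplacian_scores(X, k=k, t=t)
    # Absolute Pearson correlation between features; constant columns
    # produce NaN entries, which are treated as zero correlation.
    corr = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)))

    selected = [int(np.argmin(scores))]   # the most central feature
    candidates = set(range(X.shape[1])) - set(selected)
    while candidates and len(selected) < n_select:
        def objective(j):
            return scores[j] + alpha * corr[j, selected].mean()
        best = min(sorted(candidates), key=objective)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Calling `greedy_select(X, n_select=10)` on a samples-by-features array returns column indices in the order they were chosen, so any prefix of the list is itself a candidate subset. Raising `alpha` in this stand-in objective penalizes redundancy with the already selected features more strongly.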



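Memo

Received: 2016-09-08.
Funding: National Natural Science Foundation of China (61303131, 61379021); Program for New Century Excellent Talents in Fujian Province University; Science and Technology Project of the Education Department of Fujian Province (JA14192).
Author biographies: HU Minjie, female, born in 1979, lecturer; her main research interest is data mining. LIN Yaojin, male, born in 1980; his main research interests are data mining and granular computing; he has led 2 National Natural Science Foundation of China projects and published more than 60 academic papers. YANG Honghe, male, born in 1969, senior laboratory technician; his main research interest is the digital campus.
Corresponding author: HU Minjie, E-mail: zzhuminjie@sina.com.
Last Update: 2017-08-25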