[1] GUO Yumeng, LI Guozheng. A filtering framework for the multi-label feature selection[J]. CAAI Transactions on Intelligent Systems, 2014, 9(03): 292-297. [doi:10.3969/j.issn.1673-4785.201403064]

A filtering framework for the multi-label feature selection

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 9
Issue:
2014, No. 03
Pages:
292-297
Column:
Academic Papers: Intelligent Systems
Publication date:
2014-06-25

Article Info

Title:
A filtering framework for the multi-label feature selection
Author(s):
GUO Yumeng (郭雨萌), LI Guozheng (李国正)
Department of Control Science, School of Electronic and Information Engineering, Tongji University, Shanghai 201804, China
Keywords:
feature selection; multi-label; filter; chi-square test
CLC number:
TP391
DOI:
10.3969/j.issn.1673-4785.201403064
Abstract:
Research on multi-label learning has focused mainly on classifier performance and has paid little attention to the influence of the features in the dataset. This paper proposes a filter framework for feature selection on multi-label data, and implements and evaluates it using the chi-square test. The framework computes the chi-square score of each feature on each label and then derives the final ranking of the features from a summary statistic of those per-label scores. Three summary statistics (maximum, average, and minimum) are compared experimentally. Comparative experiments on four commonly used multi-label datasets, with three classifiers and five evaluation criteria, show that each of the three statistics has its own strengths and weaknesses, but all of them improve multi-label learning performance.
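
The ranking procedure outlined in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes binary (0/1) features and binary labels, uses the standard 2x2 Pearson chi-square statistic, and the function names `chi_square_2x2` and `rank_features` and the toy data are hypothetical. The paper's exact scoring and preprocessing may differ.

```python
from statistics import mean


def chi_square_2x2(feature, label):
    """Pearson chi-square statistic for one binary feature vs. one binary label.

    Assumption: both sequences contain only 0/1 values.
    """
    n = len(feature)
    pairs = list(zip(feature, label))
    a = sum(1 for f, y in pairs if f == 1 and y == 1)  # feature on, label on
    b = sum(1 for f, y in pairs if f == 1 and y == 0)  # feature on, label off
    c = sum(1 for f, y in pairs if f == 0 and y == 1)  # feature off, label on
    d = sum(1 for f, y in pairs if f == 0 and y == 0)  # feature off, label off
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a constant feature or label carries no information
    return n * (a * d - b * c) ** 2 / denom


def rank_features(X, Y, aggregate=max):
    """Score every feature on every label, then combine the per-label scores.

    `aggregate` is the summary statistic (max, mean, or min in this sketch).
    Returns (feature indices sorted from best to worst, aggregated scores).
    """
    n_features, n_labels = len(X[0]), len(Y[0])
    label_columns = [[row[k] for row in Y] for k in range(n_labels)]
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        per_label = [chi_square_2x2(column, lab) for lab in label_columns]
        scores.append(aggregate(per_label))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data (hypothetical): 8 samples, 3 binary features, 2 binary labels.
X = [[1, 1, 1], [1, 1, 0], [1, 1, 1], [1, 0, 0],
     [0, 1, 1], [0, 0, 0], [0, 0, 1], [0, 0, 0]]
Y = [[1, 1], [1, 1], [1, 0], [1, 0],
     [0, 1], [0, 1], [0, 0], [0, 0]]

for name, stat in (("max", max), ("mean", mean), ("min", min)):
    order, scores = rank_features(X, Y, aggregate=stat)
    print(name, order, scores)
```

In this toy data, feature 0 matches the first label exactly but is unrelated to the second, while feature 1 is moderately associated with both. The maximum and average statistics therefore rank feature 0 first, whereas the minimum statistic prefers feature 1, which illustrates why the choice of statistic can change the selected subset.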

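Memo

Received: 2014-03-25.
Funding: National Natural Science Foundation of China (61273305).
About the author: GUO Yumeng, male, born in 1989, Ph.D. candidate. His main research interests include pattern recognition and machine learning.
Corresponding author: LI Guozheng, male, born in 1977, research professor, doctoral supervisor, Ph.D., standing committee member of the Machine Learning Technical Committee of the Chinese Association for Artificial Intelligence. His main research interests are pattern recognition and biomedical data mining. He has led or completed a number of projects, including National Natural Science Foundation of China projects and sub-projects of major projects under the Shanghai Science and Technology Commission "Innovation Action Plan". He has published more than 100 academic papers, of which more than 40 are SCI-indexed and more than 50 are EI-indexed, has contributed to 6 monographs, and has led the translation of 1 monograph. E-mail: gzli@tongji.edu.cn.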