[1] HUANG Qin, QIAN Wenbin, WANG Yinglong, et al. Multi-label feature selection algorithm for cost-sensitive data[J]. CAAI Transactions on Intelligent Systems, 2019, 14(05): 929-938. [doi:10.11992/tis.201807027]

Multi-label feature selection algorithm for cost-sensitive data

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
14
Issue:
2019, No. 05
Pages:
929-938
Publication date:
2019-09-05

Article Info

Title:
Multi-label feature selection algorithm for cost-sensitive data
Author(s):
HUANG Qin (1, 2), QIAN Wenbin (1, 2), WANG Yinglong (1), WU Binglong (2)
1. School of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang 330045, China;
2. School of Software, Jiangxi Agricultural University, Nanchang 330045, China
Keywords:
feature selection; attribute reduction; cost-sensitive; rough sets; granular computing; multi-label learning; information entropy; normal distribution
CLC number:
TP391
DOI:
10.11992/tis.201807027
Abstract:
In multi-label learning, feature selection is an effective way to improve classification performance. Existing multi-label feature selection algorithms have high computational complexity and ignore the fact that acquiring data in real-world applications often incurs a cost. To address both problems, this paper proposes a multi-label feature selection algorithm for cost-sensitive data. The algorithm uses information entropy to analyze the relevance between features and labels, and redefines a feature significance criterion based on test cost. From the standard deviations of feature significance and feature cost, both taken to follow a normal distribution, it then derives a reasonable threshold selection method. The threshold is used to remove redundant and irrelevant features, yielding a feature subset with low total cost. Comparative experiments and analysis on multi-label data demonstrate the effectiveness and feasibility of the proposed method.
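
The abstract does not give the paper's exact formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes discrete features, measures feature-label relevance as mutual information summed over all labels, and uses two hypothetical choices that are not taken from the paper: significance is relevance divided by test cost, and the threshold is the mean significance minus `k` population standard deviations. The names `select_features`, `costs`, and `k` are illustrative, and redundancy between features is not handled here.

```python
import math
import statistics
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete, hashable values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equally long discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def select_features(features, labels, costs, k=0.5):
    """Keep features whose cost-aware significance reaches a threshold.

    features: dict mapping feature name -> list of discrete values
    labels:   list of label columns (one list per label)
    costs:    dict mapping feature name -> positive test cost
    k:        hypothetical tuning constant for the threshold (assumption)
    """
    significance = {}
    for name, column in features.items():
        # Relevance to the label set: mutual information summed over labels.
        relevance = sum(mutual_information(column, lab) for lab in labels)
        # Assumed cost-aware criterion: relevance per unit of test cost.
        significance[name] = relevance / costs[name]

    scores = list(significance.values())
    # Assumed threshold rule: mean minus k population standard deviations.
    threshold = statistics.mean(scores) - k * statistics.pstdev(scores)
    return [name for name in features if significance[name] >= threshold]


features = {
    "f1": [0, 0, 1, 1],  # matches the first label exactly, cheap to measure
    "f2": [0, 1, 0, 1],  # matches the second label exactly, costs twice as much
    "f3": [0, 0, 0, 0],  # constant, carries no information about any label
}
labels = [[0, 0, 1, 1], [0, 1, 0, 1]]
costs = {"f1": 1.0, "f2": 2.0, "f3": 1.0}

print(select_features(features, labels, costs))  # ['f1', 'f2']
```

In this toy data the significances are 1.0, 0.5, and 0.0, so the uninformative feature `f3` falls below the threshold while the two informative features are kept. Dividing by cost makes an expensive feature rank lower than an equally relevant cheap one.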



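Memo

Received: 2018-07-26.
Funding: National Natural Science Foundation of China (61502213, 61662023); Natural Science Foundation of Jiangxi Province (20161BAB212047); Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ180200).
About the authors: HUANG Qin, female, born in 1993, master's student. Her main research interests are granular computing and machine learning. She holds 2 computer software copyrights and has published 3 academic papers. QIAN Wenbin, male, born in 1984, associate professor, Ph.D. His main research interests are granular computing, knowledge discovery, and machine learning. He has led and completed one National Natural Science Foundation of China Young Scientists Fund project and one Jiangxi Province Young Scientists Fund project, and has published more than 20 academic papers. WANG Yinglong, male, born in 1970, professor, Ph.D. His main research interests are knowledge discovery and data mining. He has participated in 2 National Natural Science Foundation of China projects, has led 3 Natural Science Foundation of Jiangxi Province projects, and has published more than 20 academic papers.
Corresponding author: QIAN Wenbin. E-mail: qianwenbin1027@126.com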