[1] HUANG Qin, QIAN Wenbin, WANG Yinglong, et al. Multi-label feature selection algorithm for cost-sensitive data[J]. CAAI Transactions on Intelligent Systems, 2019, 14(5): 929-938. [doi:10.11992/tis.201807027]
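CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
14
Issue:
2019, No. 5
Pages:
929-938
Section:
Research Papers: Fundamentals of Artificial Intelligence
Publication date:
2019-09-05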
- Title:
-
Multi-label feature selection algorithm for cost-sensitive data
- Author(s):
-
HUANG Qin (黄琴)1,2, QIAN Wenbin (钱文彬)1,2, WANG Yinglong (王映龙)1, WU Binglong (吴兵龙)2
-
1. School of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang 330045, China;
2. School of Software, Jiangxi Agricultural University, Nanchang 330045, China
- Keywords:
-
feature selection; attribute reduction; cost-sensitive; rough sets; granular computing; multi-label learning; information entropy; normal distribution
- Classification number (CLC):
-
TP391
- DOI:
-
10.11992/tis.201807027
- Abstract:
-
In multi-label learning, feature selection is an effective way to improve classification performance. Existing multi-label feature selection algorithms have high computational complexity and ignore the fact that acquiring data in real-world applications often incurs a cost. To address both issues, this paper proposes a multi-label feature selection algorithm for cost-sensitive data. The algorithm uses information entropy to analyze the relevance between features and labels, and redefines a feature significance criterion based on test cost. It then gives a reasonable threshold selection method based on the standard deviations of feature significance and feature cost, which follow a normal distribution. Using this threshold, the algorithm removes redundant and irrelevant features to obtain a feature subset with low total cost. Experimental comparison and analysis on multi-label data demonstrate the effectiveness and feasibility of the proposed method.
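The pipeline outlined in the abstract (entropy-based relevance, a cost-aware significance score, and a threshold derived from a standard deviation) can be illustrated with a minimal sketch. The abstract does not state the paper's formulas, so everything specific here is an assumption made for illustration: relevance is taken as the mutual information between a feature and each label, summed over labels; significance is relevance divided by test cost; and the threshold is the mean significance minus `k` standard deviations. The names `select_features` and `k` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def select_features(features, labels, costs, k=0.5):
    """Return the names of features whose cost-adjusted score passes a threshold.

    features: dict mapping feature name -> list of discrete values per sample
    labels:   list of label columns, each a list of values per sample
    costs:    dict mapping feature name -> positive test cost
    k:        hypothetical tuning parameter (not from the paper)
    """
    # Relevance: mutual information with each label, summed over all labels.
    relevance = {
        name: sum(mutual_information(col, lab) for lab in labels)
        for name, col in features.items()
    }
    # ASSUMPTION: significance is relevance per unit of test cost.
    significance = {name: relevance[name] / costs[name] for name in features}
    # ASSUMPTION: threshold is the mean score minus k standard deviations.
    scores = list(significance.values())
    threshold = mean(scores) - k * pstdev(scores)
    return [name for name in features if significance[name] >= threshold]


# Toy data: four samples, three features, two labels.
features = {
    "a": [0, 0, 1, 1],  # informative and cheap
    "b": [0, 1, 0, 1],  # independent of both labels
    "c": [0, 0, 1, 1],  # informative but expensive
}
labels = [[0, 0, 1, 1], [1, 1, 0, 0]]
costs = {"a": 1.0, "b": 1.0, "c": 4.0}

selected = select_features(features, labels, costs)
total_cost = sum(costs[name] for name in selected)
```

On this toy data the irrelevant feature `b` scores zero and falls below the threshold, while the expensive feature `c` is penalized by its cost but still retained. Raising `k` loosens the threshold and lowering it tightens it; the paper's actual criterion may differ.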
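Memo
Received: 2018-07-26.
Funding: National Natural Science Foundation of China (61502213, 61662023); Natural Science Foundation of Jiangxi Province (20161BAB212047); Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ180200).
About the authors: HUANG Qin, female, born in 1993, is a master's student whose main research interests are granular computing and machine learning. She holds 2 computer software copyrights and has published 3 academic papers. QIAN Wenbin, male, born in 1984, is an associate professor with a Ph.D. whose main research interests are granular computing, knowledge discovery, and machine learning. He has led and completed one National Young Scientists Fund project and one Jiangxi Provincial Young Scientists Fund project, and has published more than 20 academic papers. WANG Yinglong, male, born in 1970, is a professor with a Ph.D. whose main research interests are knowledge discovery and data mining. He has participated in 2 National Natural Science Foundation of China projects, has led 3 Natural Science Foundation of Jiangxi Province projects, and has published more than 20 academic papers.
Corresponding author: QIAN Wenbin. E-mail: qianwenbin1027@126.com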