HU Jun (胡军), WANG Haifeng (王海峰). Feature selection algorithm of multi-labeled data based on weighted information granulation[J]. CAAI Transactions on Intelligent Systems, 2023, 18(3): 619-628. [doi:10.11992/tis.202111058]


Feature selection algorithm of multi-labeled data based on weighted information granulation


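Journal:: CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN 1673-4785/CN 23-1538/TP]; Volume: 18; Issue: No. 3, 2023; Pages: 619-628; Section: Academic Papers, Foundations of Artificial Intelligence; Published: 2023-07-05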

Title:: Feature selection algorithm of multi-labeled data based on weighted information granulation

Author(s):: HU Jun (胡军)^1,2, WANG Haifeng (王海峰)^1,2; 1. College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
2. Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Keywords:: neighborhood rough set; information granulation; multi-label learning; label significance; label relationship; feature weight; feature selection; spectral clustering

CLC number:: TP391

DOI:: 10.11992/tis.202111058

Abstract:: Feature selection removes irrelevant and redundant features and is an effective tool for addressing the curse of dimensionality in multi-label data. Existing multi-label feature selection algorithms do not account for the correlations within the label space, assume that all relevant labels of a sample are equally important, and overlook the possibility that the feature space is the intrinsic cause of differences in label importance. As a result, the selected features cannot describe samples accurately and comprehensively, and the computation is complex. In this paper, the correlations between labels are used to partition the label space and thereby simplify the computation. A label importance measure and feature weights are then defined, and on this basis a feature selection algorithm for multi-label data based on weighted information granulation is proposed. Comparative experiments on real multi-label datasets show that the proposed algorithm outperforms the compared algorithms on all evaluation metrics, which verifies its effectiveness and feasibility.
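The abstract describes the method only at a high level. As a rough illustration of the neighborhood-rough-set machinery that weighted information granulation builds on, the Python sketch below does three things. It forms weighted neighborhood granules, scores a feature subset by a label-weighted dependency, and then greedily adds features. This is a generic sketch and not the authors' algorithm. The feature weights `w`, the label importance weights `label_weight`, the radius `delta`, the dependency definition, and all function names are illustrative assumptions. The spectral-clustering partition of the label space is omitted.

```python
# Hypothetical sketch, not the authors' algorithm: a generic weighted
# neighborhood-rough-set feature selector for multi-label data.
# Feature weights `w`, label importance weights `label_weight`, the radius
# `delta`, and the dependency definition below are illustrative assumptions.
from math import sqrt
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def weighted_distance(a: Sequence[float], b: Sequence[float],
                      feats: Sequence[int], w: Sequence[float]) -> float:
    """Weighted Euclidean distance restricted to the feature subset `feats`."""
    return sqrt(sum(w[k] * (a[k] - b[k]) ** 2 for k in feats))


def neighborhoods(X: Matrix, feats: Sequence[int], w: Sequence[float],
                  delta: float) -> List[List[int]]:
    """Information granule of each sample: indices within distance `delta`."""
    n = len(X)
    return [[j for j in range(n)
             if weighted_distance(X[i], X[j], feats, w) <= delta]
            for i in range(n)]


def dependency(X: Matrix, Y: Sequence[Sequence[int]], feats: Sequence[int],
               w: Sequence[float], label_weight: Sequence[float],
               delta: float) -> float:
    """Label-weighted dependency of the label set on the feature subset.

    For label l, sample i lies in the positive region when every sample in
    its neighborhood has the same value of l. The per-label ratio |POS| / n
    is combined using the (assumed) label importance weights.
    """
    if not feats:
        return 0.0
    n = len(X)
    granules = neighborhoods(X, feats, w, delta)
    total = 0.0
    for l, a_l in enumerate(label_weight):
        pos = sum(1 for i in range(n)
                  if all(Y[j][l] == Y[i][l] for j in granules[i]))
        total += a_l * pos / n
    return total


def greedy_select(X: Matrix, Y: Sequence[Sequence[int]], w: Sequence[float],
                  label_weight: Sequence[float], delta: float = 0.2,
                  eps: float = 1e-9) -> Tuple[List[int], float]:
    """Forward selection: repeatedly add the feature with the largest gain."""
    remaining = list(range(len(X[0])))
    selected: List[int] = []
    best = 0.0
    while remaining:
        scores = {k: dependency(X, Y, selected + [k], w, label_weight, delta)
                  for k in remaining}
        # Highest score wins; ties go to the lower feature index.
        k = max(remaining, key=lambda f: (scores[f], -f))
        if scores[k] - best <= eps:
            break  # no remaining feature improves the dependency
        selected.append(k)
        remaining.remove(k)
        best = scores[k]
    return selected, best
```

A larger `delta` yields coarser granules and generally a lower dependency. The weights in `w` stretch or shrink each feature axis, which is one simple way feature weighting can change which samples share a granule.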

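Memo

Received: 2021-11-30.
Funding: National Natural Science Foundation of China (61936001, 62276038); Natural Science Foundation of Chongqing (cstc2019jcyj-cxttX0002, cstc2021ycjh-bgzxm0013); Key Cooperation Project of Chongqing Municipal Education Commission (HZ2021008).
About the authors: HU Jun, Professor, Ph.D. Main research interests: multi-granularity cognitive computing, artificial intelligence security, and graph analysis and mining. In recent years he has led or participated in more than 10 research projects, including projects under the National Key R&D Program of China, the National Natural Science Foundation of China, and the Natural Science Foundation of Chongqing. He holds 5 granted national invention patents, has published more than 60 research papers, and has authored 3 monographs. WANG Haifeng, master's student. Main research interests: granular computing and rough sets.
Corresponding author: HU Jun. E-mail: hujun@cqupt.edu.cn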
