[1] GUO Lingli (过伶俐), CHEN Xiuhong (陈秀宏). Robust unsupervised feature selection via multistep Markov probability and latent representation[J]. CAAI Transactions on Intelligent Systems, 2023, 18(5): 1017-1029. [doi:10.11992/tis.202208013]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
18
Issue:
No. 5, 2023
Pages:
1017-1029
Section:
Academic Papers: Machine Learning
Publication date:
2023-09-05
- Title:
-
Robust unsupervised feature selection via multistep Markov probability and latent representation
- Author(s):
-
GUO Lingli (过伶俐), CHEN Xiuhong (陈秀宏)
-
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, Jiangsu, China
- Keywords:
-
feature selection; latent representation learning; multistep Markov transition probability; unsupervised; non-negative matrix factorization; sparse regression; L2,1-norm; dimensionality reduction
- CLC number:
-
TP181
- DOI:
-
10.11992/tis.202208013
- Abstract:
-
Unsupervised feature selection is an important dimensionality reduction technique in machine learning and data mining. However, current unsupervised feature selection methods focus on learning the manifold structure of the data from its adjacency matrix and ignore the associations between non-adjacent data pairs. In addition, these methods assume that data instances are independent and identically distributed, yet real-world samples come from different sources, so this assumption does not hold. Furthermore, feature importance measured in the original data space is affected by noise in both the data and the features. To address these problems, this paper proposes a robust unsupervised feature selection method based on multistep Markov probability and latent representation (MMLRL). The key idea is to learn the manifold structure of the data through the maximum multistep Markov transition probability, then learn a latent representation of the data with a symmetric non-negative matrix factorization model, and finally select features in the latent representation space. The effectiveness of the proposed algorithm is verified on six datasets of different types.
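The three stages named in the abstract can be illustrated with a rough sketch. The abstract does not give the objective function or the update rules, so everything below is an assumption rather than the authors' MMLRL algorithm: the one-step transition matrix comes from a Gaussian k-nearest-neighbor graph, the "maximum multistep" probability is taken as the element-wise maximum over powers of that matrix, the symmetric non-negative factorization uses a standard damped multiplicative update, and the L2,1-regularized regression is solved by iteratively reweighted least squares. The stages run one after another here, whereas the paper may optimize them jointly. All function names and parameters are hypothetical.

```python
import numpy as np

def knn_transition_matrix(X, k=5):
    """One-step Markov transition matrix from a Gaussian kNN graph (assumed construction)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    sigma2 = np.median(d2[d2 > 0])             # bandwidth heuristic, an assumption
    W = np.exp(-d2 / (2.0 * sigma2))
    np.fill_diagonal(W, 0.0)
    idx = np.argsort(-W, axis=1)[:, :k]        # k strongest neighbors per row
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], idx] = True
    W = np.where(mask | mask.T, W, 0.0)        # keep the graph symmetric
    return W / (W.sum(axis=1, keepdims=True) + 1e-12)

def max_multistep_probability(P, steps=3):
    """Element-wise maximum of P, P^2, ..., P^steps (assumed reading of 'maximum multistep')."""
    best, Pt = P.copy(), P.copy()
    for _ in range(steps - 1):
        Pt = Pt @ P
        best = np.maximum(best, Pt)
    return best

def symmetric_nmf(A, rank, iters=200, seed=0):
    """Approximate A by V @ V.T with V >= 0 using a damped multiplicative update."""
    rng = np.random.default_rng(seed)
    V = rng.random((A.shape[0], rank))
    for _ in range(iters):
        V *= 0.5 + 0.5 * (A @ V) / (V @ (V.T @ V) + 1e-12)
    return V

def l21_feature_scores(X, V, lam=1.0, iters=30):
    """Score features by row norms of W in min ||XW - V||_F^2 + lam * ||W||_{2,1}."""
    D = np.eye(X.shape[1])
    for _ in range(iters):
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ V)
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * row_norms + 1e-8))   # IRLS reweighting
    return row_norms

# Toy data: two clusters separated along the first two of six features.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
X[:20, :2] += 4.0

P = knn_transition_matrix(X, k=5)
M = max_multistep_probability(P, steps=3)
A = 0.5 * (M + M.T)                            # symmetrize before factorizing
V = symmetric_nmf(A, rank=2)
scores = l21_feature_scores(X, V)
ranking = np.argsort(-scores)                  # feature indices, highest score first
```

Because multistep powers of the transition matrix assign nonzero probability to pairs that are not direct neighbors, the matrix `A` carries information about non-adjacent samples, which is the gap in adjacency-only methods that the abstract points out.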