[1] GUO Lingli, CHEN Xiuhong. Robust unsupervised feature selection via multistep Markov probability and latent representation[J]. CAAI Transactions on Intelligent Systems, 2023, 18(5): 1017-1029. [doi:10.11992/tis.202208013]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
18
Issue:
2023, No. 5
Page number:
1017-1029
Column:
Academic Papers - Machine Learning
Publication date:
2023-09-05
- Title:
-
Robust unsupervised feature selection via multistep Markov probability and latent representation
- Author(s):
-
GUO Lingli; CHEN Xiuhong
-
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
-
- Keywords:
-
feature selection; latent representation learning; multistep Markov transition probability; unsupervised; non-negative matrix factorization; sparse regression; L2,1-norm; dimensionality reduction
- CLC:
-
TP181
- DOI:
-
10.11992/tis.202208013
- Abstract:
-
Unsupervised feature selection is an important dimensionality reduction technique in machine learning and data mining. However, existing unsupervised feature selection methods have three limitations. First, they focus primarily on learning the manifold structure of the data from the adjacency matrix and ignore the association between non-adjacent data pairs. Second, they often assume that data instances are independent and identically distributed, yet in reality data samples originate from heterogeneous sources, so this assumption often does not hold. Third, feature importance measured in the original data space is affected by noise in the data and features. To address these problems, this study proposes a robust unsupervised feature selection method based on multistep Markov probability and latent representation (MMLRL). The key idea is to learn the manifold structure between data points through the maximum multistep Markov transition probability. A symmetric non-negative matrix factorization model is then used to learn the latent representation of the data. Finally, feature selection is performed in the latent representation space. The proposed algorithm is evaluated on six different types of datasets to validate its effectiveness.
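
The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MMLRL implementation: the abstract does not give the objective function, the kernel, the number of Markov steps, or the optimization scheme, so everything below is an assumption made for illustration. In particular, the sketch runs the stages one after another rather than as a single joint optimization, uses a Gaussian kernel to build the one-step transition matrix, uses the standard damped multiplicative update for symmetric non-negative matrix factorization, and solves an L2,1-regularized regression by iteratively reweighted least squares. All function names and parameters (`sigma`, `steps`, `k`, `alpha`) are hypothetical.

```python
import numpy as np

def multistep_markov_affinity(X, sigma=1.0, steps=3):
    """Element-wise maximum of P, P^2, ..., P^steps, then symmetrized.

    Assumption: P is the row-normalized Gaussian kernel of the samples.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(K, 0.0)                       # no self-transitions
    P = K / (K.sum(axis=1, keepdims=True) + 1e-12)  # one-step transitions
    Pt, A = P.copy(), P.copy()
    for _ in range(steps - 1):
        Pt = Pt @ P                # next power of the transition matrix
        A = np.maximum(A, Pt)      # keep the largest probability seen so far
    return (A + A.T) / 2.0

def symmetric_nmf(A, k, iters=200, seed=0):
    """Approximate A by V @ V.T with V >= 0 (damped multiplicative update)."""
    rng = np.random.default_rng(seed)
    V = rng.random((A.shape[0], k))
    for _ in range(iters):
        V *= 0.5 + 0.5 * (A @ V) / (V @ (V.T @ V) + 1e-12)
    return V

def l21_feature_scores(X, V, alpha=1.0, iters=30):
    """Score features by row norms of W in
    min_W ||X W - V||_F^2 + alpha * ||W||_{2,1}, solved by reweighting."""
    D = np.eye(X.shape[1])
    for _ in range(iters):
        W = np.linalg.solve(X.T @ X + alpha * D, X.T @ V)
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * row_norms + 1e-8))
    return np.linalg.norm(W, axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((40, 8))       # 40 samples, 8 features (toy data)
    A = multistep_markov_affinity(X, sigma=2.0, steps=3)
    V = symmetric_nmf(A, k=3)
    scores = l21_feature_scores(X, V, alpha=0.1)
    ranking = np.argsort(-scores)          # most important feature first
    print(ranking)
```

Features with larger row norms in `W` contribute more to reconstructing the latent representation, so taking the first few entries of `ranking` gives the selected subset under these assumptions.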