[1] CHENG Kangming, XIONG Weili. Three-optimal semi-supervised regression algorithm under self-training framework[J]. CAAI Transactions on Intelligent Systems, 2020, 15(3): 568-577. [doi:10.11992/tis.201905033]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 15
Issue: 2020(3)
Pages: 568-577
Column: Academic Papers - Machine Learning
Publication date: 2020-05-05
- Title: Three-optimal semi-supervised regression algorithm under self-training framework
- Author(s): CHENG Kangming 1; XIONG Weili 2
  1. School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China;
  2. Key Laboratory of Advanced Process Control for Light Industry, Jiangnan University, Wuxi 214122, China
- Keywords: industrial production; unlabeled samples; filter; semi-supervised regression; similarity; Gaussian process regression; confidence judgment; self-training; prediction
- CLC: TP274
- DOI: 10.11992/tis.201905033
- Abstract:
In industrial production, factors such as the cost of analyzing the dominant variable can lead to situations with few labeled samples and many unlabeled samples. To make better use of the unlabeled samples and improve prediction accuracy, we propose a three-optimal semi-supervised regression algorithm under a self-training framework. The algorithm first filters the unlabeled and labeled samples to ensure similarity between the two sets of data, which improves the accuracy of predicting the unlabeled samples. Next, a Gaussian process regression model is built on the selected labeled samples and used to predict the unlabeled samples, yielding pseudo-labeled samples. The confidence of each pseudo-labeled prediction is then evaluated, and the samples with higher confidence are selected to update the initial labeled set. Finally, repeating these filtering steps within the self-training framework improves the utilization of the unlabeled samples. Modeling and simulation on debutanizer process data confirm that the proposed method achieves superior prediction performance when labeled samples are scarce.
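
The code below is a minimal, illustrative sketch of the self-training loop outlined in the abstract, assuming scikit-learn's GaussianProcessRegressor as the GPR model, Euclidean nearest-neighbor distance as the similarity filter, and GPR predictive standard deviation as the confidence measure. The names select_similar, self_train_gpr, keep_frac, conf_quantile, and n_rounds are hypothetical and do not come from the paper.

# Illustrative sketch only; parameter names and the confidence rule are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.metrics import pairwise_distances

def select_similar(X_lab, X_unlab, keep_frac=0.5):
    # Keep the unlabeled samples closest (Euclidean) to any labeled sample.
    d = pairwise_distances(X_unlab, X_lab).min(axis=1)
    n_keep = max(1, int(keep_frac * len(X_unlab)))
    return np.argsort(d)[:n_keep]

def self_train_gpr(X_lab, y_lab, X_unlab, n_rounds=5, conf_quantile=0.2):
    X_lab, y_lab, X_unlab = map(np.asarray, (X_lab, y_lab, X_unlab))
    kernel = RBF() + WhiteKernel()
    for _ in range(n_rounds):
        if len(X_unlab) == 0:
            break
        # 1) Similarity filter on the unlabeled pool.
        cand = select_similar(X_lab, X_unlab)
        # 2) GPR model on the current labeled set predicts pseudo-labels.
        gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        gpr.fit(X_lab, y_lab)
        mu, std = gpr.predict(X_unlab[cand], return_std=True)
        # 3) Confidence judgment: keep the predictions with the smallest
        #    predictive standard deviation (an assumed confidence measure).
        mask = std <= np.quantile(std, conf_quantile)
        conf_idx = cand[mask]
        # 4) Update the labeled set with the confident pseudo-labeled samples.
        X_lab = np.vstack([X_lab, X_unlab[conf_idx]])
        y_lab = np.concatenate([y_lab, mu[mask]])
        X_unlab = np.delete(X_unlab, conf_idx, axis=0)
    # Final model trained on the augmented labeled set.
    return GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_lab, y_lab)

Selecting pseudo-labels by small predictive standard deviation is one common way to realize a confidence-judgment step with GPR; the paper's exact confidence criterion and similarity measure may differ from this sketch.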