[1] CHENG Kangming, XIONG Weili. Three-optimal semi-supervised regression algorithm under self-training framework[J]. CAAI Transactions on Intelligent Systems, 2020, 15(3): 568-577. [doi:10.11992/tis.201905033]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
15
Issue:
No. 3, 2020
Pages:
568-577
Section:
Academic Papers: Machine Learning
Publication date:
2020-05-05
- Title:
-
Three-optimal semi-supervised regression algorithm under self-training framework
- Author(s):
-
CHENG Kangming (程康明)1, XIONG Weili (熊伟丽)2
-
1. School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, Jiangsu, China;
2. Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi 214122, Jiangsu, China
- Keywords:
-
industrial production; unlabeled samples; optimal selection; semi-supervised regression; similarity; Gaussian process regression; confidence judgment; self-training; prediction
- Classification number (CLC):
-
TP274
- DOI:
-
10.11992/tis.201905033
- Abstract:
-
In industrial production processes, factors such as the cost of analyzing the dominant variable can leave few labeled samples and many unlabeled ones. To use the unlabeled samples more accurately and more fully, we propose a three-optimal semi-supervised regression algorithm under a self-training framework. First, the unlabeled and labeled samples are both selected so that the two sets are similar, which improves the accuracy of predictions on the unlabeled samples. Second, a Gaussian process regression model is built on the selected labeled samples and used to predict the selected unlabeled samples, yielding a pseudo-labeled sample set. Third, the confidence of each pseudo-labeled sample is assessed, and the high-confidence samples are selected to update the initial sample set. Finally, to use the unlabeled samples more fully, this selection is repeated over multiple loops within the self-training framework, raising the utilization of the unlabeled samples. Modeling and simulation on real data from a debutanizer column process confirm that the proposed method predicts well when few labeled samples are available.
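The loop described in the abstract (select similar samples, fit Gaussian process regression, keep confident pseudo-labels, repeat) can be sketched roughly as below. This is a minimal illustration and not the authors' implementation: the abstract does not state the similarity measure or the confidence criterion, so Euclidean distance and the Gaussian process predictive variance are assumed here. The function names, the RBF kernel, and the parameters `n_neighbors`, `var_threshold`, `length_scale`, and `noise` are all hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared Euclidean distances between rows of a and rows of b.
    d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)

def gpr_predict(x_train, y_train, x_test, length_scale=1.0, noise=1e-2):
    """Posterior mean and variance of a GP with an RBF kernel (assumed)."""
    y_mean = y_train.mean()  # center targets for the zero-mean prior
    k = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_s = rbf_kernel(x_train, x_test, length_scale)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mean = k_s.T @ alpha + y_mean
    v = np.linalg.solve(chol, k_s)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance of the RBF kernel is 1
    return mean, np.maximum(var, 0.0)

def self_training_gpr(x_lab, y_lab, x_unl, n_rounds=5, n_neighbors=10,
                      var_threshold=0.05):
    """Self-training loop with three selection steps (illustrative only)."""
    x_lab, y_lab, x_unl = x_lab.copy(), y_lab.copy(), x_unl.copy()
    for _ in range(n_rounds):
        if len(x_unl) == 0:
            break
        # Pairwise distances, shape (n_unlabeled, n_labeled).
        d = np.linalg.norm(x_unl[:, None, :] - x_lab[None, :, :], axis=2)
        # Selection 1: unlabeled samples closest to the labeled set.
        near_unl = np.argsort(d.min(axis=1))[:n_neighbors]
        # Selection 2: labeled samples closest to those unlabeled samples.
        near_lab = np.unique(np.argsort(d[near_unl], axis=1)[:, :n_neighbors])
        mean, var = gpr_predict(x_lab[near_lab], y_lab[near_lab], x_unl[near_unl])
        # Selection 3: keep pseudo-labels with low predictive variance
        # (assumed stand-in for the paper's confidence judgment).
        confident = var < var_threshold
        if not confident.any():
            break
        chosen = near_unl[confident]
        x_lab = np.vstack([x_lab, x_unl[chosen]])
        y_lab = np.concatenate([y_lab, mean[confident]])
        x_unl = np.delete(x_unl, chosen, axis=0)
    return x_lab, y_lab, x_unl

# Usage on synthetic data (not the debutanizer data set from the paper).
rng = np.random.default_rng(0)
x_labeled = rng.uniform(0, 1, (15, 1))
y_labeled = np.sin(2 * np.pi * x_labeled[:, 0])
x_unlabeled = rng.uniform(0, 1, (100, 1))
x_new, y_new, x_rest = self_training_gpr(x_labeled, y_labeled, x_unlabeled)
print(len(x_new), "labeled after self-training;", len(x_rest), "still unlabeled")
```

Samples that never pass the variance threshold stay in the unlabeled pool, so a stricter `var_threshold` trades utilization of the unlabeled data for more reliable pseudo-labels.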