[1]程康明,熊伟丽.一种自训练框架下的三优选半监督回归算法[J].智能系统学报,2020,15(3):568-577.[doi:10.11992/tis.201905033]
　CHENG Kangming,XIONG Weili.Three-optimal semi-supervised regression algorithm under self-training framework[J].CAAI Transactions on Intelligent Systems,2020,15(3):568-577.[doi:10.11992/tis.201905033]

一种自训练框架下的三优选半监督回归算法

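《智能系统学报》 (CAAI Transactions on Intelligent Systems) [ISSN 1673-4785/CN 23-1538/TP] Volume: 15; Issue: 2020, No. 3; Pages: 568-577; Column: Academic Papers - Machine Learning; Publication date: 2020-05-05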

Title:: Three-optimal semi-supervised regression algorithm under self-training framework

作者:: 程康明¹, 熊伟丽²; 1. 江南大学物联网工程学院，江苏无锡 214122;
2. 江南大学轻工过程先进控制教育部实验室，江苏无锡 214122

Author(s):: CHENG Kangming¹, XIONG Weili²; 1. School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China;
2. Key Laboratory of Advanced Process Control for Light Industry, Jiangnan University, Wuxi 214122, China

关键词:: 工业生产; 无标签样本; 优选; 半监督回归; 相似性; 高斯过程回归; 置信度判断; 自训练; 预测

Keywords:: industrial production; unlabeled samples; filter; semi-supervised regression; similarity; Gaussian process regression; confidence judgment; self-training; prediction

分类号:: TP274

DOI:: 10.11992/tis.201905033

摘要:: 工业生产过程数据由于主导变量分析代价等因素可能出现有标签样本少而无标签样本多的情况，为提升对无标签样本利用的准确性与充分性，提出一种自训练框架下的三优选半监督回归算法。对无标签样本与有标签样本进行优选，保证两类数据的相似性，以提高无标签样本预测的准确性；利用高斯过程回归方法对所选有标签样本集建模，预测所选无标签样本集，得到伪标签样本集；通过对伪标签样本集置信度进行判断，优选出置信度高的样本用于更新初始样本集；为了进一步提高无标签样本利用的充分性，在自训练框架下，进行多次循环筛选提高无标签样本的利用率。通过对脱丁烷塔过程实际数据的建模仿真，验证了所提方法在较少有标签样本情况下的良好预测性能。

Abstract:: In industrial processes, labeled samples are often scarce while unlabeled samples are plentiful, for reasons such as the cost of analyzing the primary variable. To use unlabeled samples more accurately and more fully, we propose a three-optimal semi-supervised regression algorithm under a self-training framework. First, the algorithm selects unlabeled and labeled samples so that the two sets are similar, which improves the accuracy of predictions on the unlabeled samples. Second, a Gaussian process regression model is built on the selected labeled samples and used to predict the selected unlabeled samples, yielding a pseudo-labeled sample set. Third, the confidence of each pseudo-labeled sample is assessed, and the high-confidence samples are selected to update the initial sample set. Finally, to exploit the unlabeled samples more fully, this selection is repeated over multiple loops within the self-training framework, raising the utilization of unlabeled samples. Modeling and simulation on real debutanizer process data confirm that the proposed method predicts well when few labeled samples are available.
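
The loop described in the abstract can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the authors' implementation: the abstract does not say how similarity or confidence is measured, so the sketch assumes Euclidean distance for the two similarity-based selections and the Gaussian process predictive variance for the confidence check. The function names (`rbf`, `dist`, `solve`, `gpr_predict`, `self_train`), the thresholds `radius` and `var_max`, the kernel length scale, the noise level, and the toy sine data are all hypothetical.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two feature vectors."""
    d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)

def dist(a, b):
    """Euclidean distance, used here as the (assumed) similarity measure."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gpr_predict(X, y, X_new, noise=1e-2):
    """GP posterior (mean, variance) at each point of X_new."""
    y_bar = sum(y) / len(y)  # constant prior mean (an assumption)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    alpha = solve(K, [v - y_bar for v in y])
    out = []
    for x in X_new:
        k = [rbf(a, x) for a in X]
        mean = y_bar + sum(ki * ai for ki, ai in zip(k, alpha))
        var = rbf(x, x) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
        out.append((mean, max(var, 0.0)))
    return out

def self_train(X_lab, y_lab, X_unl, rounds=5, radius=0.6, var_max=0.05):
    """Self-training loop with three selections per round (assumed criteria)."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(rounds):
        # Selection 1: unlabeled samples close to at least one labeled sample.
        cand = [u for u in pool if min(dist(u, x) for x in X_lab) <= radius]
        if not cand:
            break
        # Selection 2: labeled samples close to at least one chosen candidate.
        idx = [i for i, x in enumerate(X_lab)
               if min(dist(x, u) for u in cand) <= radius]
        preds = gpr_predict([X_lab[i] for i in idx],
                            [y_lab[i] for i in idx], cand)
        # Selection 3: keep pseudo-labels whose predictive variance is small.
        keep = [(u, m) for u, (m, v) in zip(cand, preds) if v <= var_max]
        if not keep:
            break
        for u, m in keep:  # grow the labeled set, shrink the unlabeled pool
            X_lab.append(u)
            y_lab.append(m)
            pool.remove(u)
    return X_lab, y_lab, pool

# Toy usage: four labeled points on a sine curve, four unlabeled inputs.
X_lab = [(0.0,), (1.0,), (2.0,), (3.0,)]
y_lab = [math.sin(x[0]) for x in X_lab]
X_unl = [(0.5,), (1.5,), (2.5,), (10.0,)]
X_new, y_new, left = self_train(X_lab, y_lab, X_unl)
print(len(X_new), left)
```

In this toy run the input at 10.0 is never pseudo-labeled, because it lies farther than `radius` from every labeled sample and so fails the first selection. Pseudo-labels are added only after a round finishes, so samples accepted in one round can widen the neighborhood searched in the next; that is one plausible reading of how repeated loops raise the utilization of unlabeled samples.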

备注/Memo

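Received: 2019-05-15.
Funding: National Natural Science Foundation of China (61773182); Natural Science Foundation of Jiangsu Province (BK20170198).
About the authors: CHENG Kangming, master's student; main research interests are industrial process modeling, machine learning, and big data analysis. XIONG Weili, professor and doctoral supervisor; main research interests are modeling and optimization of complex industrial processes, and intelligent optimization algorithms and their applications. Has led 8 government-funded projects, including a General Program project of the National Natural Science Foundation of China and a Jiangsu Province industry-university-research project; has participated in several projects under the National 863 Program and the National Key R&D Program; has published more than 60 academic papers.
Corresponding author: XIONG Weili. E-mail: greenpre@163.com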
