[1] YU Bencheng, DING Shifei. Hybrid reconstruction method for missing data[J]. CAAI Transactions on Intelligent Systems, 2019, 14(05): 947-952. [doi:10.11992/tis.201807037]

Hybrid reconstruction method for missing data

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
14
Issue:
2019, No. 05
Pages:
947-952
Publication date:
2019-09-05

Article Info

Title:
Hybrid reconstruction method for missing data
Author(s):
YU Bencheng 1,2, DING Shifei 1
1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China;
2. School of Information and Electrical Engineering, Xuzhou College of Industrial Technology, Xuzhou 221004, China
Keywords:
data mining; covariance matrix; fitness function; particle swarm optimization; optimal threshold; evolving clustering method; data reconstruction; auto-associative extreme learning machine
CLC number:
TP301.6
DOI:
10.11992/tis.201807037
Abstract:
Missing data are unavoidable in many fields, and traditional data mining algorithms perform poorly on incomplete data sets. This paper applies the covariance matrix and its determinant to the fitness function of particle swarm optimization and obtains the optimal threshold iteratively. The optimal threshold is then used to reconstruct missing values with the evolving clustering method, which addresses both the difficulty of choosing a threshold and the effect of that choice on the reconstruction results. Next, the evolving clustering method with the optimal threshold is invoked within the auto-associative extreme learning machine, which removes the randomness in selecting the machine's input weights. Finally, six UCI standard data sets and nine activation functions are used for validation. The experimental results show that, compared with most existing data reconstruction methods, the proposed hybrid method reconstructs missing data more effectively.
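
The threshold search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact fitness formula, so the form below (mean squared difference between covariance matrices plus squared difference between their determinants) is an assumption, and `threshold_clusters` is a simplified one-pass stand-in for the evolving clustering method. The auto-associative extreme learning machine stage is not covered. All function names, parameter values, and the synthetic data are hypothetical.

```python
import numpy as np

def threshold_clusters(X, dthr):
    """One-pass clustering: a point farther than dthr from every center
    starts a new cluster (simplified stand-in for evolving clustering)."""
    X = np.asarray(X, dtype=float)
    centers, counts = [X[0].copy()], [1]
    for x in X[1:]:
        d = [np.linalg.norm(x - c) for c in centers]
        j = int(np.argmin(d))
        if d[j] > dthr:
            centers.append(x.copy())
            counts.append(1)
        else:
            counts[j] += 1
            centers[j] += (x - centers[j]) / counts[j]  # running mean
    return np.array(centers)

def impute(X_missing, centers):
    """Fill NaNs with values from the center nearest on the observed features."""
    X_filled = np.array(X_missing, dtype=float)  # copy, input is left untouched
    for row in X_filled:  # each row is a view, so assignment edits X_filled
        miss = np.isnan(row)
        if miss.any():
            d = np.linalg.norm(centers[:, ~miss] - row[~miss], axis=1)
            row[miss] = centers[int(np.argmin(d)), miss]
    return X_filled

def fitness(dthr, X_complete, X_missing):
    """Assumed fitness: covariance and determinant mismatch between the
    complete records and the full data set after reconstruction."""
    centers = threshold_clusters(X_complete, dthr)
    X_all = np.vstack([X_complete, impute(X_missing, centers)])
    c_ref = np.cov(X_complete, rowvar=False)
    c_all = np.cov(X_all, rowvar=False)
    det_gap = np.linalg.det(c_ref) - np.linalg.det(c_all)
    return float(np.mean((c_ref - c_all) ** 2) + det_gap ** 2)

def pso_threshold(X_complete, X_missing, lo=0.01, hi=1.0,
                  n_particles=10, n_iter=20, seed=0):
    """Plain one-dimensional PSO over the distance threshold."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, n_particles)
    vel = np.zeros(n_particles)
    pbest = pos.copy()
    pbest_f = np.array([fitness(p, X_complete, X_missing) for p in pos])
    g = pbest[np.argmin(pbest_f)]
    for _ in range(n_iter):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        f = np.array([fitness(p, X_complete, X_missing) for p in pos])
        better = f < pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        g = pbest[np.argmin(pbest_f)]
    return float(g), float(pbest_f.min())

# Synthetic demonstration data (not from the paper): 45 complete records,
# 15 records with the second feature missing.
data_rng = np.random.default_rng(1)
X = data_rng.random((60, 3))
X_complete, X_missing = X[:45], X[45:].copy()
X_missing[:, 1] = np.nan

dthr, best_f = pso_threshold(X_complete, X_missing)
X_rec = impute(X_missing, threshold_clusters(X_complete, dthr))
```

In this sketch the threshold is the only quantity being optimized, so each particle is a single number. A smaller threshold yields more clusters and therefore more candidate values for each missing entry, which is why the choice of threshold affects the reconstruction.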

Memo

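Received: 2018-07-31.
Funding: National Natural Science Foundation of China (61379101); Advanced Training Program for Leading Teachers of Higher Vocational Colleges in Jiangsu Province (2017TDFX003).
About the authors: YU Bencheng, male, born in 1981, associate professor, Ph.D. His main research interests are artificial intelligence and data mining. He has participated in 2 national and provincial research projects, holds 22 granted patents and software copyrights, and has published more than 20 academic papers. DING Shifei, male, born in 1963, professor and doctoral supervisor, council member of CCF and CAAI. His main research interests are artificial intelligence and pattern recognition. He has led 8 national and provincial projects, obtained 10 invention patents, published more than 200 academic papers, and authored 4 monographs.
Corresponding author: DING Shifei. E-mail: dingsf@cumt.edu.cn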
Last Update: 1900-01-01