[1]于本成,丁世飞.缺失数据的混合式重建方法[J].智能系统学报,2019,14(5):947-952.[doi:10.11992/tis.201807037]
YU Bencheng, DING Shifei. Hybrid reconstruction method for missing data[J]. CAAI Transactions on Intelligent Systems, 2019, 14(5): 947-952. [doi:10.11992/tis.201807037]
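CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN 1673-4785/CN 23-1538/TP]
Volume: 14
Issue: No. 5, 2019
Pages: 947-952
Column: Academic Papers: Foundations of Artificial Intelligence
Publication date: 2019-09-05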
- Title:
-
Hybrid reconstruction method for missing data
- Author(s):
-
YU Bencheng (于本成)1,2, DING Shifei (丁世飞)1
-
1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China;
2. School of Information and Electrical Engineering, Xuzhou College of Industrial Technology, Xuzhou 221004, Jiangsu, China
-
- Keywords:
-
data mining; covariance matrix; fitness function; particle swarm optimization; optimal threshold; evolving clustering method; data reconstruction; auto-associative extreme learning machine
- Classification number:
-
TP301.6
- DOI:
-
10.11992/tis.201807037
- Abstract:
-
Missing data are unavoidable in many fields, and traditional data mining algorithms perform poorly on incomplete data sets. In this paper, the covariance matrix and its determinant are used in the fitness function of particle swarm optimization, and the optimal threshold is obtained iteratively. The optimal threshold is then used to reconstruct missing values with the evolving clustering method, which addresses both the difficulty of choosing the threshold and the effect of that choice on the reconstruction results. Next, the evolving clustering method with the optimal threshold is invoked inside the auto-associative extreme learning machine, which removes the randomness in selecting its input weights. Finally, six UCI standard data sets and nine activation functions are used for verification. The experimental results show that, compared with most existing data reconstruction methods, the proposed hybrid reconstruction method reconstructs missing data more effectively.
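The threshold-selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: `threshold_clustering` is a simplified one-pass stand-in for the evolving clustering method, `fitness` is a hypothetical score that compares the covariance matrix and its determinant between the complete records and the imputed data set, and `best_threshold` uses a plain grid search in place of particle swarm optimization. The auto-associative extreme learning machine stage is not covered.

```python
import numpy as np


def threshold_clustering(X, dthr):
    """One-pass, threshold-based clustering (simplified stand-in for ECM)."""
    centers = [X[0].copy()]
    counts = [1]
    for x in X[1:]:
        dists = [np.linalg.norm(x - c) for c in centers]
        j = int(np.argmin(dists))
        if dists[j] <= dthr:
            # Close enough: absorb the sample and move the center
            # to the running mean of its members.
            counts[j] += 1
            centers[j] += (x - centers[j]) / counts[j]
        else:
            # Farther than the threshold from every center: new cluster.
            centers.append(x.copy())
            counts.append(1)
    return np.array(centers)


def impute(X, dthr):
    """Fill NaNs from the nearest center, measured on observed features only."""
    complete = X[~np.isnan(X).any(axis=1)]
    centers = threshold_clustering(complete, dthr)
    filled = X.copy()
    for row in filled:  # each row is a view, so assignment edits `filled`
        miss = np.isnan(row)
        if miss.any():
            d = np.linalg.norm(centers[:, ~miss] - row[~miss], axis=1)
            row[miss] = centers[np.argmin(d), miss]
    return filled


def fitness(X, filled):
    """Hypothetical fitness: covariance mismatch, lower is better."""
    complete = X[~np.isnan(X).any(axis=1)]
    c0 = np.cov(complete, rowvar=False)
    c1 = np.cov(filled, rowvar=False)
    return np.linalg.norm(c0 - c1) + abs(np.linalg.det(c0) - np.linalg.det(c1))


def best_threshold(X, candidates):
    """Grid search over candidate thresholds (stand-in for PSO)."""
    scores = [fitness(X, impute(X, t)) for t in candidates]
    return candidates[int(np.argmin(scores))]
```

Under these assumptions, calling `best_threshold(X_missing, [0.5, 1.0, 2.0])` on an array with `NaN` entries returns the candidate whose imputed data set best preserves the covariance structure of the complete records, and `impute(X_missing, t)` then produces the reconstructed array. A swarm-based search would replace only `best_threshold`; the fitness and imputation steps would stay the same.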
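Memo
Received: 2018-07-31.
Funding: National Natural Science Foundation of China (61379101).
About the authors: YU Bencheng (于本成), male, born in 1981, associate professor, Ph.D. His main research interests are artificial intelligence and data mining. He has participated in 2 national and provincial research projects, holds 22 granted patents and software copyrights, and has published more than 20 academic papers. DING Shifei (丁世飞), male, born in 1963, professor and doctoral supervisor, council member of CCF and CAAI. His main research interests are artificial intelligence and pattern recognition. He has led 8 national and provincial projects, obtained 10 invention patents, published more than 200 academic papers, and authored 4 monographs.
Corresponding author: DING Shifei. E-mail: dingsf@cumt.edu.cn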