[1] YAN Yuanting, WU Yaya, ZHAO Shu, et al. Improving missing data recovery with a constructive covering algorithm[J]. CAAI Transactions on Intelligent Systems, 2019, 14(6): 1225-1232. [doi:10.11992/tis.201906015]
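CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
14
Issue:
2019, No. 6
Pages:
1225-1232
Column:
Academic Papers: Fundamentals of Artificial Intelligence
Publication date:
2019-11-05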
- Title:
-
Improving missing data recovery with a constructive covering algorithm
- Author(s):
-
YAN Yuanting, WU Yaya, ZHAO Shu, ZHANG Yanping
-
School of Computer Science and Technology, Anhui University, Hefei 230601, China
- Keywords:
-
incomplete data; missing value imputation; neighborhood information; data mining; machine learning; imputation method; single imputation; multiple imputation
- CLC number:
-
TP18
- DOI:
-
10.11992/tis.201906015
- Abstract:
-
Processing incomplete data is an important problem in data mining, machine learning, and related fields, and missing value imputation is the mainstream approach to it. Most existing imputation methods use techniques from statistics and machine learning to analyze the information remaining in the original data and derive plausible values to replace the missing entries. These methods fall roughly into single imputation and multiple imputation, each with its own advantages in different scenarios. However, few methods go further and use the neighborhood information in the spatial distribution of the samples to correct the imputed values. This paper therefore proposes a framework that can be applied to many existing imputation methods to improve their imputation results. The framework consists of three modules: pre-filling, spatial neighborhood information mining, and correction of the pre-filled values. Seven existing imputation methods were evaluated on eight UCI datasets, and the experimental results verify the effectiveness and robustness of the proposed framework.
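The three-stage structure named in the abstract (pre-filling, neighborhood information mining, correction of the pre-filled values) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the details of the constructive covering step, so plain Euclidean k-nearest neighbors is swapped in for the neighborhood mining, column-mean imputation is assumed as the pre-filling method, and the function names and the blending weight `alpha` are hypothetical.

```python
import numpy as np

def prefill_mean(X):
    """Stage 1 (pre-filling): replace each NaN with its column mean.
    Assumes every column has at least one observed value."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)                      # True where a value is missing
    col_means = np.nanmean(X, axis=0)       # per-column mean over observed values
    filled = np.where(mask, col_means, X)
    return filled, mask

def nearest_neighbors(filled, k):
    """Stage 2 (neighborhood mining): indices of the k nearest rows.
    ASSUMPTION: plain Euclidean k-NN stands in for constructive covering."""
    diff = filled[:, None, :] - filled[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)          # a row is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def correct_fill(filled, mask, neighbors, alpha=0.5):
    """Stage 3 (correction): blend each pre-filled value with the mean of
    the same attribute over the row's neighbors. `alpha` is hypothetical."""
    corrected = filled.copy()
    neighbor_mean = filled[neighbors].mean(axis=1)   # shape (n_rows, n_cols)
    corrected[mask] = (1 - alpha) * filled[mask] + alpha * neighbor_mean[mask]
    return corrected

def refine_imputation(X, k=2, alpha=0.5):
    """Run the three stages in order; observed values are never changed."""
    filled, mask = prefill_mean(X)
    neighbors = nearest_neighbors(filled, k)
    return correct_fill(filled, mask, neighbors, alpha)

if __name__ == "__main__":
    X = np.array([[1.0, 2.0], [1.1, np.nan], [5.0, 6.0], [5.1, 6.2]])
    print(refine_imputation(X, k=1, alpha=0.5))
```

In this sketch only the entries that were originally missing are adjusted, which mirrors the idea of correcting a pre-fill rather than replacing it. Any other imputation method could be substituted for `prefill_mean`, in keeping with the abstract's claim that the framework applies to many existing methods.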
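Memo
Received: 2019-06-06.
Funding: National Natural Science Foundation of China (61806002, 61872002, 61673020, 61876001, 61602003); Natural Science Foundation of Anhui Province (1708085QF143, 1808085MF197); Doctoral Research Start-up Fund of Anhui University (J01003253).
About the authors: YAN Yuanting, male, born in 1986, lecturer, Ph.D., member of the Chinese Association for Artificial Intelligence (CAAI). His main research interests are machine learning, granular computing, and bioinformatics. He has led one Young Scientists Fund project of the National Natural Science Foundation of China and has published more than 10 academic papers. WU Yaya, male, born in 1995, master's student, member of CAAI. His main research interests are machine learning and incomplete data processing. ZHAO Shu, female, born in 1979, professor, doctoral supervisor, Ph.D., member of the CAAI Technical Committee on Granular Computing and Knowledge Discovery, and executive council member of the Anhui Association for Artificial Intelligence. Her main research interests are machine learning and granular computing. She holds several invention patents and software copyrights and has published more than 60 academic papers.
Corresponding author: ZHANG Yanping. E-mail: zhangyp2@gmail.com
Last Update: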
2019-12-25