[1] YAN Yuanting, WU Yaya, ZHAO Shu, et al. Improving missing data recovery with a constructive covering algorithm[J]. CAAI Transactions on Intelligent Systems, 2019, 14(6): 1225-1232. [doi: 10.11992/tis.201906015]
Journal: CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 14
Issue: 6 (2019)
Pages: 1225-1232
Section: Academic Papers: Foundations of Artificial Intelligence
Publication date: 2019-11-05
- Title: Improving missing data recovery with a constructive covering algorithm
- Author(s): YAN Yuanting; WU Yaya; ZHAO Shu; ZHANG Yanping
- Affiliation: School of Computer Science and Technology, Anhui University, Hefei 230601, China
- Keywords: incomplete data; missing value imputation; neighborhood information; data mining; machine learning; imputation method; single imputation; multiple imputation
- CLC: TP18
- DOI: 10.11992/tis.201906015
- Abstract:
Incomplete data processing is an active research topic in data mining, machine learning, and related fields, and missing value imputation is the mainstream approach to handling incomplete data. Most existing imputation methods use techniques from statistics and machine learning to extract the information available in the observed data and replace each missing attribute value with a plausible estimate. Imputation methods can be broadly divided into single imputation and multiple imputation, each with its own advantages in different scenarios. However, few methods go further and exploit neighborhood information from the spatial distribution of samples to correct the imputed values. To address this, this paper proposes a general framework that can be combined with many existing imputation methods to improve their imputation results. The framework consists of three modules: pre-filling, spatial neighborhood information mining, and correction of the pre-filled results. Seven existing imputation methods were evaluated on eight UCI datasets, and the experimental results verify the effectiveness and robustness of the proposed framework.
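The abstract names the three modules but does not say how each one is implemented. The following is a minimal Python sketch of that pipeline shape only, under loudly stated assumptions: column-mean pre-filling stands in for "any existing imputation method", plain k-nearest-neighbor search (not the constructive covering algorithm named in the title) stands in for spatial neighborhood mining, and a fixed blend weight stands in for the correction rule. All names and parameters here (`prefill_mean`, `neighbors`, `impute_with_refinement`, `k`, `alpha`) are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean


def prefill_mean(data):
    """Module 1 (assumed stand-in): pre-fill each missing entry (None) with its column mean."""
    n_cols = len(data[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def neighbors(filled, i, k):
    """Module 2 (assumed stand-in): indices of the k samples nearest to sample i.

    Uses Euclidean distance on the pre-filled vectors; this is plain k-NN,
    NOT the constructive covering algorithm from the paper's title.
    """
    dists = [(math.dist(filled[i], filled[t]), t) for t in range(len(filled)) if t != i]
    dists.sort()
    return [t for _, t in dists[:k]]


def impute_with_refinement(data, prefill=prefill_mean, k=2, alpha=0.5):
    """Module 3 (assumed stand-in): blend each pre-filled value with its neighbors' mean.

    `prefill` is pluggable, mirroring the idea that the framework wraps an
    existing imputation method. Observed values are never modified.
    """
    filled = prefill(data)
    result = [row[:] for row in filled]
    for i, row in enumerate(data):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        nbrs = neighbors(filled, i, k)
        for j in missing:
            nbr_mean = mean(filled[t][j] for t in nbrs)
            # Hypothetical correction rule: convex blend of pre-filled and neighborhood values.
            result[i][j] = (1 - alpha) * filled[i][j] + alpha * nbr_mean
    return result


if __name__ == "__main__":
    toy = [[1.0, 2.0], [1.1, None], [5.0, 6.0], [5.2, 6.1], [0.9, 2.1]]
    print(prefill_mean(toy)[1])
    print(impute_with_refinement(toy)[1])
```

On this toy data, mean pre-filling gives the missing entry 4.05, and the neighborhood step pulls it to 3.05 because the two nearest samples (rows 4 and 0) have values near 2. Passing a different function as `prefill` is how this sketch mimics plugging the framework into another imputation method.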