[1] LI Yinan, WANG Shitong. A feature augmentation method for enhancing the labeling quality of crowdsourcing data[J]. CAAI Transactions on Intelligent Systems, 2020, 15(2): 227-234. [doi:10.11992/tis.201810014]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 15
Issue: 2 (2020)
Pages: 227-234
Column: Academic Papers / Knowledge Engineering
Publication date: 2020-03-05
- Title: A feature augmentation method for enhancing the labeling quality of crowdsourcing data
- Author(s): LI Yinan; WANG Shitong
- Affiliation: School of Digital Media, Jiangnan University, Wuxi 214122, China
- Keywords: crowdsourcing; labeling quality; feature augmentation; expert labeling; noise identification; noise correction; noise probability; upper limit of noise number
- CLC: TP181
- DOI: 10.11992/tis.201810014
- Abstract:
Crowdsourcing is a new method of collecting data labels. Although it is economical, crowdsourcing faces an unavoidable problem: the quality of the labels cannot be guaranteed. In particular, when labeling quality is low for objective reasons, the crowdsourcing results become unreliable. This study proposes a feature augmentation method for enhancing the labeling quality of crowdsourced data. First, a small set of expert data is labeled by several people with professional knowledge. Next, classifiers are trained on the crowdsourced data and used to predict labels for the expert data, and these predicted labels are appended to the expert data as additional features. The augmented expert data are then used to train classifiers that predict the original crowdsourced data; from these predictions, the method computes a noise probability for each instance and an upper limit on the number of noisy instances, which together separate a high-quality subset from the potentially noisy labels. The filtered high-quality subset is then used, through the same feature augmentation procedure, to correct the noisy labels. Experiments on eight UCI datasets show that the proposed method achieves encouraging results when the number of repeated labels per instance is comparatively small or the labeling quality is comparatively low.
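The pipeline described in the abstract can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' implementation: the abstract specifies neither the base classifiers nor the exact formulas for the noise probability and the upper limit. The sketch therefore assumes (a) one already aggregated crowd label per instance, e.g. a majority vote over the repeated labels; (b) a small k-nearest-neighbour classifier in place of the paper's classifiers; (c) predicted labels appended as one-hot features; (d) a noise probability defined as the share of the k nearest augmented expert instances whose label disagrees with the crowd label; and (e) an upper limit supplied through a hypothetical `max_noise_rate` parameter. All names (`knn_votes`, `knn_predict`, `augment`, `clean_crowd_labels`, `max_noise_rate`) are illustrative.

```python
import math
from collections import Counter


def knn_votes(train_X, train_y, x, k):
    """Count the labels of the k training points nearest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest)


def knn_predict(train_X, train_y, X, k):
    """Majority label among the k nearest neighbours, for each row of X."""
    return [knn_votes(train_X, train_y, x, k).most_common(1)[0][0] for x in X]


def augment(X, predicted, classes):
    """Feature augmentation: append a one-hot encoding of a predicted label."""
    return [list(x) + [1.0 if p == c else 0.0 for c in classes]
            for x, p in zip(X, predicted)]


def clean_crowd_labels(X_crowd, y_crowd, X_exp, y_exp, k=3, max_noise_rate=0.3):
    """Flag and relabel likely-noisy crowd labels (illustrative sketch only).

    y_crowd holds one aggregated crowd label per instance (assumption).
    max_noise_rate is a hypothetical stand-in for the paper's upper limit.
    """
    classes = sorted(set(y_crowd) | set(y_exp))

    # Step 1: a classifier trained on the crowd data predicts the expert data;
    # the crowd data gets the same extra features (in-sample, for simplicity).
    X_exp_aug = augment(X_exp, knn_predict(X_crowd, y_crowd, X_exp, k), classes)
    X_crowd_aug = augment(X_crowd, knn_predict(X_crowd, y_crowd, X_crowd, k), classes)

    # Step 2: noise probability = share of the k nearest augmented expert
    # instances whose label disagrees with the crowd label (assumed definition).
    noise_prob = []
    for x, y in zip(X_crowd_aug, y_crowd):
        votes = knn_votes(X_exp_aug, y_exp, x, k)
        noise_prob.append(1.0 - votes[y] / sum(votes.values()))

    # Flag instances with probability > 0.5, most probable first, but never
    # more than the upper limit on the number of noisy instances.
    limit = int(max_noise_rate * len(y_crowd))
    ranked = sorted(range(len(y_crowd)), key=lambda i: -noise_prob[i])
    noisy = [i for i in ranked if noise_prob[i] > 0.5][:limit]
    noisy_set = set(noisy)
    clean = [i for i in range(len(y_crowd)) if i not in noisy_set]

    # Step 3: the high-quality subset now plays the expert role; its augmented
    # features train a classifier that relabels the flagged instances.
    corrected = list(y_crowd)
    if clean and noisy:
        fixed = knn_predict([X_crowd_aug[i] for i in clean],
                            [y_crowd[i] for i in clean],
                            [X_crowd_aug[i] for i in noisy], k)
        for i, label in zip(noisy, fixed):
            corrected[i] = label
    return corrected, noisy


if __name__ == "__main__":
    # Toy data: two well-separated clusters; crowd labels 4 and 9 are flipped.
    X_crowd = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
               (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    y_crowd = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
    X_exp = [(0, 0.5), (1, 0.5), (0.5, 0), (10, 10.5), (11, 10.5), (10.5, 10)]
    y_exp = [0, 0, 0, 1, 1, 1]
    corrected, noisy = clean_crowd_labels(X_crowd, y_crowd, X_exp, y_exp)
    print("flagged:", noisy)
    print("corrected labels:", corrected)
```

On this toy data the two flipped labels (indices 4 and 9) are the only instances flagged, and both are relabeled to their cluster's class. Lowering `max_noise_rate` to 0.1 caps the flagged set at a single instance, which illustrates the role the upper limit plays: it bounds how many labels the filter may treat as noise. Step 1 reuses in-sample predictions on the crowd data only to keep the sketch short; out-of-fold predictions would be a more careful choice.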