Research progress on outlier mining
北京大学视觉与听觉处理国家重点实验室,北京 100871
WANG Hong-ding, TONG Yun-hai, TAN Shao-hua, TANG Shi-wei, YANG Dong-qing
National Laboratory on Machine Perception, Peking University, Beijing 100871, China
An outlier is a data point that differs significantly from the rest of a data set. One person's noise may be another person's signal, so outlier mining has drawn growing interest in information science as data quality, fraud detection, network intrusion detection, fault diagnosis, automated military reconnaissance, and related problems receive wide attention. Based on a thorough review of the outlier mining literature in China and abroad, this paper systematically surveys the state of outlier mining research in the database field, from basic concepts to the principal research problems and underlying techniques, including the origin of outliers, the definition of an outlier, and a comparison of popular outlier mining methods. It also summarizes the current state of the art of these techniques and, in light of current research hotspots, discusses future research directions and the challenges facing outlier mining.


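TANG Shi-wei, male, born in 1939, is a professor and doctoral supervisor, and vice chair of the Database Technical Committee of the China Computer Federation. His main research interests are databases and information systems. He has led a number of major national science and technology projects and "973" Program projects, has received several awards including the second prize of the National Science and Technology Progress Award, and has published many papers in leading journals and academic conferences in China and abroad.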
