[1] FENG Ji, RAN Ruisheng, WEI Yan. A parameter-free outlier detection algorithm based on natural neighborhood graph[J]. CAAI Transactions on Intelligent Systems, 2019, 14(05):998-1006. [doi:10.11992/tis.201809032]

A parameter-free outlier detection algorithm based on natural neighborhood graph

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
14
Issue:
2019, No. 05
Pages:
998-1006
Publication date:
2019-09-05

Article Info

Title:
A parameter-free outlier detection algorithm based on natural neighborhood graph
Author(s):
FENG Ji (冯骥), RAN Ruisheng (冉瑞生), WEI Yan (魏延)
College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
Keywords:
parameter-free; adaptive neighbor; nearest neighbor; weighted graph; outlier detection; outlier factor; global outlier; local outlier
CLC number:
TP311
DOI:
10.11992/tis.201809032
Abstract:
Nearest-neighbor-based outlier detection algorithms in data mining have a practical weakness: when data sets contain arbitrarily shaped clusters of varying density, appropriate parameters are difficult to choose without sufficient prior knowledge. To address this issue, we build on the natural neighbor method, which can reflect the relationships between elements in a data set better than the k-nearest neighbor method, and propose an outlier detection algorithm based on a graph called the weighted natural neighborhood graph. The algorithm requires no manually set parameters at any stage and can identify both global and local outliers in data sets with different distribution characteristics. Outlier detection results on artificial and real data sets show that the algorithm performs comparably to parameterized algorithms run with their optimal parameters, far better than parameter-sensitive algorithms in most cases, and better than parameter-insensitive algorithms, which gives it stronger universality and practicality.
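The abstract does not spell out the algorithm, so the following is only a minimal sketch of the natural neighbor idea it builds on, as that idea is commonly described: grow the neighborhood size r one step at a time and stop once the number of points that nobody has chosen as a neighbor stops changing. The function name `natural_neighbor_search`, the exact stopping rule, and the choice to treat mutual neighbors as natural neighbors are assumptions made for illustration. The sketch does not reproduce the paper's weighted natural neighborhood graph or its outlier factor.

```python
import numpy as np


def natural_neighbor_search(X):
    """Hypothetical helper: return (lam, neighbors).

    lam is the neighborhood size at which the search stopped, and
    neighbors[i] is the set of points that are mutual lam-nearest
    neighbors of point i. No parameter is supplied by the caller.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)

    # Pairwise Euclidean distances; a point is never its own neighbor.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)

    # order[i, r - 1] is the r-th nearest neighbor of point i.
    # A stable sort keeps tie-breaking deterministic.
    order = np.argsort(dist, axis=1, kind="stable")

    reverse_count = np.zeros(n, dtype=int)  # times each point was chosen
    prev_orphans = -1
    lam = 0
    for r in range(1, n):
        # Every point votes for its r-th nearest neighbor.
        np.add.at(reverse_count, order[:, r - 1], 1)
        lam = r
        orphans = int((reverse_count == 0).sum())
        # Assumed stopping rule: the count of unchosen points is stable.
        if orphans == prev_orphans:
            break
        prev_orphans = orphans

    knn = [set(order[i, :lam].tolist()) for i in range(n)]
    # Assumed definition: natural neighbors are mutual lam-nearest neighbors.
    neighbors = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return lam, neighbors


if __name__ == "__main__":
    # Four points on a unit square plus one distant point.
    points = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]]
    lam, neighbors = natural_neighbor_search(points)
    print(lam, neighbors)
```

In this toy input the distant point ends up with no natural neighbors while each corner of the square has three, which hints at why a graph built from such neighborhoods could separate outliers without a user-chosen k. How the paper weights that graph and scores points is not stated on this page.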



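Memo:
Received: 2018-09-16.
Funding: Humanities and Social Sciences Research Project of the Ministry of Education (18XJC880002); Science and Technology Project of the Chongqing Municipal Education Commission (KJQN201800539); Natural Science Foundation of Chongqing (cstc2013jcyjA40049); Chongqing Normal University Fund Project (17XLB003).
Author biographies: FENG Ji, male, born in 1986, lecturer, Ph.D. His main research interests are machine learning and data mining. He has published more than 10 academic papers. RAN Ruisheng, male, born in 1976, professor, Ph.D. His main research interests are pattern recognition and machine learning. He has published more than 20 academic papers. WEI Yan, male, born in 1970, professor, Ph.D., member of the Artificial Intelligence Expert Committee of the China Big Data Application Alliance, member of the Education Committee of the China Computer Federation, and council member of the National Research Association for Computer Education in Colleges and Universities. His main research interests are machine learning and intelligent computing, data mining, and the theory and algorithmic applications of support vector machines. He has led or participated in 9 Chongqing research projects and has published more than 40 academic papers.
Corresponding author: FENG Ji. E-mail: jifeng@cqnu.edu.cn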