[1] ZHANG Bencai, WANG Zhihai, SUN Yange. An ensemble classification algorithm based on diversity and accuracy weighting for data streams[J]. CAAI Transactions on Intelligent Systems, 2019, 14(1): 179-185. [doi:10.11992/tis.201806021]

An ensemble classification algorithm based on diversity and accuracy weighting for data streams

CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 14 Issue: No. 1, 2019 Pages: 179-185 Section: Academic Papers, Fundamentals of Artificial Intelligence Publication date: 2019-01-05

Author(s):: ZHANG Bencai¹, WANG Zhihai¹, SUN Yan’ge¹,²
1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China;
2. School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China

Keywords:: data stream; concept drift; diversity; accuracy; ensemble learning; data chunk; value measurement; MOA

CLC number:: TP391

DOI:: 10.11992/tis.201806021

Abstract:: To mitigate the effect of concept drift on data stream classification, we propose DAWE, a diversity and accuracy weighting ensemble classification algorithm. Unlike existing ensemble methods, DAWE considers diversity and accuracy together: it linearly weights each classifier's accuracy on the most recent data chunk and its diversity within the ensemble to measure the value of that classifier to the current ensemble, and it uses this value measure in the strategy for replacing base classifiers. We compared DAWE with the latest algorithms in Massive Online Analysis (MOA) on both real-world and synthetic datasets. The experiments show that the method is effective: its average accuracy over all datasets is higher than that of the other algorithms, and it handles concept drift in data stream mining effectively.
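
The value measure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting coefficient, the diversity measure, or the exact replacement rule, so everything specific here is an assumption made for illustration: the weight `alpha`, the use of mean pairwise disagreement as the diversity measure, and the rule of replacing the lowest-valued member. The function names are hypothetical and are not taken from the paper or from MOA.

```python
from typing import List, Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of instances in the latest chunk that were predicted correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of instances on which two classifiers predict different labels."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def member_values(all_preds: List[Sequence[int]],
                  labels: Sequence[int],
                  alpha: float = 0.5) -> List[float]:
    """Score each member as alpha * accuracy + (1 - alpha) * diversity.

    Diversity is ASSUMED to be the mean pairwise disagreement with the
    other ensemble members; alpha is an ASSUMED weighting coefficient.
    """
    values = []
    for i, preds in enumerate(all_preds):
        others = [p for j, p in enumerate(all_preds) if j != i]
        if others:
            diversity = sum(disagreement(preds, o) for o in others) / len(others)
        else:
            diversity = 0.0  # a single-member ensemble has no diversity to measure
        values.append(alpha * accuracy(preds, labels) + (1 - alpha) * diversity)
    return values


def index_to_replace(all_preds: List[Sequence[int]],
                     labels: Sequence[int],
                     alpha: float = 0.5) -> int:
    """Return the index of the lowest-valued member (ASSUMED replacement rule)."""
    values = member_values(all_preds, labels, alpha)
    return min(range(len(values)), key=values.__getitem__)


# Predictions of three hypothetical base classifiers on a four-instance chunk.
chunk_labels = [1, 0, 1, 1]
chunk_preds = [
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
]
weakest = index_to_replace(chunk_preds, chunk_labels, alpha=0.5)
```

With `alpha=1.0` the score reduces to accuracy alone, and with `alpha=0.0` it reduces to diversity alone, which shows how a single linear weight trades the two criteria off against each other.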

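Memo

Received: 2018-06-07.
Funding: National Natural Science Foundation of China (61672086, 61702030, 61771058); Beijing Natural Science Foundation (4182052).
About the authors: ZHANG Bencai, male, born in 1994, master's student; his main research interest is data stream mining. WANG Zhihai, male, born in 1963, professor and doctoral supervisor, senior member of the China Computer Federation; his main research interests are machine learning and data mining. SUN Yan’ge, female, born in 1982, doctoral student; her main research interests are machine learning and data mining.
Corresponding author: WANG Zhihai. E-mail: zhhwang@bjtu.edu.cn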
