ZHANG Bencai, WANG Zhihai, SUN Yange. An ensemble classification algorithm based on diversity and accuracy weighting for data streams[J]. CAAI Transactions on Intelligent Systems, 2019, 14(01): 179-185. [doi:10.11992/tis.201806021]

An ensemble classification algorithm based on diversity and accuracy weighting for data streams

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
14
Issue:
2019, No. 01
Pages:
179-185
Published:
2019-01-05

Article Info

Title:
An ensemble classification algorithm based on diversity and accuracy weighting for data streams
Author(s):
ZHANG Bencai¹, WANG Zhihai¹, SUN Yan’ge¹˒²
1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China;
2. School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
Keywords:
data stream; concept drift; diversity; accuracy; ensemble learning; data chunk; value measurement; MOA
CLC number:
TP391
DOI:
10.11992/tis.201806021
Abstract:
To overcome the effect of concept drift on data stream classification, we propose an ensemble classification algorithm based on diversity and accuracy weighting, named DAWE (diversity and accuracy weighting ensemble classification algorithm). Unlike other existing ensemble methods, DAWE considers both diversity and accuracy: it linearly weights a classifier's accuracy on the latest data chunk and its diversity within the ensemble to measure that classifier's value to the current ensemble, and it uses this value measure in the base-classifier replacement strategy. DAWE was compared experimentally with the latest algorithms in Massive Online Analysis (MOA) on both real-world and synthetic datasets. The experiments show that the proposed method is effective: its average accuracy across all datasets is higher than that of the other algorithms, and it can effectively handle concept drift in data stream mining.
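The abstract describes the value measure only in words: a member's accuracy on the latest data chunk and its diversity within the ensemble are linearly weighted, and the resulting value drives base-classifier replacement. The Python sketch below illustrates that idea under stated assumptions. The weight `alpha` (with the two weights summing to 1), the pairwise-disagreement diversity measure, letting the new candidate compete with existing members, and all helper names are hypothetical choices for illustration, not the paper's definitions. Members are represented only by their predictions on the labeled latest chunk so the example is self-contained.

```python
from typing import List, Sequence

# Hypothetical sketch: each ensemble member is represented only by its
# predictions on the latest labeled data chunk.


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of the latest chunk that a member classifies correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of examples on which two members predict different labels."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def diversity(i: int, pool: List[Sequence[int]]) -> float:
    """Assumed diversity measure: mean pairwise disagreement of member i
    with every other member (not necessarily the paper's measure)."""
    others = [p for j, p in enumerate(pool) if j != i]
    if not others:
        return 0.0
    return sum(disagreement(pool[i], p) for p in others) / len(others)


def member_values(pool: List[Sequence[int]], labels: Sequence[int],
                  alpha: float = 0.5) -> List[float]:
    """Value = alpha * accuracy + (1 - alpha) * diversity (hypothetical weights)."""
    return [alpha * accuracy(p, labels) + (1 - alpha) * diversity(i, pool)
            for i, p in enumerate(pool)]


def replace_lowest_value(ensemble: List[Sequence[int]], candidate: Sequence[int],
                         labels: Sequence[int], max_size: int,
                         alpha: float = 0.5) -> List[int]:
    """Add the candidate trained on the newest chunk; if the pool exceeds
    max_size, drop the lowest-valued member. Returns the kept pool indices,
    where index len(ensemble) refers to the candidate."""
    pool = list(ensemble) + [candidate]
    if len(pool) <= max_size:
        return list(range(len(pool)))
    values = member_values(pool, labels, alpha)
    worst = min(range(len(pool)), key=values.__getitem__)
    return [i for i in range(len(pool)) if i != worst]


if __name__ == "__main__":
    labels = [0, 1, 1, 0]
    ensemble = [[0, 1, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0]]
    candidate = [0, 1, 0, 0]
    print(replace_lowest_value(ensemble, candidate, labels, max_size=3))
    # [0, 1, 3]: the least accurate member (index 2) is replaced
    print(replace_lowest_value(ensemble, candidate, labels, max_size=3, alpha=0.2))
    # [0, 1, 2]: weighting diversity more keeps member 2 and drops the candidate
```

With `alpha=0.5` the least accurate member is replaced; with `alpha=0.2` that member survives because it disagrees most with the two identical members, and the candidate is dropped instead. This shows how the linear weight trades accuracy against diversity. Scoring a candidate on the same chunk it was trained on would overestimate its accuracy, so a real implementation would need a separate estimate, which this sketch leaves out.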

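Memo:
Received: 2018-06-07.
Funding: National Natural Science Foundation of China (61672086, 61702030, 61771058); Beijing Natural Science Foundation (4182052).
About the authors: ZHANG Bencai, male, born 1994, master's student, research interest: data stream mining; WANG Zhihai, male, born 1963, professor, doctoral supervisor, senior member of the China Computer Federation (CCF), research interests: machine learning and data mining; SUN Yan’ge, female, born 1982, doctoral student, research interests: machine learning and data mining.
Corresponding author: WANG Zhihai. E-mail: zhhwang@bjtu.edu.cn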