[1] ZHANG Bencai, WANG Zhihai, SUN Yange. An ensemble classification algorithm based on diversity and accuracy weighting for data streams[J]. CAAI Transactions on Intelligent Systems, 2019, 14(1): 179-185. [doi:10.11992/tis.201806021]
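CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
14
Issue:
2019, No. 1
Pages:
179-185
Section:
Academic Papers: Fundamentals of Artificial Intelligence
Publication date:
2019-01-05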
- Title:
-
An ensemble classification algorithm based on diversity and accuracy weighting for data streams
- Author(s):
-
ZHANG Bencai1, WANG Zhihai1, SUN Yan’ge1,2
-
1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China;
2. School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
-
- Keywords:
-
data stream; concept drift; diversity; accuracy; ensemble learning; data chunk; value measurement; MOA
- CLC number:
-
TP391
- DOI:
-
10.11992/tis.201806021
- Abstract:
-
To mitigate the effect of concept drift on data stream classification, we propose a diversity and accuracy weighting ensemble classification algorithm (DAWE). Unlike existing ensemble methods, DAWE considers both diversity and accuracy: a classifier's accuracy on the most recent data chunk and its diversity within the ensemble are linearly weighted to measure the value of that classifier to the current ensemble, and this value measure drives the base classifier replacement strategy. DAWE was compared experimentally with recent algorithms available in Massive Online Analysis (MOA) on both real-world and synthetic datasets. The results show that the method is effective: its average accuracy over all datasets is higher than that of the other algorithms, and it handles concept drift in data stream mining effectively.
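The value measure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which diversity measure or weighting coefficient DAWE uses, so mean pairwise disagreement, the parameter `alpha`, and every function name below are assumptions chosen for illustration. Classifiers are represented only by their predictions on the most recent data chunk.

```python
from typing import List, Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of chunk instances that a classifier predicts correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of chunk instances on which two classifiers differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def diversity(index: int, all_preds: List[Sequence[int]]) -> float:
    """Mean disagreement between one member and all other members.

    Assumed diversity measure; the abstract does not name the one DAWE uses.
    """
    others = [p for j, p in enumerate(all_preds) if j != index]
    if not others:
        return 0.0
    return sum(disagreement(all_preds[index], p) for p in others) / len(others)


def member_values(all_preds: List[Sequence[int]], labels: Sequence[int],
                  alpha: float = 0.5) -> List[float]:
    """Linear weighting of accuracy and diversity (alpha is hypothetical)."""
    return [alpha * accuracy(p, labels) + (1 - alpha) * diversity(i, all_preds)
            for i, p in enumerate(all_preds)]


def weakest_member(all_preds: List[Sequence[int]], labels: Sequence[int],
                   alpha: float = 0.5) -> int:
    """Index of the lowest-value member, i.e. the replacement candidate."""
    vals = member_values(all_preds, labels, alpha)
    return min(range(len(vals)), key=vals.__getitem__)


# Toy chunk: true labels and the predictions of three ensemble members.
LABELS = [1, 0, 1, 1]
PREDS = [[1, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 0]]

if __name__ == "__main__":
    print(member_values(PREDS, LABELS))
    print(weakest_member(PREDS, LABELS))
```

With `alpha = 1.0` the measure reduces to accuracy on the latest chunk alone, and with `alpha = 0.0` to diversity alone, so the coefficient controls how strongly an inaccurate but dissimilar classifier is protected from replacement.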
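Memo
Received: 2018-06-07.
Funding: National Natural Science Foundation of China (61672086, 61702030, 61771058); Beijing Natural Science Foundation (4182052).
About the authors: ZHANG Bencai, male, born in 1994, master's student; main research interest: data stream mining. WANG Zhihai, male, born in 1963, professor and doctoral supervisor, senior member of the China Computer Federation; main research interests: machine learning and data mining. SUN Yange, female, born in 1982, doctoral student; main research interests: machine learning and data mining.
Corresponding author: WANG Zhihai. E-mail: zhhwang@bjtu.edu.cn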