[1] FU Chun-yan, GE Mao-song. A data stream classification method adaptive to concept drift[J]. CAAI Transactions on Intelligent Systems, 2007, 2(04): 86-91.

A data stream classification method adaptive to concept drift

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
2
Issue:
04 (2007)
Pages:
86-91
Column:
Publication date:
2007-08-25

Article Info

Title:
A data stream classification method adaptive to concept drift
Article ID:
1673-4785(2007)04-0086-06
Author(s):
FU Chun-yan, GE Mao-song
Department of Public Computer Teaching and Research, Jiamusi University, Jiamusi 154007, Heilongjiang, China
Keywords:
data stream classification; concept drift; online learning; decision tree
CLC number:
TP311.13
Document code:
A
Abstract:
Most current classification methods for data streams assume that the data distribution is stationary. Real-world data, however, undergo changes in their underlying concepts over time (known as concept drift), which can lower the predictive accuracy of a classification model. Based on the characteristics of data streams, this paper proposes an online classification algorithm that can identify and adapt to concept drift. Experiments show that the algorithm automatically adjusts the size of the training window and the number of new examples during model reconstruction according to the current rate of concept drift.
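The abstract does not give the algorithm itself, so the following is only an illustrative sketch of the general idea it describes: a windowed online classifier that shrinks its training window and rebuilds more often when recent errors suggest drift, and grows both again when the stream looks stable. It is not the authors' method. The names `train_stump` and `AdaptiveWindowClassifier`, the doubling and reset heuristics, the error-rate threshold, and the use of a one-feature decision stump in place of a full decision tree are all assumptions made for illustration. "Number of new examples during model reconstruction" is interpreted here, as an assumption, as the number of new examples collected between rebuilds (`batch_size`).

```python
from collections import Counter, deque


def train_stump(examples):
    """Fit a one-feature threshold rule (a depth-1 decision tree) on (x, label) pairs."""
    best = None
    for threshold in sorted({x for x, _ in examples}):
        left = Counter(y for x, y in examples if x <= threshold)
        right = Counter(y for x, y in examples if x > threshold)
        left_label = left.most_common(1)[0][0]
        # If nothing falls to the right of the threshold, reuse the left label.
        right_label = right.most_common(1)[0][0] if right else left_label
        errors = sum(
            1 for x, y in examples
            if (left_label if x <= threshold else right_label) != y
        )
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


class AdaptiveWindowClassifier:
    """Hypothetical drift-adaptive learner; all parameters are illustrative."""

    def __init__(self, min_window=20, max_window=200,
                 min_batch=5, max_batch=40, drift_threshold=0.3):
        self.min_window, self.max_window = min_window, max_window
        self.min_batch, self.max_batch = min_batch, max_batch
        self.drift_threshold = drift_threshold
        self.window_size = min_window   # current training-window size
        self.batch_size = min_batch     # new examples collected between rebuilds
        self.window = deque()           # most recent labelled examples
        self.recent_errors = deque(maxlen=min_window)
        self.model = None
        self.since_rebuild = 0

    def predict(self, x):
        return self.model(x) if self.model is not None else None

    def update(self, x, y):
        """Test-then-train on one labelled example; return True if mispredicted."""
        wrong = self.model is not None and self.model(x) != y
        if self.model is not None:
            self.recent_errors.append(wrong)
        self.window.append((x, y))
        self.since_rebuild += 1

        errors = self.recent_errors
        error_rate = sum(errors) / len(errors) if errors else 0.0
        drifting = error_rate > self.drift_threshold
        if drifting:
            # Suspected drift: keep only recent data and rebuild frequently.
            self.window_size = self.min_window
            self.batch_size = self.min_batch
        due = self.since_rebuild >= self.batch_size
        if due and not drifting:
            # Stable stream: train on more data, rebuild less often.
            self.window_size = min(self.max_window, 2 * self.window_size)
            self.batch_size = min(self.max_batch, 2 * self.batch_size)
        while len(self.window) > self.window_size:
            self.window.popleft()
        if self.model is None or due:
            self.model = train_stump(self.window)
            self.since_rebuild = 0
        return wrong
```

In use, each labelled example from the stream would be passed to `update(x, y)`, which first scores the current model on it and then adds it to the window. On a stream whose labelling rule flips partway through, `window_size` and `batch_size` should drop to their minimums shortly after the flip and climb back once predictions are accurate again. A real system would likely replace the fixed error threshold with a statistical drift test and the stump with an incremental decision tree.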

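Memo:
Received: 2007-03-20.
About the authors:
FU Chun-yan, female, born in 1974, lecturer. Her main research interests are modern data management technology, data streams, and massive data processing. E-mail: jmsfu@126.com.
GE Mao-song, male, born in 1971, senior laboratory technician and master's student. His main research interests are data mining, data streams, and massive data processing.
Last Update: 2009-05-07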