[1] YI Lei, PAN Zhisong, QIU Junyang, et al. Large-scale network traffic classification based on online learning[J]. CAAI Transactions on Intelligent Systems, 2016, 11(3): 318-327. [doi:10.11992/tis.201603033]
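Editorial Office of CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 11
Issue: No. 3, 2016
Pages: 318-327
Section: Academic Papers - Knowledge Engineering
Publication date: 2016-06-25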
- Title:
-
Large-scale network traffic classification based on online learning
- Author(s):
-
YI Lei (易磊), PAN Zhisong (潘志松), QIU Junyang (邱俊洋), XUE Jiao (薛胶), REN Huifeng (任会峰)
-
Institute of Command Information System, PLA University of Science and Technology, Nanjing 210007, Jiangsu, China
-
- Keywords:
-
online learning; large-scale; network traffic classification; timing correlation; data stream; stochastic optimization
- CLC number:
-
TP181
- DOI:
-
10.11992/tis.201603033
- Abstract:
-
When applied to large-scale network traffic classification, traditional batch machine learning methods suffer from slow classifier training and high computational complexity. Online learning, which has developed rapidly in recent years, is an effective way to handle large-scale problems. To address large-scale network traffic classification on high-speed backbone networks, we propose a classification framework based on online learning and apply eight online learning algorithms. Experiments on real network traffic data sets show that, at comparable classification accuracy, the online learning algorithms require less space and significantly less training time than the support vector machine (SVM). In addition, to examine how the order of network traffic samples affects classification, we compare processing the samples in temporal order (online learning) with processing them in random order (stochastic optimization). The comparison verifies that network traffic samples are temporally correlated.
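The sequential-versus-random comparison described in the abstract can be illustrated with a small sketch. The abstract does not name the eight algorithms, so the classic perceptron is used here purely as a stand-in for an online learner. The bursty synthetic stream and the names `online_perceptron`, `make_bursty_stream`, and `compare_orders` are hypothetical and are not the authors' data or code.

```python
import random


def online_perceptron(stream, dim, lr=1.0):
    """Process (x, y) pairs one at a time, with y in {-1, +1}.

    Each sample is predicted first and used for an update only on a mistake,
    so memory use is O(dim) regardless of how long the stream is.
    Returns (weights, bias, number_of_online_mistakes).
    """
    w = [0.0] * dim
    b = 0.0
    mistakes = 0
    for x, y in stream:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * score <= 0:  # wrong side of (or exactly on) the boundary
            mistakes += 1
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
            b += lr * y
    return w, b, mistakes


def make_bursty_stream(n_bursts=20, burst_len=50, seed=0):
    """Hypothetical stand-in for traffic: labels arrive in bursts,
    so neighbouring samples in time order tend to share a class."""
    rng = random.Random(seed)
    stream = []
    for i in range(n_bursts):
        y = 1 if i % 2 == 0 else -1
        for _ in range(burst_len):
            x = [rng.gauss(2.0 * y, 1.0), rng.gauss(2.0 * y, 1.0)]
            stream.append((x, y))
    return stream


def compare_orders(seed=0):
    """Run the same learner on the time-ordered and the shuffled stream.

    Returns (mistakes_in_time_order, mistakes_in_random_order).
    """
    ordered = make_bursty_stream(seed=seed)
    shuffled = ordered[:]
    random.Random(seed + 1).shuffle(shuffled)
    _, _, m_seq = online_perceptron(ordered, dim=2)
    _, _, m_rand = online_perceptron(shuffled, dim=2)
    return m_seq, m_rand


if __name__ == "__main__":
    m_seq, m_rand = compare_orders()
    print("mistakes (time order):", m_seq)
    print("mistakes (random order):", m_rand)
```

A gap between the two mistake counts on a given stream indicates that sample order carries information, which is the kind of evidence the abstract refers to. The sketch makes no claim about the size or direction of that gap on real traffic.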
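Memo
Received: 2016-03-18; Revised: (not given).
Funding: National Natural Science Foundation of China (61473149).
About the authors: YI Lei, male, born in 1991, is a master's student. His main research interests are machine learning and its application to large-scale network traffic classification. PAN Zhisong, male, born in 1973, is a professor and doctoral supervisor, and a member of the Pattern Recognition and Artificial Intelligence Technical Committee of the Jiangsu Computer Society. His main research interests are pattern recognition, machine learning, and network security. He has led several national research projects and published more than 30 academic papers. QIU Junyang, male, born in 1989, is a doctoral student. His main research interests are machine learning and its application to anomaly detection in large-scale network data streams. He has published 2 academic papers.
Corresponding author: YI Lei. E-mail: yileinjut@163.com.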
Last Update:
1900-01-01