[1]杨晓峰,李伟,孙明明,等.基于文本聚类的网络攻击检测方法[J].智能系统学报,2014,9(1):40-46.[doi:10.3969/j.issn.1673-4785.201108007]
YANG Xiaofeng,LI Wei,SUN Mingming,et al.Web attack detection method on the basis of text clustering[J].CAAI Transactions on Intelligent Systems,2014,9(1):40-46.[doi:10.3969/j.issn.1673-4785.201108007]
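CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
9
Issue:
No. 1, 2014
Pages:
40-46
Section:
Academic Papers: Natural Language Processing and Understanding
Publication date:
2014-02-25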
- Title:
-
Web attack detection method on the basis of text clustering
- 作者:
-
杨晓峰1, 李伟1,2, 孙明明1, 胡雪蕾1
-
1. 南京理工大学 计算机科学与技术学院, 江苏 南京 210094;
2. 哈佛医学院 Dana-Farber癌症研究所, 波士顿 马萨诸塞州 02115, 美国
- Author(s):
-
YANG Xiaofeng1, LI Wei1,2, SUN Mingming1, HU Xuelei1
-
1. School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China;
2. Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts 02115, USA
-
- 关键词:
-
网络攻击; 网络攻击检测; 文本聚类; 非监督检测算法
- Keywords:
-
Web attack; Web attack detection; text clustering; unsupervised detection algorithm
- CLC number:
-
TP393
- DOI:
-
10.3969/j.issn.1673-4785.201108007
- 摘要:
-
针对Web服务应用的攻击是近年来网络上广泛传播的攻击方式, 现有的攻击检测算法多采用监督学习的方法确定正常行为和攻击行为的分类边界;但由于监督检测模型在检测之前需要复杂的学习过程, 往往会降低系统的实用效果。因此, 根据现实中正常访问样本和攻击样本在数量和分布上的差异, 提出了一种基于文本聚类的非监督检测算法。算法首先采用迭代聚类过程聚类样本, 直至聚为一类;同时根据异常与正常样本的分布规律, 在聚类过程中选择最优的最大类别作为正常样本类, 将其余的作为异常样本类。最优方案的选择采用了使得分类误差最小的原则确定。实验表明, 与多种经典检测方法相比, 该方法省去了复杂的学习过程, 增强了方法的适应性, 具有较好的检测率和误报率。
- Abstract:
-
Attacks on Web service applications have spread widely across the Internet in recent years. Most existing attack detection algorithms use supervised learning to determine the classification boundary between normal behavior and attack behavior; however, a supervised detection model requires a complex learning process before detection, which often reduces the practical effectiveness of the system. Therefore, based on the differences in quantity and distribution between normal access samples and attack samples in real-world traffic, an unsupervised detection algorithm based on text clustering is proposed. The algorithm first clusters the samples iteratively until they merge into a single cluster; meanwhile, according to the distribution patterns of abnormal and normal samples, it selects the optimal largest cluster found during clustering as the normal class and treats the remaining samples as the abnormal class. The optimal scheme is chosen by the principle of minimizing the classification error. Experiments show that, compared with several classical detection methods, the proposed method eliminates the complex learning process, improves adaptability, and achieves a good detection rate and false positive rate.
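The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the text similarity measure, the linkage rule, or the exact error criterion. The sketch therefore assumes character trigram Jaccard distance and average-linkage agglomerative clustering. In place of the paper's minimum-classification-error rule, it substitutes a simple heuristic: cut the merge sequence at the largest jump in merge distance. The names `ngrams`, `jaccard_distance`, `average_linkage`, and `detect` are hypothetical.

```python
from itertools import combinations


def ngrams(text, n=3):
    """Return the set of character n-grams of a request string."""
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two n-gram sets (assumed similarity measure)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(c1, c2, dist):
    """Mean pairwise distance between two clusters of sample indices."""
    return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))


def detect(requests, n=3):
    """Return one boolean per request; True means flagged as abnormal."""
    m = len(requests)
    grams = [ngrams(r, n) for r in requests]
    dist = [[jaccard_distance(grams[i], grams[j]) for j in range(m)]
            for i in range(m)]

    # Iterative clustering: merge the two closest clusters until one remains.
    clusters = [[i] for i in range(m)]
    history = []  # (merge distance, largest cluster after that merge)
    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], dist),
        )
        d = average_linkage(clusters[a], clusters[b], dist)
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        history.append((d, max(clusters, key=len)))

    if len(history) < 2:
        return [False] * m  # too few samples to separate anything

    # Stand-in selection rule (an assumption, not the paper's criterion):
    # keep the largest cluster as it stood just before the biggest jump
    # in merge distance, and treat everything else as abnormal.
    gaps = [history[k + 1][0] - history[k][0] for k in range(len(history) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    normal = set(history[cut][1])
    return [i not in normal for i in range(m)]


if __name__ == "__main__":
    sample = [f"/index.php?id={k}" for k in range(1, 7)] + [
        "/index.php?id=1 UNION SELECT password FROM users--",
        "/search?q=<script>alert(1)</script>",
    ]
    for request, abnormal in zip(sample, detect(sample)):
        print("ABNORMAL" if abnormal else "normal  ", request)
```

The sketch relies on the same premise as the abstract: normal requests are numerous and similar to one another, so they merge early into one large cluster, while attack requests are few and merge late. The naive merge loop recomputes all linkages on every pass, so it is suitable only for small illustrative inputs.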
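Memo
Received: 2011-08-29.
Funding: National Natural Science Foundation of China (60705020); Natural Science Foundation of Jiangsu Province (BK207594).
About the authors: YANG Xiaofeng, male, born in 1982, Ph.D. candidate; main research interests: network security and machine learning. SUN Mingming, male, born in 1981, lecturer; main research interests: pattern recognition and machine learning.
Corresponding author: LI Wei, male, born in 1978, Ph.D.; main research interests: complex networks, pattern recognition, and machine learning. E-mail: liweinust@hotmail.com.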