QIN Ya, SHEN Guowei, YU Hongxing. Large-scale network security entity recognition method based on Hadoop[J]. CAAI Transactions on Intelligent Systems, 2019, 14(5): 1017-1025. [doi:10.11992/tis.201809024]
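CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN 1673-4785/CN 23-1538/TP]
Volume: 14
Issue: No. 5, 2019
Pages: 1017-1025
Section: Academic Papers / Intelligent Systems
Publication date: 2019-09-05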
- Title: Large-scale network security entity recognition method based on Hadoop
- Author(s): QIN Ya1,2 (秦娅), SHEN Guowei1,2 (申国伟), YU Hongxing1,2 (余红星)
1. Department of Computer Science and Technology, Guizhou University, Guiyang 550025, China;
2. Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
- Keywords: big data; heterogeneous data; network security; knowledge graph; security entity; entity recognition; network data; Hadoop; CRF algorithm
- CLC number: TP391.0
- DOI: 10.11992/tis.201809024
- Abstract:
In the era of big data, a fundamental problem in constructing network security knowledge graphs is how to accurately identify the network security entities present in multi-source heterogeneous data. This study focuses on text data related to network security and investigates a security entity recognition algorithm that scales to massive volumes of network text data, laying a foundation for building a network security knowledge graph. To extract security entities from such data efficiently and accurately, we propose an improved conditional random fields (CRF) algorithm based on the Hadoop distributed computing framework; the algorithm partitions the data set effectively and thereby achieves efficient and accurate recognition of security entities. Experiments on a large-scale real-world network data set show that the proposed algorithm achieves high precision in network security entity recognition while also improving recognition efficiency.
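The abstract says only that an improved CRF runs on Hadoop and that the data set is partitioned; it does not describe the improvement or the partitioning scheme. The sketch below is therefore not the authors' algorithm. It illustrates the general pattern under stated assumptions: sentences are split into contiguous shards (a stand-in for Hadoop input splits), each shard is tagged independently with linear-chain CRF Viterbi decoding (a thread pool stands in for map tasks), and the per-shard results are concatenated in order (a stand-in for the reduce step). The tag set `B-ENT`/`I-ENT`/`O`, the feature names, and all weights are hypothetical hand-set values, not a trained model.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tag set: BIO tags for one generic security-entity type.
LABELS = ["O", "B-ENT", "I-ENT"]

# Hypothetical hand-set weights standing in for a trained CRF model.
EMISSION = {
    ("bias", "O"): 1.0,              # mild preference for "outside" on unknown words
    ("word=apache", "B-ENT"): 3.0,
    ("word=struts", "I-ENT"): 3.0,
    ("word=openssl", "B-ENT"): 3.0,
}
TRANSITION = {
    ("O", "I-ENT"): -10.0,           # I-ENT should not follow O
    ("B-ENT", "I-ENT"): 0.5,
}
START = {"I-ENT": -10.0}             # a sentence should not start inside an entity


def features(token):
    return ["bias", "word=" + token.lower()]


def emission_score(token, label):
    return sum(EMISSION.get((f, label), 0.0) for f in features(token))


def viterbi(tokens):
    """Return the highest-scoring tag sequence under the linear-chain model."""
    if not tokens:
        return []
    best = [{y: START.get(y, 0.0) + emission_score(tokens[0], y) for y in LABELS}]
    back = [{}]
    for t in range(1, len(tokens)):
        best.append({})
        back.append({})
        for y in LABELS:
            # Best previous label p for reaching label y at position t.
            prev, score = max(
                ((p, best[t - 1][p] + TRANSITION.get((p, y), 0.0)) for p in LABELS),
                key=lambda pair: pair[1],
            )
            best[t][y] = score + emission_score(tokens[t], y)
            back[t][y] = prev
    # Follow back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def split_into_shards(sentences, num_shards):
    """Contiguous partition, a local stand-in for HDFS input splits."""
    size = max(1, math.ceil(len(sentences) / num_shards))
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


def tag_shard(shard):
    """'Map' step: decode every sentence in one shard independently."""
    return [viterbi(tokens) for tokens in shard]


def recognize(sentences, num_shards=2):
    """Tag all sentences shard by shard, then concatenate results in order."""
    shards = split_into_shards(sentences, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        per_shard = list(pool.map(tag_shard, shards))  # map preserves shard order
    return [tags for shard_tags in per_shard for tags in shard_tags]


def entities(tokens, tags):
    """Collect entity strings from a BIO-tagged sentence."""
    found, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-ENT":
            if current:
                found.append(" ".join(current))
            current = [token]
        elif tag == "I-ENT" and current:
            current.append(token)
        else:
            if current:
                found.append(" ".join(current))
            current = []
    if current:
        found.append(" ".join(current))
    return found
```

With these toy weights, `recognize([["Attackers", "exploited", "Apache", "Struts"]])` yields `[["O", "O", "B-ENT", "I-ENT"]]`, and `entities` on that sentence returns `["Apache Struts"]`. Because CRF decoding is per sentence, shards never need to share state, which is what makes this step easy to distribute; training a CRF across a cluster is a separate problem not shown here.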
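Memo
Received: 2018-09-13.
Foundation items: National Natural Science Foundation of China (61802081); Open Project of Guizhou Provincial Key Laboratory of Public Big Data (2017BDKFJJ024); Natural Science Foundation of Guizhou Province (20161052).
About the authors: QIN Ya, female, born in 1992, master's student; her main research interest is network security knowledge graphs. SHEN Guowei, male, born in 1986, associate professor; his main research interests are big data, network and information security, and data mining. YU Hongxing, male, born in 1993, master's student; his main research interest is big data technology.
Corresponding author: SHEN Guowei. E-mail: gwshen@gzu.edu.cn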