
[1] QIN Ya, SHEN Guowei, YU Hongxing. Large-scale network security entity recognition method based on Hadoop[J]. CAAI Transactions on Intelligent Systems, 2019, 14(5): 1017-1025. [doi:10.11992/tis.201809024]


Large-scale network security entity recognition method based on Hadoop


CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]; Volume: 14; Issue: No. 5, 2019; Pages: 1017-1025; Section: Academic Papers (Intelligent Systems); Publication date: 2019-09-05

Title:: Large-scale network security entity recognition method based on Hadoop

Author(s):: QIN Ya^1,2, SHEN Guowei^1,2, YU Hongxing^1,2; 1. Department of Computer Science and Technology, Guizhou University, Guiyang 550025, China;
2. Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China

Keywords:: big data; heterogeneous data; network security; knowledge graph; security entity; entity recognition; network data; Hadoop; CRF algorithm

CLC number:: TP391.0

DOI:: 10.11992/tis.201809024

Abstract:: In the era of big data, accurately identifying network security entities in multi-source heterogeneous data is a fundamental problem in constructing network security knowledge graphs. This study focuses on text data related to network security and investigates a security entity recognition algorithm that supports massive network data, laying a foundation for building a network security knowledge graph. To extract security entities from massive network text data efficiently and accurately, we propose an improved conditional random fields (CRF) algorithm based on the Hadoop distributed computing framework that partitions the data set effectively, enabling efficient and accurate recognition of security entities. Experiments on a large-scale real-world network data set show that the proposed algorithm achieves a high precision rate in network security entity recognition while also improving recognition efficiency.
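
The abstract gives no implementation details, so the following is only an illustrative sketch of the general idea of partitioning a corpus and tagging each partition independently in a map/reduce style. It is not the authors' algorithm: the regex patterns stand in for a trained CRF tagger, and the names `split_corpus`, `map_shard`, `reduce_pairs`, `recognize`, and the entity labels `VULN_ID` and `IP` are hypothetical. On a Hadoop cluster, each shard would be processed by a separate mapper rather than in a single process.

```python
import re
from collections import Counter
from itertools import islice

# Hypothetical stand-in for a trained CRF tagger: two regex patterns only.
PATTERNS = {
    "VULN_ID": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def split_corpus(lines, shard_size):
    """Yield consecutive shards of at most shard_size lines."""
    it = iter(lines)
    while True:
        shard = list(islice(it, shard_size))
        if not shard:
            return
        yield shard


def map_shard(shard):
    """Map step: emit (entity_type, surface_text) pairs for one shard."""
    pairs = []
    for line in shard:
        for label, pattern in PATTERNS.items():
            pairs.extend((label, m.group(0)) for m in pattern.finditer(line))
    return pairs


def reduce_pairs(per_shard_pairs):
    """Reduce step: count how often each (type, text) entity was seen."""
    counts = Counter()
    for pairs in per_shard_pairs:
        counts.update(pairs)
    return counts


def recognize(lines, shard_size=2):
    """Split the corpus, tag every shard, and merge the results."""
    return reduce_pairs(map_shard(s) for s in split_corpus(lines, shard_size))


corpus = [
    "Host 10.0.0.5 is affected by CVE-2017-0144.",
    "No entities here.",
    "CVE-2017-0144 was also seen on 10.0.0.7.",
]
counts = recognize(corpus, shard_size=2)
```

Because each shard is tagged without reference to the others, the map step can run in parallel; the sketch splits on line boundaries so that no sentence is cut across shards, which is an assumption made here and not a detail stated in the abstract.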



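Memo

Received: 2018-09-13.
Funding: National Natural Science Foundation of China (61802081); Open Project of the Guizhou Provincial Key Laboratory of Public Big Data (2017BDKFJJ024); Natural Science Foundation of Guizhou Province (20161052).
About the authors: QIN Ya, female, born in 1992, master's student; main research interest: network security knowledge graphs. SHEN Guowei, male, born in 1986, associate professor; main research interests: big data, network and information security, and data mining. YU Hongxing, male, born in 1993, master's student; main research interest: big data technology.
Corresponding author: SHEN Guowei. E-mail: gwshen@gzu.edu.cn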

