MAO Mingyi, WU Chen, ZHONG Yixin, et al. BERT named entity recognition model with self-attention mechanism[J]. CAAI Transactions on Intelligent Systems, 2020, 15(4): 772-779. [doi:10.11992/tis.202003003]

BERT named entity recognition model with self-attention mechanism

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 15
Issue:
No. 4, 2020
Pages:
772-779
Column:
Wu Wenjun Artificial Intelligence Science and Technology Award Forum
Publication date:
2020-10-30

Article Info

Title:
BERT named entity recognition model with self-attention mechanism
Author(s):
MAO Mingyi1 WU Chen1 ZHONG Yixin2 CHEN Zhicheng2
1. School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China;
2. School of Computer, Beijing University of Posts and Telecommunications, Beijing 100876, China
Keywords:
named entity recognition; bidirectional encoder representation from transformers; self-attention mechanism; deep learning; conditional random field; natural language processing; bi-directional long short-term memory; sequence tagging
CLC number:
TP391
DOI:
10.11992/tis.202003003
Abstract:
Named entity recognition is a part of lexical analysis in natural language processing and is the basis for a computer to correctly understand natural language. To strengthen the model's recognition of named entities, this study uses the pre-trained model BERT (bidirectional encoder representation from transformers) as the embedding layer and, because fine-tuning BERT demands considerable computing resources, applies it with fixed (frozen) parameters, building a BERT-BiLSTM-CRF model. Two improvements were then tested on this model. The first adds a self-attention layer; experimental results show that this addition does not significantly improve recognition. The second reduces the number of BERT embedding layers; results show that moderately reducing the number of embedded BERT layers improves named entity recognition accuracy while also shortening the overall training time. With 9-layer embedding, the F1 value rises to 94.79% on the MSRA Chinese dataset and reaches 68.82% on the Weibo Chinese dataset.
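
For readers who want a concrete picture of the architecture summarized above, the following is a minimal PyTorch sketch of a BERT-BiLSTM-CRF tagger with a frozen ("fixed-parameter") BERT embedding layer, an intermediate-layer embedding (e.g., layer 9), and an optional self-attention layer. It is an illustrative reconstruction, not the authors' code: the bert-base-chinese checkpoint, the third-party pytorch-crf package, and all hyperparameters are assumptions.

# Illustrative sketch only; model names and hyperparameters are assumptions, not taken from the paper.
import torch
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF  # third-party package: pip install pytorch-crf


class BertBiLstmCrf(nn.Module):
    def __init__(self, num_tags: int, bert_layer: int = 9,
                 lstm_hidden: int = 256, use_self_attention: bool = False):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        # "Fixed-parameter embedding": BERT is a frozen feature extractor,
        # so its weights are never fine-tuned.
        for p in self.bert.parameters():
            p.requires_grad = False
        self.bert_layer = bert_layer  # index of the encoder layer used as embedding
        hidden_size = self.bert.config.hidden_size  # 768 for bert-base
        self.use_self_attention = use_self_attention
        if use_self_attention:
            # Optional extra self-attention layer (the paper reports little gain from it).
            self.self_attn = nn.MultiheadAttention(hidden_size, num_heads=8, batch_first=True)
        self.bilstm = nn.LSTM(hidden_size, lstm_hidden, batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(2 * lstm_hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def _features(self, input_ids, attention_mask):
        with torch.no_grad():  # frozen BERT embedding layer
            out = self.bert(input_ids=input_ids, attention_mask=attention_mask,
                            output_hidden_states=True)
        # hidden_states[0] is the token-embedding output; hidden_states[k] is encoder
        # layer k, so bert_layer=9 corresponds to the "9-layer embedding" setting.
        x = out.hidden_states[self.bert_layer]
        if self.use_self_attention:
            pad_mask = attention_mask == 0  # True where tokens are padding
            x, _ = self.self_attn(x, x, x, key_padding_mask=pad_mask)
        x, _ = self.bilstm(x)
        return self.emissions(x)

    def loss(self, input_ids, attention_mask, tags):
        feats = self._features(input_ids, attention_mask)
        return -self.crf(feats, tags, mask=attention_mask.bool())  # negative log-likelihood

    def decode(self, input_ids, attention_mask):
        feats = self._features(input_ids, attention_mask)
        return self.crf.decode(feats, mask=attention_mask.bool())  # best tag sequence per sentence

In such a sketch, training would call model.loss(...) on tokenized batches and inference would call model.decode(...); the knob corresponding to the paper's second experiment is bert_layer, which trades recognition accuracy against training time.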

References:

[1] LIU Liu, WANG Dongbo. A review on named entity recognition[J]. Journal of the China society for scientific and technical information, 2018, 37(3): 329-340.
[2] BIKEL D M. An algorithm that learns what’s in a name[J]. Machine learning, 1999, 34(1/2/3): 211-231.
[3] MAYFIELD J, MCNAMEE P, PIATKO C D, et al. Named entity recognition using hundreds of thousands of features[C]//North American Chapter of the Association for Computational Linguistics. Edmonton, Canada, 2003: 184-187.
[4] MCCALLUM A, LI W. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons[C]//North American Chapter of the Association for Computational Linguistics. Edmonton, Canada, 2003: 188-191.
[5] COLLOBERT R, WESTON J, BOTTOU L, et al. Natural language processing (almost) from scratch[J]. Journal of machine learning research, 2011, 12(1): 2493-2537.
[6] HUANG Z, XU W, YU K. Bidirectional LSTM-CRF models for sequence tagging[EB/OL].[2015-08-09]. https://arxiv.org/abs/1508.01991.
[7] MA X, HOVY E. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, Germany, 2016: 1064-1074.
[8] LUO L, YANG Z, YANG P, et al. An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition[J]. Bioinformatics, 2018, 34(8): 1381-1388.
[9] YANG Y, CHEN W, LI Z, et al. Distantly supervised NER with partial annotation learning and reinforcement learning[C]//International Conference on Computational Linguistics. Santa Fe, USA, 2018: 2159-2169.
[10] PENG Jiayi, FANG Yong, HUANG Cheng, et al. Cyber security named entity recognition based on deep active learning[J]. Journal of Sichuan university (natural science edition), 2019, 56(3): 457-462.
[11] ZHU Yanhui, LI Fei, JI Xiangbing, et al. Domain-named entity recognition based on feedback k-nearest semantic transfer learning[J]. CAAI transactions on intelligent systems, 2019, 14(4): 820-830.
[12] WANG Hongbin, SHEN Qiang, XIAN Yantuan. Research on Chinese named entity recognition fusing transfer learning[J]. Journal of Chinese computer systems, 2017, 38(2): 346-351.
[13] FENG Luanluan, LI Junhui, LI Peifeng, et al. Technology and terminology detection oriented national defense science[J]. Computer science, 2019, 46(12): 231-236.
[14] YANG Wei, SUN Deyan, ZHANG Xiaohui, et al. Named entity recognition for intelligent answer system in power service[J]. Computer engineering and design, 2019, 40(12): 3625-3630.
[15] LI Dongmei, TAN Wen. Research on named entity recognition method in plant attribute text[J]. Journal of frontiers of computer science and technology, 2019, 13(12): 2085-2093.
[16] ZHANG Y, YANG J. Chinese NER using lattice LSTM[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Melbourne, Australia, 2018: 1554-1564.
[17] LI Mingyang, KONG Fang. Combined self-attention mechanism for named entity recognition in social media[J]. Journal of Tsinghua university (science and technology edition), 2019, 59(6): 461-467.
[18] CAO P, CHEN Y, LIU K, et al. Adversarial transfer learning for Chinese named entity recognition with self-attention mechanism[C]//Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium, 2018: 182-192.
[19] PETERS M E, RUDER S, SMITH N A, et al. To tune or not to tune? Adapting pretrained representations to diverse tasks[C]//Proceedings of the 4th Workshop on Representation Learning for NLP. Florence, Italy, 2019: 7-14.
[20] DEVLIN J, CHANG M, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL]. [2018-10-11]. https://arxiv.org/abs/1810.04805.
[21] YANG Z, DAI Z, YANG Y, et al. XLNet: generalized autoregressive pretraining for language understanding[C]// Neural Information Processing Systems. Vancouver, Canada, 2019: 5753-5763.
[22] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural computation, 1997, 9(8): 1735-1780.
[23] LAFFERTY J, MCCALLUM A, PEREIRA F, et al. Conditional random fields: probabilistic models for segmenting and labeling sequence data[C]//International Conference on Machine Learning. San Francisco, USA, 2001: 282-289.
[24] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st Annual Conference on Neural Information Processing Systems. Long Beach, USA, 2017: 5998-6008.
[25] HE H, SUN X. F-Score driven max margin neural network for named entity recognition in Chinese social media[C]// Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Valencia, Spain, 2017: 713-718.

Memo:

Received: 2020-03-02.
Foundation item: Beijing Natural Science Foundation project (4202016).
Author biographies: MAO Mingyi, associate professor and Ph.D., is a senior member of the Chinese Association for Artificial Intelligence. His main research interests are the basic theory of artificial intelligence and universal logic. He has led or participated in eight projects funded by the National Natural Science Foundation of China, the Beijing Natural Science Foundation, and other vertical programs, has led more than ten industry-commissioned projects, holds more than ten granted patents and software copyrights, and has received honors such as "Excellent Advisor" in national competitions. He has published more than 50 academic papers and two monographs. WU Chen is a master's student. His main research interests are the foundations of artificial intelligence, intelligent robots, and natural language understanding. ZHONG Yixin, professor and doctoral supervisor, is an academician of the Academy of Engineering and Technology in the Developing World, a former president of the Chinese Association for Artificial Intelligence, the current chair of the Chinese chapter of the International Society for the Study of Information, and director of the academic committee of the Beijing University of Posts and Telecommunications–格分维 joint artificial intelligence laboratory. His main research interests are communication theory, information science, and artificial intelligence. He has led dozens of national and provincial/ministerial projects; has proposed and established the theory of comprehensive information, the comprehensive-information theory of natural language understanding, the mechanism-based unified theory of artificial intelligence, and the theory of machine "knowing and doing"; and has formulated the law of information conversion and intelligence creation. He has been honored as a returned overseas scholar with outstanding contributions and a national excellent teacher, and received the first Wu Wenjun Science and Technology Achievement Award and the first Outstanding Contribution Award for Information Theory of the Chinese Institute of Electronics. He has published more than 500 academic papers and 18 monographs.
Corresponding author: MAO Mingyi. E-mail: maomy@th.btbu.edu.cn
Last update: 2020-07-25