[1] MAO Mingyi, WU Chen, ZHONG Yixin, et al. BERT named entity recognition model with self-attention mechanism[J]. CAAI Transactions on Intelligent Systems, 2020, 15(4): 772-779. [doi:10.11992/tis.202003003]

BERT named entity recognition model with self-attention mechanism

References:
[1] LIU Liu, WANG Dongbo. A review on named entity recognition[J]. Journal of the China society for scientific and technical information, 2018, 37(3): 329-340.
[2] BIKEL D M. An algorithm that learns what’s in a name[J]. Machine learning, 1999, 34(1/2/3): 211-231.
[3] MAYFIELD J, MCNAMEE P, PIATKO C D. Named entity recognition using hundreds of thousands of features[C]//Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003. Edmonton, Canada, 2003: 184-187.
[4] MCCALLUM A, LI W. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons[C]//Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003. Edmonton, Canada, 2003: 188-191.
[5] COLLOBERT R, WESTON J, BOTTOU L, et al. Natural language processing (almost) from scratch[J]. Journal of machine learning research, 2011, 12(1): 2493-2537.
[6] HUANG Z, XU W, YU K. Bidirectional LSTM-CRF models for sequence tagging[EB/OL].[2015-08-09]. https://arxiv.org/abs/1508.01991.
[7] MA X, HOVY E. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, Germany, 2016: 1064-1074.
[8] LUO L, YANG Z, YANG P, et al. An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition[J]. Bioinformatics, 2018, 34(8): 1381-1388.
[9] YANG Y, CHEN W, LI Z, et al. Distantly supervised NER with partial annotation learning and reinforcement learning[C]//International Conference on Computational Linguistics. Santa Fe, USA, 2018: 2159-2169.
[10] PENG Jiayi, FANG Yong, HUANG Cheng, et al. Cyber security named entity recognition based on deep active learning[J]. Journal of Sichuan university (natural science edition), 2019, 56(3): 457-462.
[11] ZHU Yanhui, LI Fei, JI Xiangbing, et al. Domain-named entity recognition based on feedback K-nearest neighbor semantic transfer learning[J]. CAAI transactions on intelligent systems, 2019, 14(4): 820-830.
[12] WANG Hongbin, SHEN Qiang, XIAN Yantuan. Research on Chinese named entity recognition fusing transfer learning[J]. Journal of Chinese computer systems, 2017, 38(2): 346-351.
[13] FENG Luanluan, LI Junhui, LI Peifeng, et al. Technology and terminology detection oriented to the national defense science and technology field[J]. Computer science, 2019, 46(12): 231-236.
[14] YANG Wei, SUN Deyan, ZHANG Xiaohui, et al. Named entity recognition for intelligent answer system in power service[J]. Computer engineering and design, 2019, 40(12): 3625-3630.
[15] LI Dongmei, TAN Wen. Research on named entity recognition method in plant attribute text[J]. Journal of frontiers of computer science and technology, 2019, 13(12): 2085-2093.
[16] ZHANG Y, YANG J. Chinese NER using lattice LSTM[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Melbourne, Australia, 2018: 1554-1564.
[17] LI Mingyang, KONG Fang. Combined self-attention mechanism for named entity recognition in social media[J]. Journal of Tsinghua university (science and technology edition), 2019, 59(6): 461-467.
[18] CAO P, CHEN Y, LIU K, et al. Adversarial transfer learning for Chinese named entity recognition with self-attention mechanism[C]//Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium, 2018: 182-192.
[19] PETERS M E, RUDER S, SMITH N A. To tune or not to tune? Adapting pretrained representations to diverse tasks[C]//Proceedings of the 4th Workshop on Representation Learning for NLP. Florence, Italy, 2019: 7-14.
[20] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL]. [2018-10-11]. https://arxiv.org/abs/1810.04805.
[21] YANG Z, DAI Z, YANG Y, et al. XLNet: generalized autoregressive pretraining for language understanding[C]//Neural Information Processing Systems. Vancouver, Canada, 2019: 5753-5763.
[22] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural computation, 1997, 9(8): 1735-1780.
[23] LAFFERTY J, MCCALLUM A, PEREIRA F. Conditional random fields: probabilistic models for segmenting and labeling sequence data[C]//Proceedings of the 18th International Conference on Machine Learning. San Francisco, USA, 2001: 282-289.
[24] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st Annual Conference on Neural Information Processing Systems. Long Beach, USA, 2017: 5998-6008.
[25] HE H, SUN X. F-Score driven max margin neural network for named entity recognition in Chinese social media[C]//Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Valencia, Spain, 2017: 713-718.