[1] WANG Yuhui, DU Junping, SHAO Yingxia. An intellectual property entity recognition method based on Transformer and technological word information[J]. CAAI Transactions on Intelligent Systems, 2023, 18(1): 186-193. [doi:10.11992/tis.202203036]
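CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 18
Issue: No. 1, 2023
Pages: 186-193
Column: Wu Wenjun Artificial Intelligence Science and Technology Award Forum
Publication date: 2023-01-05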
- Title:
-
An intellectual property entity recognition method based on Transformer and technological word information
- Author(s):
-
WANG Yuhui (王宇晖)¹,², DU Junping (杜军平)¹,², SHAO Yingxia (邵蓥侠)¹,²
-
1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China;
2. Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China
-
- Keywords:
-
Chinese named entity recognition; intellectual property; Transformer encoder; information fusion; vector representation; science and technology big data; patent; deep learning
- CLC number:
-
TP391
- DOI:
-
10.11992/tis.202203036
- Abstract:
-
Patent texts contain a large amount of entity information. Named entity recognition can extract the intellectual property (IP) entities that carry key information, helping researchers grasp patent content more quickly. Existing named entity extraction methods struggle to make full use of the word-level semantic information introduced by variation in technical vocabulary. This paper proposes an IP entity extraction method based on the Transformer and technical word information. It uses the BERT language model to provide accurate character vector representations and, while generating the character vectors, adds technical word information extracted from them by an iterated dilated convolutional neural network, which improves the representation of IP entities. Finally, a Transformer encoder with relative position encoding learns deep semantic information from the character vector sequence and predicts the entity labels. Experiments on public datasets and an annotated patent dataset show that the method improves entity recognition accuracy.
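The abstract names an iterated dilated convolution network as the component that extracts technical word information, but gives no implementation details. The following is only a minimal sketch of the general technique, not the authors' model. It uses a single scalar per character instead of a BERT vector. The function names, the kernel width of 3, and the dilation schedule (1, 2, 4) are all illustrative assumptions. It shows why stacking dilated layers lets each position see a wide span of neighbouring characters with few layers.

```python
def dilated_conv1d(seq, kernel, dilation):
    """Same-length 1-D convolution whose taps sit `dilation` positions apart.

    Taps that fall outside the sequence are treated as zeros.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(seq)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation
            if 0 <= j < len(seq):
                acc += w * seq[j]
        out.append(max(acc, 0.0))  # ReLU non-linearity
    return out


def iterated_dilated_conv(seq, kernel, dilations, iterations=1):
    """Apply the same block of dilated layers `iterations` times.

    Reusing one kernel for every layer is a simplification (assumption);
    a trained network would learn separate weights per layer in the block.
    """
    for _ in range(iterations):
        for d in dilations:
            seq = dilated_conv1d(seq, kernel, d)
    return seq


def receptive_field(kernel_width, dilations, iterations=1):
    """Number of input positions that can influence one output position."""
    return 1 + (kernel_width - 1) * sum(dilations) * iterations


# Feed a single impulse and see how far its influence spreads.
impulse = [0.0] * 21
impulse[10] = 1.0
out = iterated_dilated_conv(impulse, [1.0, 1.0, 1.0], (1, 2, 4))
touched = [i for i, v in enumerate(out) if v != 0.0]
print(touched[0], touched[-1], len(touched))  # 3 17 15
print(receptive_field(3, (1, 2, 4)))          # 15
```

Three layers of width 3 cover 15 characters here, whereas three undilated layers would cover only 7. That wider context is the property that makes dilated convolutions a plausible fit for capturing multi-character technical terms.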
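Memo
Received: 2022-03-21.
Funding: National Key Research and Development Program of China (2018YFB1402600); National Natural Science Foundation of China (61772083).
Author biographies: WANG Yuhui, master's student, CCF member; main research interests are natural language processing and data mining. DU Junping, professor, CCF Fellow; main research interests are artificial intelligence, machine learning, and pattern recognition; recipient of the second prize of the Wu Wenjun Artificial Intelligence Natural Science Award. SHAO Yingxia, associate professor, CCF senior member; main research interests are large-scale graph analysis, parallel computing frameworks, and knowledge graph analysis.
Corresponding author: DU Junping. E-mail: junpingdu@126.com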