[1] WANG Yuhui, DU Junping, SHAO Yingxia. An intellectual property entity recognition method based on Transformer and technological word information[J]. CAAI Transactions on Intelligent Systems, 2023, 18(1): 186-193. [doi:10.11992/tis.202203036]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
18
Issue:
2023, No. 1
Page number:
186-193
Column:
Wu Wenjun Artificial Intelligence Science and Technology Award Forum
Publication date:
2023-01-05
- Title:
-
An intellectual property entity recognition method based on Transformer and technological word information
- Author(s):
-
WANG Yuhui (1, 2); DU Junping (1, 2); SHAO Yingxia (1, 2)
-
1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China;
2. Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China
-
- Keywords:
-
Chinese named entity recognition; intellectual property; Transformer encoder; information fusion; vector representation; science and technology big data; patent; deep learning
- CLC:
-
TP391
- DOI:
-
10.11992/tis.202203036
- Abstract:
-
Patent text is rich in entity information. Named entity recognition can extract the intellectual property (IP) entities that carry a patent's key information, which helps researchers understand patent content faster. Existing named entity recognition methods struggle to make full use of the word-level semantic information introduced by changes in technical words. This paper proposes an IP entity recognition method based on Transformer and technical word information. The method builds accurate word vector representations on the BERT language model and, during word vector generation, improves the representation of IP entities by adding technical word information extracted by an iterated dilated convolutional neural network (IDCNN). Finally, a Transformer encoder with relative position encoding learns the deep semantic information of the text from the word vector sequence and predicts the entity labels. Experimental results on public datasets and annotated patent datasets show that the method improves entity recognition accuracy.
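The iterated dilated convolution mentioned in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the kernel width of 3, the dilation schedule (1, 1, 2), the number of iterations, and all function names are assumptions chosen for illustration, and the nonlinearity and learned weights of a real network are omitted. The sketch only shows how repeating a block of dilated convolutions widens the context each position can see.

```python
def dilated_conv1d(seq, kernel, dilation):
    """'Same'-length 1-D dilated convolution over a list of floats (zero padded)."""
    n = len(seq)
    half = len(kernel) // 2
    out = []
    for i in range(n):
        total = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation  # input position read by this kernel tap
            if 0 <= j < n:                 # positions outside the sequence count as zero
                total += w * seq[j]
        out.append(total)
    return out


def dilated_block(seq, kernel, dilations=(1, 1, 2)):
    """Apply one stack of dilated convolutions (hypothetical dilation schedule)."""
    for d in dilations:
        seq = dilated_conv1d(seq, kernel, d)
    return seq


def iterated_dilated(seq, kernel, dilations=(1, 1, 2), iterations=2):
    """Reapply the same block, with the same kernel, several times."""
    for _ in range(iterations):
        seq = dilated_block(seq, kernel, dilations)
    return seq


def receptive_field(kernel_size, dilations, iterations):
    """Number of input positions that can influence one output position."""
    return 1 + iterations * sum((kernel_size - 1) * d for d in dilations)
```

Under these assumptions, one block covers 9 input positions and two iterations cover 17, which is how a shallow stack with shared weights can capture multi-character technical terms without a large kernel.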