DI Jian, LIU Junhua, CAO Jingang. An improved HiNT text retrieval model using BERT and coverage mechanism[J]. CAAI Transactions on Intelligent Systems, 2024, 19(3): 719-727. [doi:10.11992/tis.202201020]
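Journal: CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 19
Issue: No. 3, 2024
Pages: 719-727
Section: Academic Papers: Natural Language Processing and Understanding
Publication date: 2024-05-05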
- Title: An improved HiNT text retrieval model using BERT and coverage mechanism
- Author(s): DI Jian (邸剑)¹,², LIU Junhua (刘骏华)¹,², CAO Jingang (曹锦纲)¹,²
  1. School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, China;
  2. Engineering Research Center of Intelligent Computing for Complex Energy Systems, Ministry of Education, Baoding 071003, China
- Keywords: bidirectional encoder representations from transformers (BERT); hierarchical neural matching model (HiNT); coverage mechanism; text retrieval; semantic representation; feature extraction; natural language processing; similarity; multigranularity
- CLC number: TP311
- DOI: 10.11992/tis.202201020
- Document code: 2023-09-27
- Abstract: To improve the accuracy of semantic text retrieval, this paper addresses a weakness of current text retrieval models: when measuring the relevance between a query and a document, they handle textual ambiguity and polysemy poorly. It proposes a model based on an improved hierarchical neural matching model (HiNT). The model first extracts key topic words from each segment of a document and encodes them into multiple dense semantic vectors with bidirectional encoder representations from transformers (BERT). A local matching layer that incorporates a coverage mechanism then processes these vectors, so that the model computes relevance at both the local segment-level granularity and the global document-level granularity, which improves retrieval accuracy. Compared with several retrieval models on the MS MARCO and webtext2019zh datasets, the proposed model achieves the best results, verifying its effectiveness.
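The abstract describes a pipeline of segment keywords, dense keyword vectors, a coverage-aware local matching layer, and relevance scored at both segment and document level. The toy sketch below illustrates that idea only; it is not the authors' implementation. Assumptions: keyword embeddings are supplied as plain vectors standing in for BERT outputs; the coverage mechanism is modeled on the coverage penalty used in pointer-generator summarization networks (sum of min(match, coverage)), which the paper may define differently; and the names `score_document`, `term_matches`, `lam`, and `alpha` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def term_matches(query: List[Vector], keywords: List[Vector]) -> List[float]:
    """For each query term, its best similarity to any keyword of one segment, clipped to [0, 1]."""
    return [
        min(1.0, max(0.0, max((cosine(q, k) for k in keywords), default=0.0)))
        for q in query
    ]


def score_document(
    query: List[Vector],
    segments: List[List[Vector]],
    lam: float = 1.0,    # hypothetical weight of the coverage penalty
    alpha: float = 0.5,  # hypothetical mix of segment-level vs document-level score
) -> float:
    """Toy relevance score combining a segment-level and a document-level view."""
    if not query:
        return 0.0
    n = len(query)
    coverage = [0.0] * n  # match mass each query term has received from earlier segments
    local_scores = []
    for keywords in segments:
        a = term_matches(query, keywords)
        # Coverage penalty (pointer-generator style): re-matching already-covered terms earns less.
        overlap = sum(min(ai, ci) for ai, ci in zip(a, coverage))
        local_scores.append((sum(a) - lam * overlap) / n)
        coverage = [ci + ai for ci, ai in zip(coverage, a)]
    # Segment-level view: the best single segment after the coverage penalty.
    local_score = max(local_scores, default=0.0)
    # Document-level view: average per-term coverage, capped at 1 per term.
    global_score = sum(min(1.0, c) for c in coverage) / n
    return alpha * local_score + (1 - alpha) * global_score


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]           # two toy query-term vectors
    diverse = [[[1.0, 0.0]], [[0.0, 1.0]]]     # segments match different query terms
    repetitive = [[[1.0, 0.0]], [[1.0, 0.0]]]  # segments repeat the same query term
    print(score_document(query, diverse))      # 0.75
    print(score_document(query, repetitive))   # 0.5
```

In this toy setup the document whose segments cover different query terms scores higher than one whose segments keep repeating the same term, which is the behavior a coverage mechanism is meant to encourage.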
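Memo
Received: 2022-01-13.
Foundation item: Fundamental Research Funds for the Central Universities (2021MS085).
About the authors:
DI Jian, senior engineer; main research interests: artificial intelligence and its applications, Internet of Things technology and its applications, big data and cloud computing. Has led or participated in more than 20 science and technology projects, received 2 provincial- or ministerial-level science and technology progress awards, and been granted 1 invention patent; has published more than 30 academic papers and co-authored 1 textbook. E-mail: dijian6880@163.com
LIU Junhua, master's student; main research interests: deep learning, natural language processing. E-mail: 220192221061@ncepu.edu.cn
CAO Jingang, lecturer, Ph.D.; main research interests: image processing and pattern recognition. Has published more than 10 academic papers. E-mail: caojg168@126.com
Corresponding author: CAO Jingang. E-mail: caojg168@126.com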