[1] YU Runyu, LI Yawen, LI Ang. Semantic similarity computing for scientific and technological conferences[J]. CAAI Transactions on Intelligent Systems, 2022, 17(4): 737-743. [doi:10.11992/tis.202203050]
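- Journal: CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN 1673-4785/CN 23-1538/TP]
- Volume: 17
- Issue: No. 4, 2022
- Pages: 737-743
- Section: Academic Papers: Natural Language Processing and Understanding
- Publication date: 2022-07-05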
- Title: Semantic similarity computing for scientific and technological conferences
- Author(s): YU Runyu¹, LI Yawen², LI Ang¹
  1. Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. School of Economics and Management, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Keywords: scientific and technological conference; deep learning; natural language processing; semantic learning; knowledge extraction; semantic similarity; pre-trained model; Siamese network
- CLC number: TP391
- DOI: 10.11992/tis.202203050
- Abstract: Existing semantic textual similarity methods have difficulty accurately estimating the semantic similarity of scientific and technological conferences. To address this problem, this paper proposes a siamese-BERT semantic similarity calculation algorithm fused with domain features (SBFD). First, domain feature information for each conference is obtained through entity recognition and keyword extraction, and it is fed as a feature, together with the conference information, into a bidirectional encoder representations from transformers (BERT) network. A Siamese network structure is then used to address the anisotropy problem of BERT. The output of the network is pooled and normalized, and finally cosine similarity is used to compute the degree of similarity between two conferences. Experimental results show that SBFD achieves good results on different datasets and improves Spearman's rank correlation coefficient to some extent.
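The scoring stage summarized in the abstract (pool the encoder output, normalize it, compare two conferences by cosine similarity, and evaluate with Spearman's rank correlation) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract only says the output is "pooled", so mean pooling is assumed here (a common choice for Siamese BERT models); the `build_input` format that joins conference text and extracted features with `[SEP]` is hypothetical; and random arrays stand in for the token outputs of the shared (Siamese) BERT encoder, which is not included.

```python
import numpy as np


def build_input(conf_text, entities, keywords, sep="[SEP]"):
    """Hypothetical input format (assumption): conference text, then the
    extracted entities and keywords, separated by BERT's [SEP] token."""
    features = " ".join(list(entities) + list(keywords))
    return f"{conf_text} {sep} {features}"


def mean_pool(token_embeddings, attention_mask):
    """Mean pooling (assumed strategy): average token vectors over
    non-padding positions. Shapes: (batch, seq, hidden), (batch, seq)."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.maximum(mask.sum(axis=1), 1e-9)  # avoid division by zero
    return summed / counts


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def cosine_similarity(u, v):
    """Row-wise cosine similarity; after L2 normalization it is a dot product."""
    return np.sum(l2_normalize(u) * l2_normalize(v), axis=-1)


def average_ranks(x):
    """1-based ranks; tied values share the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")  # stable sort keeps ties adjacent
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(a, b):
    """Spearman's rank correlation = Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for the shared (Siamese) encoder's token outputs
    # for two batches of conferences: (batch=3, seq_len=5, hidden=8).
    tokens_a = rng.normal(size=(3, 5, 8))
    tokens_b = rng.normal(size=(3, 5, 8))
    mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1], [1, 1, 0, 0, 0]])
    sims = cosine_similarity(mean_pool(tokens_a, mask), mean_pool(tokens_b, mask))
    gold = [0.2, 0.9, 0.5]  # made-up reference scores, for illustration only
    print(build_input("ACL 2022", ["NLP"], ["parsing"]))
    print(sims, spearman(sims, gold))
```

In a Siamese setup both conferences pass through the same encoder weights, so their pooled vectors live in one space and can be compared directly. Normalizing before the dot product bounds each score to [-1, 1], and because Spearman's coefficient depends only on ranks, it measures whether the model orders conference pairs the way the reference labels do.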
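Memo
Received: 2022-03-24.
Funding: National Key R&D Program of China (2018YFB1402600); National Natural Science Foundation of China (61772083, 61802028); Guangxi Science and Technology Major Project (Guike AA18118054).
About the authors: YU Runyu, master's student, main research interests: deep learning and data mining. LI Yawen, associate professor, main research interests: enterprise innovation, artificial intelligence, and big data. LI Ang, Ph.D. candidate, main research interests: information retrieval, data mining, and machine learning.
Corresponding author: LI Yawen. E-mail: warmly0716@126.com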