[1] HU Feng, YANG Xinrui, TANG Chengfu, et al. Sentence-level distant supervision relation extraction based on self-adaptive loss function[J]. CAAI Transactions on Intelligent Systems, 2024, 19(3): 697-706. [doi:10.11992/tis.202205034]
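CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
19
Issue:
No. 3, 2024
Pages:
697-706
Section:
Academic Papers: Natural Language Processing and Understanding
Publication date:
2024-05-05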
- Title:
Sentence-level distant supervision relation extraction based on self-adaptive loss function
- Author(s):
HU Feng (胡峰), YANG Xinrui (杨新瑞), TANG Chengfu (汤成富), DENG Weibin (邓维斌), LIU Qun (刘群)
- Affiliation:
Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Keywords:
natural language processing; information extraction; relation extraction; distant supervision; noise separation; noise label; negative training; self-adaptive loss function
- CLC number:
TP391
- DOI:
10.11992/tis.202205034
- Document code:
2023-10-09
- Abstract:
Distant supervision is one approach to relation extraction. Existing methods mainly use multi-instance learning, performing relation extraction over bags of instances that share the same entity pair. However, bag-level methods can only alleviate, not fully solve, the wrong-label problem. To address this, the paper first analyzes the distributions of clean and noisy data and proposes a new self-adaptive loss function. On this basis, it proposes a sentence-level distant supervision relation extraction method based on the self-adaptive loss function. Experiments on the public NYT-10 dataset and on a synthetic dataset built from TACRED show that the proposed method outperforms the methods in the compared literature, separates wrongly labeled noisy instances from clean instances more effectively, and improves the accuracy of sentence-level distant supervision relation extraction.
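Memo
Received: 2022-05-23.
Funding: National Key Research and Development Program of China (2018YFC0832102); Key Cooperation Project of the Chongqing Municipal Education Commission (HZ2021008); Natural Science Foundation of Chongqing (cstc2021jcyj-msxmX0849).
Author biographies: HU Feng, PhD, professor. Research interests: rough sets, granular computing, and data mining. Has led or participated in 4 National Natural Science Foundation of China projects and participated in 3 key R&D program projects of the Ministry of Science and Technology; as a participant, received one Wu Wenjun Artificial Intelligence Science and Technology Award and one Chongqing Natural Science Award; has published more than 40 academic papers. E-mail: hufeng@cqupt.edu.cn. YANG Xinrui, master's student. Research interests: natural language processing, information extraction, and data mining. E-mail: 1158737962@qq.com. TANG Chengfu, master's student. Research interests: computer vision, image recognition, and data mining. E-mail: tangcfmail@163.com.
Corresponding author: HU Feng. E-mail: hufeng@cqupt.edu.cn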