LI Rongjun, GUO Xiuyan, YANG Jingyuan. A fine-tuning algorithm for an acoustic text-chunk confusion language model for robust spoken language understanding[J]. CAAI Transactions on Intelligent Systems, 2023, 18(1): 131-137. [doi:10.11992/tis.202109024]
CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN 1673-4785/CN 23-1538/TP]
Volume: 18
Issue: 2023, No. 1
Pages: 131-137
Section: Academic Papers - Natural Language Processing and Understanding
Publication date: 2023-01-05
Title:
A fine-tuning algorithm for an acoustic text-chunk confusion language model for robust spoken language understanding
Author(s):
LI Rongjun (李荣军), GUO Xiuyan (郭秀焱), YANG Jingyuan (杨静远)
AI Application Research Center, Huawei Technologies Co., Ltd., Shenzhen 518129, China
Keywords:
natural language understanding; spoken language understanding; intent recognition; pre-trained language model; speech recognition; robustness; language model fine-tuning; deep learning
CLC number:
TP18
DOI:
10.11992/tis.202109024
Abstract:
Employing a pre-trained language model (PLM) to extract sentence feature representations has achieved remarkable results on downstream natural language understanding tasks over written text. When a PLM is applied to spoken language understanding (SLU) tasks, however, its accuracy degrades because of erroneous text produced by front-end automatic speech recognition (ASR). This paper therefore investigates how to enhance a PLM so that an SLU model becomes robust to ASR errors. Specifically, by comparing ASR hypotheses with manual transcriptions, we identify the merged (run-together) and deleted text chunks, and then set up a new pre-training task that fine-tunes the PLM so that similarly pronounced text chunks produce similar embedding representations, mitigating the influence of ASR errors on the PLM. Experiments on three SLU benchmark datasets show that the proposed method achieves a considerable accuracy improvement over previous methods, validating its effectiveness.
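The abstract describes the method only at a high level, so the following Python sketch illustrates the two steps it names: mining confusable text chunks by aligning an ASR hypothesis against its manual transcription, and fine-tuning a PLM so that each confusable pair embeds similarly. This is a minimal illustration, not the paper's implementation: the word-level difflib alignment, the bert-base-uncased backbone, the mean pooling, and the cosine loss are all assumptions, since this record does not specify the exact alignment procedure or pre-training objective.

# Illustrative sketch only: the alignment tool, backbone, pooling, and loss
# below are assumptions, not the paper's exact recipe.
from difflib import SequenceMatcher

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def confusable_chunks(reference: str, hypothesis: str):
    """Collect (reference_chunk, asr_chunk) pairs where the ASR output
    substituted or dropped words relative to the manual transcription."""
    ref, hyp = reference.split(), hypothesis.split()
    pairs = []
    for op, i1, i2, j1, j2 in SequenceMatcher(a=ref, b=hyp).get_opcodes():
        if op in ("replace", "delete"):
            pairs.append((" ".join(ref[i1:i2]), " ".join(hyp[j1:j2])))
    return pairs

tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed PLM backbone
plm = AutoModel.from_pretrained("bert-base-uncased")

def chunk_embedding(text: str) -> torch.Tensor:
    """Mean-pooled last hidden states as one chunk vector (one pooling choice)."""
    batch = tok(text, return_tensors="pt")
    hidden = plm(**batch).last_hidden_state   # shape (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

def confusion_loss(ref_chunk: str, asr_chunk: str) -> torch.Tensor:
    """1 - cosine similarity: zero when the confusable pair embeds identically."""
    return 1.0 - F.cosine_similarity(
        chunk_embedding(ref_chunk), chunk_embedding(asr_chunk), dim=0
    )

# A homophone-style substitution an ASR front end might produce.
for ref_chunk, asr_chunk in confusable_chunks("turn off the light",
                                              "turn of the light"):
    loss = confusion_loss(ref_chunk, asr_chunk)   # pair: ("off", "of")
    loss.backward()  # gradients reach the PLM; wrap in any optimizer loop

A full implementation would batch such pairs mined from a corpus of paired ASR and manual transcripts, and mix this objective with the model's regular pre-training losses during fine-tuning.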
Memo:
Received: 2021-09-13.
About the authors: LI Rongjun, principal engineer; research interests include human-machine dialogue and speech recognition. GUO Xiuyan, senior engineer; research interests include knowledge graphs, human-machine dialogue, and speech recognition. YANG Jingyuan, senior engineer; research interests include intelligent question answering, task-oriented dialogue systems, and speech error correction.
Corresponding author: LI Rongjun. E-mail: lirongjun3@huawei.com.