[1] LI Rongjun, GUO Xiuyan, YANG Jingyuan. A fine-tuning algorithm for acoustic text chunk confusion language model orienting to understand robust spoken language[J]. CAAI Transactions on Intelligent Systems, 2023, 18(1): 131-137. [doi:10.11992/tis.202109024]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 18
Issue: 2023, No. 1
Page number: 131-137
Column: Academic Papers - Natural Language Processing and Understanding
Publication date: 2023-01-05
- Title: A fine-tuning algorithm for acoustic text chunk confusion language model orienting to understand robust spoken language
- Author(s): LI Rongjun; GUO Xiuyan; YANG Jingyuan
  AI Application Research Center, Huawei Technologies Co., Ltd., Shenzhen 518129, China
- Keywords: natural language understanding; spoken language understanding; intent recognition; pre-trained language model; speech recognition; robust; fine-tuning of language model; deep learning
- CLC: TP18
- DOI: 10.11992/tis.202109024
- Abstract:
Using a pre-trained language model (PLM) to extract feature representations of sentences has achieved remarkable results on downstream text-based natural language understanding tasks. However, when a PLM is applied to spoken language understanding (SLU) tasks, its performance degrades because the text produced by the front-end automatic speech recognition (ASR) system contains errors. To address this issue, this paper investigates how to enhance a PLM so that SLU is more robust to ASR errors. Specifically, by comparing ASR output with manual transcriptions, we identify the concatenated and deleted text chunks. We then set up a new pre-training task that fine-tunes the PLM so that text chunks with similar pronunciation produce similar feature embeddings, which reduces the influence of ASR errors on the PLM. Experiments on three SLU benchmark datasets validate the effectiveness of our proposal, showing significant accuracy improvements over prior work.
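The comparison step described in the abstract can be illustrated with a minimal sketch. It assumes a word-level alignment using Python's standard `difflib.SequenceMatcher`; the abstract does not say which alignment procedure the authors use, and the function name `diff_chunks` and the example sentences are hypothetical.

```python
from difflib import SequenceMatcher


def diff_chunks(reference, hypothesis):
    """List the spans where an ASR hypothesis differs from a manual transcript.

    Illustrative assumption: whitespace tokenization and difflib alignment,
    not necessarily the procedure used in the paper.
    Returns tuples of (edit_type, reference_chunk, hypothesis_chunk).
    """
    ref_tokens = reference.split()
    hyp_tokens = hypothesis.split()
    # autojunk=False disables difflib's heuristic that ignores very frequent tokens.
    matcher = SequenceMatcher(a=ref_tokens, b=hyp_tokens, autojunk=False)
    chunks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical spans carry no ASR error
        chunks.append(
            (tag, " ".join(ref_tokens[i1:i2]), " ".join(hyp_tokens[j1:j2]))
        )
    return chunks


if __name__ == "__main__":
    manual = "play some jazz music please"
    asr = "play sum jazz please"
    for edit_type, ref_chunk, hyp_chunk in diff_chunks(manual, asr):
        print(edit_type, repr(ref_chunk), repr(hyp_chunk))
```

Pairs tagged `replace` are candidate similar-sounding chunks, and pairs tagged `delete` are chunks the recognizer dropped. How such pairs feed the fine-tuning task is described in the paper itself, not in this sketch.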