[1] LI Rongjun, GUO Xiuyan, YANG Jingyuan. A fine-tuning algorithm for acoustic text chunk confusion language model orienting to understand robust spoken language[J]. CAAI Transactions on Intelligent Systems, 2023, 18(1): 131–137. [doi:10.11992/tis.202109024]

A fine-tuning algorithm for acoustic text chunk confusion language model orienting to understand robust spoken language

References:
[1] CHENG Gaofeng, YAN Yonghong. Latest development of multilingual speech recognition acoustic model modeling methods[J]. Computer science, 2022, 49(1): 47–52
[2] ZHAO Ning, XU Junli, XU Yanghang, et al. Intention detection of customer’s call[J]. Journal of Chinese information processing, 2021, 35(3): 125–133
[3] LYU Kunru, WU Chunguo, LIANG Yanchun, et al. An end-to-end Chinese speech recognition algorithm integrating language model[J]. Acta electronica sinica, 2021, 49(11): 2177–2185
[4] XU Yang, WANG Jiancheng, LIU Qiyuan, et al. Intention detection in spoken language based on context information[J]. Computer science, 2020, 47(1): 205–211
[5] LI Lei, ZHOU Yanquan, ZHONG Yixin. Pragmatic information based NLP research and application[J]. CAAI transactions on intelligent systems, 2006, 1(2): 1–6
[6] SERDYUK D, WANG Yongqiang, FUEGEN C, et al. Towards end-to-end spoken language understanding[C]//2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Calgary: IEEE, 2018: 5754–5758.
[7] HAGHANI P, NARAYANAN A, BACCHIANI M, et al. From audio to semantics: approaches to end-to-end spoken language understanding[C]//2018 IEEE Spoken Language Technology Workshop. Athens: IEEE, 2018: 720–726.
[8] LUGOSCH L, RAVANELLI M, IGNOTO P, et al. Speech model pre-training for end-to-end spoken language understanding[C]//20th Annual Conference of the International Speech Communication Association. Graz: ISCA, 2019: 814–818.
[9] HUANG Yinghui, KUO H K, THOMAS S, et al. Leveraging unpaired text data for training end-to-end speech-to-intent systems[C]//ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Barcelona: IEEE, 2020: 7984–7988.
[10] KUO H K J, TÜSKE Z, THOMAS S, et al. End-to-end spoken language understanding without full transcripts[C]//21st Annual Conference of the International Speech Communication Association. Shanghai: ISCA, 2020: 906–910.
[11] SUNDARARAMAN M N, KUMAR A, VEPA J. Phoneme-BERT: joint language modelling of phoneme sequence and ASR transcript[EB/OL]. (2021-02-01)[2022-09-12]. https://arxiv.org/abs/2102.00804.
[12] ŠVEC J, ŠMÍDL L, IRCING P. Hierarchical discriminative model for spoken language understanding[C]//2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver: IEEE, 2013: 8322–8326.
[13] ŠVEC J, CHYLEK A, ŠMÍDL L, et al. A study of different weighting schemes for spoken language understanding based on convolutional neural networks[C]//2016 IEEE International Conference on Acoustics, Speech and Signal Processing. Shanghai: IEEE, 2016: 6065–6069.
[14] LADHAK F, GANDHE A, DREYER M, et al. LatticeRNN: recurrent neural networks over lattices[C]//17th Annual Conference of the International Speech Communication Association. San Francisco: ISCA, 2016: 695–699.
[15] HUANG Chaowei, CHEN Yunnung. Learning spoken language representations with neural lattice language modeling[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: Association for Computational Linguistics, 2020: 3764–3769.
[16] WENG Yue, MIRYALA S S, KHATRI C, et al. Joint contextual modeling for ASR correction and language understanding[C]//2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Barcelona: IEEE, 2020: 6349–6353.
[17] MASUMURA R, IJIMA Y, ASAMI T, et al. Neural confnet classification: fully neural network based spoken utterance classification using word confusion networks[C]//2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Calgary: IEEE, 2018: 6039–6043.
[18] HUANG Chaowei, CHEN Yunnung. Learning ASR-robust contextualized embeddings for spoken language understanding[C]//2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Barcelona: IEEE, 2020: 8009–8013.
[19] NAMAZIFAR M, TUR G, HAKKANI-TÜR D. Warped language models for noise robust language understanding[C]//2021 IEEE Spoken Language Technology Workshop. Shenzhen: IEEE, 2021: 981–988.
[20] HOWARD J, RUDER S. Universal language model fine-tuning for text classification[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Melbourne: Association for Computational Linguistics, 2018: 328–339.
[21] DEVLIN J, CHANG Mingwei, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis: ACL, 2019: 4171–4186.
[22] COUCKE A, SAADE A, BALL A, et al. Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces[EB/OL]. (2018-05-25)[2022-09-12]. https://arxiv.org/abs/1805.10190.
[23] POVEY D, GHOSHAL A, BOULIANNE G, et al. The Kaldi speech recognition toolkit[C]//IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, 2011.
[24] HEMPHILL C T, GODFREY J J, DODDINGTON G R, et al. The ATIS spoken language systems pilot corpus[C]//Proceedings of the workshop on Speech and Natural Language-HLT’90. Hidden Valley: Association for Computational Linguistics, 1990: 96–101.
[25] GUPTA S, SHAH R, MOHIT M, et al. Semantic parsing for task oriented dialog using hierarchical representations[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels: Association for Computational Linguistics, 2018: 2787–2792.
[26] LOSHCHILOV I, HUTTER F. Decoupled weight decay regularization[C]//7th International Conference on Learning Representations. New Orleans LA: ICLR, 2019.
[27] VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of machine learning research, 2008, 9: 2579–2605.
Similar articles:
[1] ZHU Qian, CHENG Xian-yi, HAN Fei. A three-dimensional representative model of Chinese sentence semantics[J]. CAAI Transactions on Intelligent Systems, 2009, 4(2): 122.
[2] MAO Lina, LI Weihua. Research on enhancing the human-machine modeling ability for an extension model using the intelligent guide and KDML[J]. CAAI Transactions on Intelligent Systems, 2017, 12(3): 348. [doi:10.11992/tis.201610017]

Memo

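Received: 2021-09-13.
About the authors: LI Rongjun, principal engineer; research interests include human-machine dialogue and speech recognition. GUO Xiuyan, senior engineer; research interests include knowledge graphs, human-machine dialogue, and speech recognition. YANG Jingyuan, senior engineer; research interests include intelligent question answering, task-oriented dialogue systems, and speech error correction.
Corresponding author: LI Rongjun. E-mail: lirongjun3@huawei.com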
