
ZHANG Yi, XIE Yanyi, LUO Yuan, et al. Postprocessing method of MFCC in speech feature extraction[J]. CAAI Transactions on Intelligent Systems, 2016, 11(2): 208-215. [doi:10.11992/tis.201511008]


Postprocessing method of MFCC in speech feature extraction


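Editorial Office of CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]; Volume: 11; Issue: 2016, No. 2; Pages: 208-215; Section: Academic Papers (Machine Perception and Pattern Recognition); Publication date: 2016-04-25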

Title:: Postprocessing method of MFCC in speech feature extraction

Author(s):: ZHANG Yi¹, XIE Yanyi², LUO Yuan³, XI Bing³; 1. Institute of Advanced Manufacturing Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
2. College of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
3. College of Opto Electronic Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Keywords:: postprocessing; speech feature; speech recognition; noise; robustness

CLC number:: TP391.4

DOI:: 10.11992/tis.201511008

Abstract:: To improve the robustness of automatic speech recognition systems, this paper proposes a postprocessing method for speech features based on Mel-frequency cepstral coefficients (MFCC). The method combines mean subtraction, variance normalization, time-sequence filtering, and weighted autoregressive moving average (ARMA) filtering; it is named the MVDA postprocessing method, and the resulting feature parameters are called MVDA. The paper first shows theoretically that MVDA postprocessing can remove the interference of both additive noise and convolutional noise. Comparative experiments between MVDA and MFCC are then conducted, and the change in Euclidean distance between the features of noisy speech and clean speech is analyzed. The results show that every step of MVDA postprocessing effectively reduces noise interference and that MVDA outperforms MFCC in all of the noise environments tested. This simple feature not only matches the performance of many more complex speech feature processing methods but also effectively reduces the computational load of automatic speech recognition systems.
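The abstract names the processing steps but not their parameters. As a rough illustration only, the sketch below implements the closely related MVA scheme of Chen and Bilmes (mean subtraction, variance normalization, unweighted ARMA filtering) with NumPy; it is not the authors' MVDA. The time-sequence filtering step and the weights of the weighted ARMA filter are not specified in this record, so they are omitted, and the ARMA order `order=2` and the function names `mva_postprocess` and `mean_frame_distance` are hypothetical. The demo models a convolutional (channel) distortion as a constant offset added to every cepstral frame: convolution in time becomes multiplication in the spectrum, hence addition after the logarithm and DCT. Mean subtraction cancels that offset, which the frame-wise Euclidean distance makes visible.

```python
import numpy as np


def mva_postprocess(features, order=2):
    """MVA-style postprocessing of an MFCC matrix (illustrative sketch, not the paper's MVDA).

    features: array of shape (T, D), one row per frame, one column per cepstral coefficient.
    order: ARMA order M; the default of 2 is an assumption, not taken from the paper.
    """
    x = np.asarray(features, dtype=float)

    # Mean subtraction: remove each coefficient's per-utterance mean.
    # A time-invariant channel appears as a constant cepstral offset, so this cancels it.
    x = x - x.mean(axis=0)

    # Variance normalization: scale each coefficient to unit variance.
    # Columns with zero variance are not rescaled, avoiding division by zero.
    std = x.std(axis=0)
    x = x / np.where(std > 0, std, 1.0)

    # ARMA smoothing in the unweighted Chen-Bilmes form:
    #   y[t] = (y[t-M] + ... + y[t-1] + x[t] + ... + x[t+M]) / (2M + 1)
    # The paper's time-sequence filter and ARMA weights are NOT reproduced here.
    T = x.shape[0]
    y = x.copy()  # the first and last M frames are left unfiltered
    for t in range(order, T - order):
        past = y[t - order:t].sum(axis=0)        # M previously filtered outputs
        future = x[t:t + order + 1].sum(axis=0)  # current input and M future inputs
        y[t] = (past + future) / (2 * order + 1)
    return y


def mean_frame_distance(a, b):
    """Average Euclidean distance between corresponding frames of two feature matrices."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b), axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(200, 13))   # stand-in for 200 frames of 13 MFCCs
    channel = np.linspace(1.0, 3.0, 13)  # hypothetical constant cepstral offset
    noisy = clean + channel              # convolutional distortion in the cepstral domain
    print("before:", mean_frame_distance(clean, noisy))
    print("after: ", mean_frame_distance(mva_postprocess(clean), mva_postprocess(noisy)))
```

Because the ARMA stage feeds its own outputs back in, it behaves as a low-pass filter along the time axis, which is intended to damp fast frame-to-frame fluctuations in the cepstral trajectories. Additive noise is not removed exactly by this sketch; only the constant-offset case is cancelled in closed form.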

References:: [1] PALIWAL K K, BASU A. A speech enhancement method based on Kalman filtering[C]//Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing. Dallas, USA, 1987: 177-180.
[2] GIBSON J D, KOO B, GRAY S D. Filtering of colored noise for speech enhancement and coding[J]. IEEE Transactions on Signal Processing, 1991, 39(8): 1732-1742.
[3] ZELINSKI R. A microphone array with adaptive post-filtering for noise reduction in reverberant rooms[C]//Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing. New York, USA, 1988: 2578-2581.
[4] MYLLYMAKI M, VIRTANEN T. Non-stationary noise model compensation in voice activity detection[C]//Proceedings of the 17th European Signal Processing Conference. Glasgow, Scotland, 2009: 2186-2190.
[5] RAMÍREZ J, SEGURA J C, BENÍTEZ C, et al. Efficient voice activity detection algorithms using long-term speech information[J]. Speech Communication, 2004, 42(3/4): 271-287.
[6] CHOWDHURY M, SELOUANI S A, O’SHAUGHNESSY D. A soft computing approach to improve the robustness of on-line ASR in previously unseen highly non-stationary acoustic environments[C]//Proceedings of the 11th IEEE International Conference on Information Science, Signal Processing and their Applications. Montreal, Canada, 2012: 522-527.
[7] GUPTA H A, RAJU A, ALWAN A. Non-linear dimension reduction of Gabor features for noise-robust ASR[C]//Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing. Florence, Italy, 2014: 1715-1719.
[8] HANSEN J H L, VARADARAJAN V. Analysis and compensation of Lombard speech across noise type and levels with application to in-set/out-of-set speaker recognition[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2009, 17(2): 366-378.
[9] COOK G, ROBINSON T. Transcribing broadcast news with the 1997 Abbot system[C]//Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing. Seattle, USA, 1998: 917-920.
[10] KIM D S, LEE S Y, KIL R M. Auditory processing of speech signals for robust speech recognition in real-world noisy environments[J]. IEEE Transactions on Speech and Audio Processing, 1999, 7(1): 55-69.
[11] HAIN T, WOODLAND P C, EVERMANN G, et al. New features in the CU-HTK system for transcription of conversational telephone speech[C]//Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing. Salt Lake City, USA, 2001, 1: 57-60.
[12] LIN S H, CHEN B, YEH Y M. Exploring the use of speech features and their corresponding distribution characteristics for robust speech recognition[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2009, 17(1): 84-94.
[13] MORITA S, UNOKI M, LU Xugang, et al. Robust voice activity detection based on concept of modulation transfer function in noisy reverberant environments[C]//Proceedings of International Symposium on Chinese Spoken Language Processing (ISCSLP). Singapore, 2014: 108-112.
[14] CHANG J E, BAI J Y, ZENG Fangang. Unintelligible low frequency sound enhances simulated cochlear implant speech recognition in noise[J]. IEEE Transactions on Biomedical Engineering, 2006, 53(12): 2598-2601.
[15] BOLL S F. Suppression of acoustic noise in speech using spectral subtraction[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1979, 27(2): 113-120.
[16] MAMMONE R J, ZHANG Xiaoyu, RAMACHANDRAN R P. Robust speaker recognition: a feature-based approach[J]. IEEE Signal Processing Magazine, 1996, 13(5): 58-71.
[17] BOLL S F. Suppression of acoustic noise in speech using spectral subtraction[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1979, 27(2): 113-120.

Memo

Received: 2015-11-06.
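Foundation item: Key Project of the Frontier Technology Special Program of the Chongqing Science and Technology Commission (cstc2015jcyjBX0066).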
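About the authors: ZHANG Yi, male, born in 1966, professor and doctoral supervisor. His main research interests are robotics and its applications, data fusion, and information accessibility technology. He is director of the National Information Accessibility Engineering R&D Center and director of the Intelligent Systems and Robotics Laboratory at Chongqing University of Posts and Telecommunications, and has published many academic papers. XIE Yanyi, male, born in 1989, master's student; his main research interests are speech recognition and intelligent robots. LUO Yuan, female, born in 1972, professor, Ph.D.; her main research interests are signal and information processing and digital image processing.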
Corresponding author: XIE Yanyi. E-mail: 811719530@qq.com.

