
[1] LUO Yuan, TONG Kaiguo, ZHANG Yi, et al. Sound source separation of a multi voice environment based on human ear listening properties[J]. CAAI Transactions on Intelligent Systems, 2012, 7(2): 121-128.


Sound source separation of a multi voice environment based on human ear listening properties


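CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 7 Issue: No. 2, 2012 Pages: 121-128 Column: Academic Papers: Machine Perception and Pattern Recognition Publication date: 2012-04-25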

Title:: Sound source separation of a multi voice environment based on human ear listening properties

Article ID:: 1673-4785(2012)02-0121-08

Author(s):: LUO Yuan, TONG Kaiguo, ZHANG Yi, XING Wuchao, CHEN Kai, CHEN Hongsong, HE Chunjiang, CHEN Jun; Research Center of Intelligent System and Robot, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Keywords:: multi-voice source environment; human ear listening properties; interaural time difference; interaural level difference; sound source separation

CLC number:: TP311

Document code:: A

Abstract:: Inspired by acoustics research and by the way the human brain and ear process speech, a complete speech separation model simulating the central auditory system was established. First, a peripheral auditory model performs a multi-band spectral analysis of the speech signal. Next, a coincidence neuron model extracts features of the speech signal. Finally, the voices are separated in a model of the neurons of the inferior colliculus. The model addresses a limitation of the great majority of existing speech recognition methods, which can only be used with a single sound source in a low-noise environment. Experimental results show that the model can separate speech in a multi-source environment and is highly robust. With further research, speech separation models based on human ear listening properties are expected to find a wide range of applications.
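The keywords name two binaural cues, the interaural time difference (ITD) and the interaural level difference (ILD). As a rough illustration of what these cues measure, the sketch below estimates both from one frame of a two-channel signal. It is not the authors' method: the paper uses a peripheral auditory model and coincidence neurons, whereas this sketch swaps in a plain broadband cross-correlation and an energy ratio. The function name `estimate_itd_ild` and its sign conventions are assumptions made for this example.

```python
import numpy as np


def estimate_itd_ild(left, right, fs):
    """Return (itd_seconds, ild_db) for one frame of a two-channel signal.

    Illustrative conventions (assumptions, not taken from the paper):
    positive ITD means the right channel lags the left channel,
    positive ILD means the left channel carries more energy.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)

    # Full cross-correlation of right against left.
    # Output index i corresponds to a lag of i - (len(left) - 1) samples.
    corr = np.correlate(right, left, mode="full")
    lag = int(np.argmax(corr)) - (len(left) - 1)
    itd = lag / fs

    # Level difference as an energy ratio in decibels.
    eps = 1e-12  # guards against log of zero on silent frames
    ild = 10.0 * np.log10((np.sum(left**2) + eps) / (np.sum(right**2) + eps))
    return itd, ild
```

In a multi-source setting such cues would typically be computed per frequency band and per time frame rather than once on the broadband signal, so that bands dominated by different sources can be told apart.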


Memo

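Received: 2011-09-28.
Funding: International Cooperation Program of the Ministry of Science and Technology (2010DF12160); Chongqing Key Science and Technology Program (CSTC: 2010AA2055).
Corresponding author: TONG Kaiguo. E-mail: 359018647@qq.com.
About the authors:
LUO Yuan, female, born in 1972, professor, Ph.D. In recent years she has taken part in and led a number of national and provincial or ministerial projects, including an international cooperation project of the Ministry of Science and Technology, a Ministry of Education project for returned overseas scholars, and Chongqing research projects. Her main research interests are machine vision, human-computer interaction, and testing based on image and video processing. She has published more than 60 academic papers in recent years, over 20 of which are indexed by SCI or EI, and holds 3 national invention patents.
TONG Kaiguo, male, born in 1985, master's student. His main research interests are speech recognition and intelligent robots. He has published 4 academic papers.
ZHANG Yi, male, born in 1966, professor, doctoral supervisor, with postdoctoral experience. In recent years he has undertaken an international cooperation project of the Ministry of Science and Technology, a key merit-based funding project of the Ministry of Personnel's science and technology program for returned overseas scholars, and the Chongqing key science and technology project "Research and development of a navigation and control system for wheelchair robots". He is an editorial board member for special issues on intelligent systems and robots of the international journals International Journal of Modelling, Identification and Control, International Journal of Automation and Computing, and International Journal of Advanced Mechatronic Systems.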

Last Update: 2012-07-12
