
[1] ZHANG Shaobai, LIU Xin. Automatic acquisition of speech sound-target cells based on DIVA model[J]. CAAI Transactions on Intelligent Systems, 2013, 8(4): 305-311. [doi:10.3969/j.issn.1673-4785.201304049]


Automatic acquisition of speech sound-target cells based on DIVA model


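CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]; Volume: 8; Issue: No. 4, 2013; Pages: 305-311; Column: Academic Papers, Machine Perception and Pattern Recognition; Publication date: 2013-08-25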

Title:: Automatic acquisition of speech sound-target cells based on DIVA model

Article ID:: 1673-4785(2013)04-0305-07

Author(s):: ZHANG Shaobai, LIU Xin; College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210046, China

Keywords:: DIVA model; phoneme; speech sound-target cells; speech acquisition and production

CLC number:: TP31

DOI:: 10.3969/j.issn.1673-4785.201304049

Document code:: A

Abstract:: The Directions Into Velocities of Articulators (DIVA) model does not account for the uneven development of perception and production in infants, whose perceptual abilities initially develop faster than their speech production skills. To address this, the paper presents a method for automatically acquiring speech sound-target cells. The method models the human ear as a bank of parallel band-pass filters with different bandwidths, each associated with the 21-dimensional auditory storage space of the DIVA model. To capture the different responses of the different auditory channels, it considers the masking effect between frequency bands and the relationship between perceived loudness and frequency. While reading the speech input signal, the model obtains a good initial auditory representation, in a manner broadly consistent with infant babbling. Simulation results show that, through boundary definition, similarity comparison, and search-and-update steps, the method performs self-organized matching of the initial input patterns well, and ultimately gives the DIVA model more natural speech acquisition characteristics.
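The filter-bank and matching steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give band edges, a loudness formula, a masking model, or a similarity measure, so the sketch assumes the 21 lowest Zwicker critical bands, uses `log1p` as a crude stand-in for loudness compression, omits masking entirely, and uses cosine similarity with a hypothetical threshold and update rate. The names `auditory_vector`, `cosine_similarity`, and `match_or_create` are illustrative only.

```python
import numpy as np

# Zwicker critical-band edges in Hz: 22 edges define 21 bands.
# ASSUMPTION: the abstract does not state the band edges used in the paper.
BAND_EDGES_HZ = np.array(
    [20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
     1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700],
    dtype=float,
)


def auditory_vector(frame, sample_rate=16000):
    """Map one signal frame to a 21-dimensional band-energy vector."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # Sum the spectral power that falls inside each band [lo, hi).
    energies = np.array([
        power[(freqs >= lo) & (freqs < hi)].sum()
        for lo, hi in zip(BAND_EDGES_HZ[:-1], BAND_EDGES_HZ[1:])
    ])
    # ASSUMPTION: log1p stands in for a loudness model; masking is not modeled.
    return np.log1p(energies)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (small epsilon avoids 0/0)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def match_or_create(cells, vector, threshold=0.9, rate=0.1):
    """Search stored cells for the best match; update it or add a new cell.

    Returns the index of the matched or newly created cell.
    The threshold and rate values are hypothetical.
    """
    if cells:
        sims = [cosine_similarity(cell, vector) for cell in cells]
        best = int(np.argmax(sims))
        if sims[best] >= threshold:
            # Move the matched cell a small step toward the new input.
            cells[best] = (1.0 - rate) * cells[best] + rate * vector
            return best
    cells.append(vector.copy())
    return len(cells) - 1


if __name__ == "__main__":
    t = np.arange(512) / 16000.0
    cells = []
    for tone_hz in (1000.0, 1000.0, 3000.0):
        vec = auditory_vector(np.sin(2 * np.pi * tone_hz * t))
        print(tone_hz, "->", "cell", match_or_create(cells, vec))
```

In this sketch, repeated presentations of a similar input reinforce an existing cell, while a sufficiently different input creates a new one; this is one simple way to read the "similarity comparison" and "search and update" steps, under the assumptions stated above.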


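Memo

Received: 2013-04-16. Published online: 2013-06-21.
Funding: Supported by the National Natural Science Foundation of China (61073115, 61271334, 61373065).
Corresponding author: ZHANG Shaobai. E-mail: adzsb@163.com.
About the authors:
ZHANG Shaobai, male, born in 1953. His main research interests are intelligent systems and pattern recognition. He has led several national-level projects and published numerous academic papers.
LIU Xin, male, born in 1987. His main research interests are pattern recognition and intelligent systems.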

Last Update: 2013-09-25
