[1] FANG Peng, LI Xian, WANG Zengfu. Conversion of singing voice based on kernel clustering and partial least squares regression[J]. CAAI Transactions on Intelligent Systems, 2016, 11(1): 55-60. [doi:10.11992/tis.201506022]
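- Journal: CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
- Volume: 11
- Issue: No. 1, 2016
- Pages: 55-60
- Section: Research Papers: Machine Perception and Pattern Recognition
- Publication date: 2016-02-25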
- Title: Conversion of singing voice based on kernel clustering and partial least squares regression
- Author(s): FANG Peng1,2,3, LI Xian1,3, WANG Zengfu1,2,3
1. Department of Automation, School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China;
2. Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui 230031, China;
3. National Engineering Laboratory of Speech and Language Information Processing, Hefei, Anhui 230027, China
- Keywords: computer vision; voice conversion; singing voice; kernel clustering; partial least squares regression; Gaussian mixture model; Mel log spectrum approximation (MLSA)
- CLC number: TN912; TP37
- DOI: 10.11992/tis.201506022
- Abstract: Voice conversion is an active topic in computer audition, and applying it to singing voices is a relatively new research direction that widens the range of voice conversion applications. The classical Gaussian mixture model (GMM) method overfits when training data are scarce, and it does not make effective use of the music information available at conversion time. This paper therefore proposes a singing voice conversion method that converts the timbre of a source singer into that of a target singer from a small amount of training data, and that exploits the fundamental frequency (F0) information of the song to improve the quality of the converted singing voice. The conversion function is trained with kernel clustering and partial least squares (PLS) regression. To improve voice quality further, a Mel log spectrum approximation (MLSA) filter is applied directly to the source singing waveform to synthesize the converted singing voice. Experimental results show that, with a small training set, the proposed method achieves better voice similarity and quality than the conventional GMM-based method.
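The abstract names the training ingredients (kernel clustering, then PLS regression) without specifying how they are combined. The NumPy sketch below shows one plausible combination under assumptions that are ours and are not confirmed by the paper: frame-aligned source/target spectral feature vectors, an RBF kernel, kernel k-means with hard assignments as the clustering step, and one PLS2 regression (fitted with NIPALS) per cluster. All function names (`rbf_kernel`, `kernel_kmeans`, `pls_fit`, `train_conversion`, `convert`) and parameters (`k`, `gamma`, `n_comp`) are hypothetical. The F0 handling and the MLSA synthesis stage are not sketched.

```python
# Hypothetical sketch: RBF kernel k-means + one PLS2 map per cluster.
# This is an illustration, not the authors' implementation.
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def cluster_dist(K_x, K_tt, labels, k):
    """Squared feature-space distance from each query to each cluster mean,
    dropping the constant k(x, x) = 1 of the RBF kernel (irrelevant for argmin).
    Clusters with no members get distance +inf."""
    dist = np.full((K_x.shape[0], k), np.inf)
    for c in range(k):
        idx = labels == c
        m = idx.sum()
        if m:
            dist[:, c] = (-2.0 * K_x[:, idx].sum(1) / m
                          + K_tt[np.ix_(idx, idx)].sum() / m**2)
    return dist


def kernel_kmeans(X, k, gamma, n_iter=50, seed=0):
    """Kernel k-means with hard assignments, started from random labels."""
    K = rbf_kernel(X, X, gamma)
    labels = np.random.default_rng(seed).integers(0, k, len(X))
    for _ in range(n_iter):
        new = cluster_dist(K, K, labels, k).argmin(1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels


def pls_fit(X, Y, n_comp, tol=1e-10, max_iter=100):
    """PLS2 regression via NIPALS. Returns (x_mean, y_mean, B) such that
    Y_hat = (X - x_mean) @ B + y_mean."""
    xm, ym = X.mean(0), Y.mean(0)
    E, F = X - xm, Y - ym
    W, P, Q = [], [], []
    for _ in range(n_comp):
        u = F[:, [np.argmax(F.var(0))]]  # start from the highest-variance Y column
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)       # X weights
            t = E @ w                    # X scores
            q = F.T @ t / (t.T @ t)      # Y loadings
            u_new = F @ q / (q.T @ q)    # Y scores
            done = np.linalg.norm(u_new - u) < tol
            u = u_new
            if done:
                break
        p = E.T @ t / (t.T @ t)          # X loadings
        E = E - t @ p.T                  # deflate X and Y
        F = F - t @ q.T
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.hstack(W), np.hstack(P), np.hstack(Q)
    B = W @ np.linalg.inv(P.T @ W) @ Q.T
    return xm, ym, B


def train_conversion(X_src, Y_tgt, k=2, gamma=0.5, n_comp=2):
    """Cluster time-aligned source frames, then fit one PLS map per cluster."""
    labels = kernel_kmeans(X_src, k, gamma)
    # Drop clusters too small to support n_comp components; mark them -1.
    keep = [c for c in np.unique(labels) if (labels == c).sum() > n_comp]
    labels = np.where(np.isin(labels, keep), labels, -1)
    models = {c: pls_fit(X_src[labels == c], Y_tgt[labels == c], n_comp)
              for c in keep}
    return labels, models


def convert(X_new, X_src, labels, models, k, gamma):
    """Assign each new frame to its nearest kept cluster and apply that PLS map."""
    K_x = rbf_kernel(X_new, X_src, gamma)
    K_tt = rbf_kernel(X_src, X_src, gamma)
    nearest = cluster_dist(K_x, K_tt, labels, k).argmin(1)
    Y = np.empty((len(X_new), next(iter(models.values()))[1].shape[0]))
    for c, (xm, ym, B) in models.items():
        idx = nearest == c
        Y[idx] = (X_new[idx] - xm) @ B + ym
    return Y
```

In this sketch, `n_comp` limits how many latent directions each per-cluster regression uses, which is one way a PLS mapping can stay well-conditioned when training frames are few; when `n_comp` equals the feature dimension, each cluster's map reduces to ordinary least squares.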
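Memo
Received: 2015-06-11; Revised:
Foundation item: National Natural Science Foundation of China (61472393, 613031350).
About the authors: FANG Peng, male, born in 1990, master's student; his research focuses on singing voice conversion. LI Xian, male, born in 1988, Ph.D. candidate; his research interests include emotional speech, voice conversion, and singing synthesis. WANG Zengfu, male, born in 1960, professor and doctoral supervisor; he serves on the editorial board of Pattern Recognition and Artificial Intelligence and is an associate editor of the International Journal of Information Acquisition. He received the Best Paper Award at ACM Multimedia 2009. His research interests include computer vision, computer audition, human-computer interaction, and intelligent robots, and he has published more than 180 academic papers.
Corresponding author: WANG Zengfu. E-mail: zfwang@ustc.edu.cn.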