[1] LI Qianqian, SHEN Xiaoyan, REN Fuji, et al. Investigation of multiple speech emotion classification algorithms based on data enhancement[J]. CAAI Transactions on Intelligent Systems, 2021, 16(1): 170-177. [doi:10.11992/tis.202103005]

Investigation of multiple speech emotion classification algorithms based on data enhancement

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
16
Issue:
No. 1, 2021
Pages:
170-177
Column:
Wu Wenjun Artificial Intelligence Science and Technology Award Forum
Publication date:
2021-01-05

Article Info

Title:
Investigation of multiple speech emotion classification algorithms based on data enhancement
Author(s):
LI Qianqian (1, 2), SHEN Xiaoyan (1), REN Fuji (2), KANG Xin (2)
1. Institute of Information Science and Technology, Nantong University, Nantong 226019, China;
2. Department of Intelligent Information Engineering, Tokushima University, Tokushima 7708501, Japan
Keywords:
speech emotion recognition; data enhancement; emotion feature; support vector machine; random forest; K-nearest neighbor; low-level description features; machine learning
CLC number:
TP181
DOI:
10.11992/tis.202103005
Abstract:
Speech emotion recognition currently suffers from low recognition rates because speech samples are scarce and the extracted features are both high-dimensional and largely irrelevant to emotion. To address the shortage of samples, a time-frequency domain data enhancement method is proposed for the preprocessing stage to expand the original database. Because traditional algorithms extract a large amount of feature data, much of it unrelated to emotion, 1582-dimensional emotion features and 10 groups of low-level description features were extracted. Comparative experiments were then performed with three machine learning algorithms: the support vector machine, random forest, and K-nearest neighbor. The experiments showed that the support vector machine achieved the better average recognition rate. Among the 10 feature groups, the accuracy of the LogMelFreqBand features with the three algorithms was 74.63%, 64.93%, and 66.42%, respectively, and the accuracy of the pcm_fftMag_mfcc features was 84.33%, 73.13%, and 58.21%, respectively.
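The abstract does not say which operations make up the time-frequency domain data enhancement, so the following is only a minimal sketch under stated assumptions rather than the authors' method. It assumes three common waveform augmentations: additive Gaussian noise and a circular time shift (time domain), and a low-pass filter applied through a discrete Fourier transform (frequency domain). All function names, the SNR, the shift, and the cutoff are hypothetical choices made for illustration.

```python
import cmath
import math
import random


def add_noise(signal, snr_db, rng):
    """Time domain: add white Gaussian noise at roughly the requested SNR (dB)."""
    power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def time_shift(signal, shift):
    """Time domain: circularly shift the waveform to the right by `shift` samples."""
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift] if shift else list(signal)


def lowpass_dft(signal, keep_bins):
    """Frequency domain: zero every DFT bin above `keep_bins`, then invert.

    A naive O(n^2) DFT is used so the sketch needs no third-party packages.
    """
    n = len(signal)
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]
    # Keep bin k together with its mirror n - k so the inverse stays real-valued.
    kept = [c if min(k, n - k) <= keep_bins else 0 for k, c in enumerate(spectrum)]
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(kept)) / n).real
        for t in range(n)
    ]


def augment(signal, label, rng):
    """Return the original sample plus three augmented copies, all sharing one label."""
    variants = [
        list(signal),
        add_noise(signal, snr_db=20, rng=rng),           # assumed SNR
        time_shift(signal, shift=len(signal) // 4),      # assumed shift
        lowpass_dft(signal, keep_bins=len(signal) // 8), # assumed cutoff
    ]
    return [(v, label) for v in variants]


rng = random.Random(0)
# Toy "utterance": a 5-cycle tone plus a weaker 20-cycle tone over 64 samples.
tone = [
    math.sin(2 * math.pi * 5 * t / 64) + 0.5 * math.sin(2 * math.pi * 20 * t / 64)
    for t in range(64)
]
dataset = augment(tone, "happy", rng)
print(len(dataset))  # 4 labelled samples from 1 original
```

Each original utterance yields four labelled samples here, which is the sense in which augmentation expands a small database. In practice the augmented copies would be generated before feature extraction, and only from training utterances, so that test data stays untouched.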



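Memo:
Received: 2021-03-16.
Funding: National Natural Science Foundation of China (61534003, 81371663); Tokushima University Research Cluster Project (2003002).
About the authors: LI Qianqian, master's student; main research interests are speech emotion recognition and feature processing. SHEN Xiaoyan, professor, Ph.D.; a high-level talent under Jiangsu Province's "Six Talent Peaks" program, a second-tier young and middle-aged science and technology leader under Nantong's "226" Project, a member of the Rehabilitation Education Committee of the Nantong Association of Rehabilitation Medicine, and academic leader for medical information technology in the Information and Communication Engineering program of the School of Information Science and Technology, Nantong University; main research interests are biological neural interface technology, design of neural signal detection circuits and functional electrical stimulation circuits, acquisition and analysis of neural and electromyographic signals, and neural signal regeneration and functional reconstruction; has published more than 40 academic papers. REN Fuji, professor, Ph.D.; member of the Engineering Academy of Japan and of the EU Academy of Sciences, honorary vice president of the Chinese Association for Artificial Intelligence, Fellow of the Japan Federation of Engineering Societies, IEICE, and CAAI, and president of the International Advanced Information Institute, Japan; recipient of, among other honors, the first prize for innovation of the Wu Wenjun Artificial Intelligence Science and Technology Award; main research interests are artificial intelligence, affective computing, natural language understanding, and pattern recognition; has applied for more than 10 invention patents and published more than 500 academic papers.
Corresponding author: SHEN Xiaoyan. E-mail: xiaoyansho@ntu.edu.cn
Last Update: 2021-02-25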