[1] LI Qianqian, SHEN Xiaoyan, REN Fuji, et al. Investigation of multiple speech emotion classification algorithms based on data enhancement[J]. CAAI Transactions on Intelligent Systems, 2021, 16(1): 170-177. [doi:10.11992/tis.202103005]

Investigation of multiple speech emotion classification algorithms based on data enhancement

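CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]; Volume: 16; Issue: 2021, No. 1; Pages: 170-177; Column: Wu Wenjun Artificial Intelligence Science and Technology Award Forum; Publication date: 2021-01-05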

Title:: Investigation of multiple speech emotion classification algorithms based on data enhancement

Author(s):: LI Qianqian (李茜茜)¹,², SHEN Xiaoyan (沈晓燕)¹, REN Fuji (任福继)², KANG Xin (康鑫)²
1. Institute of Information Science and Technology, Nantong University, Nantong 226019, China;
2. Department of Intelligent Information Engineering, Tokushima University, Tokushima 7708501, Japan

Keywords:: speech emotion recognition; data enhancement; emotion feature; support vector machine; random forest; K-nearest neighbor; low-level description features; machine learning

CLC number:: TP181

DOI:: 10.11992/tis.202103005

Abstract:: Speech emotion recognition currently suffers from low recognition rates for two reasons: too few speech samples, and large extracted feature sets that contain many features irrelevant to emotion. To address the shortage of samples, a time-frequency domain data enhancement method is proposed for the preprocessing stage to expand the original database. To address the feature problem, 1582-dimensional emotion features and 10 groups of low-level description features were extracted. Comparative experiments were then run on three machine learning algorithms: support vector machine, random forest, and K-nearest neighbor. The experiments show that the support vector machine achieves the better average recognition rate. Among the 10 feature groups, the LogMelFreqBand feature reaches accuracies of 74.63%, 64.93%, and 66.42% on the three algorithms, respectively, while the pcm_fftMag_mfcc feature reaches 84.33%, 73.13%, and 58.21%.
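The comparison quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the six accuracy figures given above; it assumes that each triple follows the order in which the abstract names the classifiers (support vector machine, random forest, K-nearest neighbor). The names `REPORTED`, `mean_accuracy`, and `best_classifier` are illustrative and do not come from the paper, and the averages cover only the two feature groups reported here, not all ten.

```python
# Accuracies (%) quoted in the abstract for two of the ten feature groups.
# Assumption: each triple is ordered as the abstract lists the classifiers
# (support vector machine, random forest, K-nearest neighbor).
CLASSIFIERS = ("SVM", "RF", "KNN")
REPORTED = {
    "LogMelFreqBand": (74.63, 64.93, 66.42),
    "pcm_fftMag_mfcc": (84.33, 73.13, 58.21),
}


def mean_accuracy(table):
    """Average each classifier's accuracy over the feature groups in `table`."""
    totals = [0.0] * len(CLASSIFIERS)
    for accuracies in table.values():
        for i, acc in enumerate(accuracies):
            totals[i] += acc
    return {name: total / len(table) for name, total in zip(CLASSIFIERS, totals)}


def best_classifier(table):
    """Return the classifier with the highest accuracy for each feature group."""
    return {
        feature: CLASSIFIERS[accuracies.index(max(accuracies))]
        for feature, accuracies in table.items()
    }


means = mean_accuracy(REPORTED)
best = best_classifier(REPORTED)

for name in CLASSIFIERS:
    print(f"{name}: mean accuracy {means[name]:.2f}%")
for feature, name in best.items():
    print(f"{feature}: best classifier is {name}")
```

Under that ordering assumption, the support vector machine has the highest mean over these two feature groups (79.48%, against 69.03% for random forest) and is also the best classifier for each group individually, which is consistent with the abstract's conclusion.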

Memo

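Received: 2021-03-16.
Funding: National Natural Science Foundation of China (61534003, 81371663); Tokushima University Research Cluster Project (2003002).
Author biographies: LI Qianqian, master's student. Main research interests: speech emotion recognition and feature processing. SHEN Xiaoyan, professor, Ph.D.; second-tier young and middle-aged leading talent in science and technology under the Nantong "226" Project; member of the Rehabilitation Education Committee of the Nantong Rehabilitation Medicine Association; academic leader for medical information technology in the Information and Communication Engineering program, School of Information Science and Technology, Nantong University. Main research interests: biological neural interface technology, design of neural signal detection circuits and functional electrical stimulation circuits, acquisition and analysis of neural and electromyographic signals, and neural signal regeneration and functional reconstruction. Has published more than 40 academic papers. REN Fuji, professor, Ph.D.; member of the Engineering Academy of Japan and of the EU Academy of Sciences; honorary vice president of the Chinese Association for Artificial Intelligence; Fellow of the Japan Federation of Engineering Societies, IEICE, and CAAI; chairman of the International Advanced Information Institute, Japan; recipient of honors including the first prize for innovation of the Wu Wenjun Artificial Intelligence Science and Technology Award. Main research interests: artificial intelligence, affective computing, natural language understanding, and pattern recognition. Has filed more than 10 invention patents and published more than 500 academic papers.
Corresponding author: SHEN Xiaoyan. E-mail: xiaoyansho@ntu.edu.cn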

Last Update: 2021-02-25
