[1] LI Qianqian, SHEN Xiaoyan, REN Fuji, et al. Investigation of multiple speech emotion classification algorithms based on data enhancement[J]. CAAI Transactions on Intelligent Systems, 2021, 16(1): 170-177. [doi:10.11992/tis.202103005]
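CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN 1673-4785/CN 23-1538/TP] Volume:
16
Issue:
2021, No. 1
Pages:
170-177
Column:
Wu Wenjun Artificial Intelligence Science and Technology Award Forum
Publication date:
2021-01-05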
- Title:
-
Investigation of multiple speech emotion classification algorithms based on data enhancement
- Author(s):
-
LI Qianqian (李茜茜)1,2, SHEN Xiaoyan (沈晓燕)1, REN Fuji (任福继)2, KANG Xin (康鑫)2
-
1. Institute of Information Science and Technology, Nantong University, Nantong 226019, Jiangsu, China;
2. Department of Intelligent Information Engineering, Tokushima University, Tokushima 7708501, Japan
-
- Keywords:
-
speech emotion recognition; data enhancement; emotion feature; support vector machine; random forest; K-nearest neighbor; low-level description features; machine learning
- Classification number:
-
TP181
- DOI:
-
10.11992/tis.202103005
- Abstract:
-
Speech emotion recognition currently suffers from low recognition rates because speech samples are insufficient, the volume of extracted feature data is large, and many of the extracted features are irrelevant to emotion. To address the shortage of speech samples, a time-frequency domain data enhancement method is proposed in the preprocessing stage to expand the original database. To address the large volume of feature data and the many emotion-irrelevant features extracted by traditional algorithms, 1 582-dimensional emotion features and 10 groups of low-level description features were extracted. Comparative experiments were then performed with three machine learning algorithms: support vector machine, random forest, and K-nearest neighbor. The experiments showed that the support vector machine achieved the better average recognition rate. Among the 10 feature groups, the LogMelFreqBand features reached accuracies of 74.63%, 64.93%, and 66.42% with the three algorithms, respectively, and the pcm_fftMag_mfcc features reached 84.33%, 73.13%, and 58.21%, respectively.
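The three-way comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes scikit-learn, substitutes synthetic data from `make_classification` for the real emotion features, and uses hypothetical hyperparameters (RBF kernel, 100 trees, 5 neighbors), none of which are taken from the paper. The helper name `compare_classifiers` is likewise illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(X, y, seed=0):
    """Train SVM, random forest, and KNN on one split; return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Hyperparameters below are assumptions for illustration only.
    models = {
        # SVM and KNN are distance-based, so features are standardized first.
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in for one feature group: 300 utterances, 40 features, 4 emotion classes.
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=10, n_classes=4, random_state=0
)
scores = compare_classifiers(X, y)
for name, acc in scores.items():
    print(f"{name}: {acc:.2%}")
```

To mirror the per-feature-group comparison, the same function could be called once for each feature matrix (for example one for LogMelFreqBand and one for pcm_fftMag_mfcc), keeping the train/test split seed fixed so that the accuracies are comparable.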
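Memo
Received: 2021-03-16.
Funding: National Natural Science Foundation of China (61534003, 81371663); Tokushima University Research Cluster Project (2003002).
About the authors: LI Qianqian, master's student; main research interests are speech emotion recognition and feature processing. SHEN Xiaoyan, professor, Ph.D.; second-tier young and middle-aged leading science and technology talent of Nantong's "226" Project, member of the Rehabilitation Education Committee of the Nantong Association of Rehabilitation Medicine, and academic leader for medical information technology in the Information and Communication Engineering program of the Institute of Information Science and Technology, Nantong University; main research interests are biological neural interface technology, design of neural signal detection circuits and functional electrical stimulation circuits, acquisition and analysis of neural and electromyographic signals, and neural signal regeneration and functional reconstruction; has published more than 40 academic papers. REN Fuji, professor, Ph.D.; member of the Engineering Academy of Japan and of the EU Academy of Sciences, honorary vice president of the Chinese Association for Artificial Intelligence, Fellow of the Japan Federation of Engineering Societies, IEICE, and CAAI, president of the International Advanced Information Institute (Japan), and recipient of honors including the first prize for innovation of the Wu Wenjun Artificial Intelligence Science and Technology Award; main research interests are artificial intelligence, affective computing, natural language understanding, and pattern recognition; has applied for more than 10 invention patents and published more than 500 academic papers.
Corresponding author: SHEN Xiaoyan. E-mail: xiaoyansho@ntu.edu.cn
Last Update: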
2021-02-25