[1] GAO Qingji, ZHAO Zhihua, XU Da, et al. Review on speech emotion recognition research[J]. CAAI Transactions on Intelligent Systems, 2020, 15(1): 1–13. [doi:10.11992/tis.201904065]

Review on speech emotion recognition research

References:
[1] PRAVENA D, GOVIND D. Significance of incorporating excitation source parameters for improved emotion recognition from speech and electroglottographic signals[J]. International journal of speech technology, 2017, 20(4): 787–797.
[2] MIXDORFF H, HÖNEMANN A, RILLIARD A, et al. Audio-visual expressions of attitude: how many different attitudes can perceivers decode?[J]. Speech communication, 2017, 95: 114–126.
[3] BUITELAAR P, WOOD I D, NEGI S, et al. MixedEmotions: an open-source toolbox for multimodal emotion analysis[J]. IEEE transactions on multimedia, 2018, 20(9): 2454–2465.
[4] SAPIŃSKI T, KAMIŃSKA D, PELIKANT A, et al. Emotion recognition from skeletal movements[J]. Entropy, 2019, 21(7): 646.
[5] PARIS M, MAHAJAN Y, KIM J, et al. Emotional speech processing deficits in bipolar disorder: the role of mismatch negativity and P3a[J]. Journal of affective disorders, 2018, 234: 261–269.
[6] SCHELINSKI S, VON KRIEGSTEIN K. The relation between vocal pitch and vocal emotion recognition abilities in people with autism spectrum disorder and typical development[J]. Journal of autism and developmental disorders, 2019, 49(1): 68–82.
[7] SWAIN M, ROUTRAY A, KABISATPATHY P. Databases, features and classifiers for speech emotion recognition: a review[J]. International journal of speech technology, 2018, 21(1): 93–120.
[8] HAN Wenjing, LI Haifeng, RUAN Huabin, et al. Review on speech emotion recognition[J]. Journal of software, 2014, 25(1): 37–50. (in Chinese)
[9] LIU Zhentao, XU Jianping, WU Min, et al. Review of emotional feature extraction and dimension reduction method for speech emotion recognition[J]. Chinese journal of computers, 2018, 41(12): 2833–2851. (in Chinese)
[10] KRATZWALD B, ILIĆ S, KRAUS M, et al. Deep learning for affective computing: text-based emotion recognition in decision support[J]. Decision support systems, 2018, 115: 24–35.
[11] ORTONY A, TURNER T J. What’s basic about basic emotions?[J]. Psychological review, 1990, 97(3): 315–331.
[12] EKMAN P, FRIESEN W V, O’SULLIVAN M, et al. Universals and cultural differences in the judgments of facial expressions of emotion[J]. Journal of personality and social psychology, 1987, 53(4): 712–717.
[13] SCHULLER B W. Speech emotion recognition: two decades in a nutshell, benchmarks, and ongoing trends[J]. Communications of the ACM, 2018, 61(5): 90–99.
[14] YUE Guoan, DONG Yinghong. On the categorical and dimensional approaches of the theories of the basic structure of emotions[J]. Nankai journal (philosophy, literature and social science edition), 2013(1): 140–150. (in Chinese)
[15] LI Xia, LU Guanming, YAN Jingjie, et al. A survey of dimensional emotion prediction by multimodal cues[J]. Acta automatica sinica, 2018, 44(12): 2142–2159. (in Chinese)
[16] FONTAINE J R J, SCHERER K R, ROESCH E B, et al. The world of emotions is not two-dimensional[J]. Psychological science, 2007, 18(12): 1050–1057.
[17] RUSSELL J A. A circumplex model of affect[J]. Journal of personality and social psychology, 1980, 39(6): 1161–1178.
[18] YIK M S M, RUSSELL J A, BARRETT L F. Structure of self-reported current affect: integration and beyond[J]. Journal of personality and social psychology, 1999, 77(3): 600–619.
[19] PLUTCHIK R. The nature of emotions: human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice[J]. American scientist, 2001, 89(4): 344–350.
[20] ZHALEHPOUR S, ONDER O, AKHTAR Z, et al. BAUM-1: a spontaneous audio-visual face database of affective and mental states[J]. IEEE transactions on affective computing, 2017, 8(3): 300–313.
[21] WANG Wenwu. Machine audition: principles, algorithms and systems[M]. New York: Information Science Reference, 2010: 398–423.
[22] WANG Yongjin, GUAN Ling. Recognizing human emotional state from audiovisual signals[J]. IEEE transactions on multimedia, 2008, 10(4): 659–668.
[23] BURKHARDT F, PAESCHKE A, ROLFES M, et al. A database of German emotional speech[C]//INTERSPEECH 2005. Lisbon, Portugal, 2005: 1517–1520.
[24] MARTIN O, KOTSIA I, MACQ B, et al. The eNTERFACE’05 audio-visual emotion database[C]//Proceedings of the 22nd International Conference on Data Engineering Workshops. Atlanta, USA, 2006: 1–8.
[25] LIVINGSTONE S R, RUSSO F A. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): a dynamic, multimodal set of facial and vocal expressions in North American English[J]. PLoS one, 2018, 13(5): e0196391.
[26] STEIDL S. Automatic classification of emotion-related user states in spontaneous children’s speech[M]. Erlangen, Germany: University of Erlangen-Nuremberg, 2009: 1–250.
[27] GRIMM M, KROSCHEL K, NARAYANAN S. The Vera am Mittag German audio-visual emotional speech database[C]//Proceedings of 2008 IEEE International Conference on Multimedia and Expo. Hannover, Germany, 2008: 865–868.
[28] BUSSO C, BULUT M, LEE C C, et al. IEMOCAP: interactive emotional dyadic motion capture database[J]. Language resources and evaluation, 2008, 42(4): 335–359.
[29] RINGEVAL F, SONDEREGGER A, SAUER J, et al. Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions[C]//Proceedings of the 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition. Shanghai, China, 2013: 1–8.
[30] METALLINOU A, YANG Zhaojun, LEE C, et al. The USC CreativeIT database of multimodal dyadic interactions: from speech and full body motion capture to continuous emotional annotations[J]. Language resources and evaluation, 2016, 50(3): 497–521.
[31] MCKEOWN G, VALSTAR M, COWIE R, et al. The SEMAINE database: annotated multimodal records of emotionally colored conversations between a person and a limited agent[J]. IEEE transactions on affective computing, 2012, 3(1): 5–17.
[32] RAO Yuan, WU Lianwei, WANG Yiming, et al. Research progress on emotional computation technology based on semantic analysis[J]. Journal of software, 2018, 29(8): 2397–2426. (in Chinese)
[33] WANG Yiming, RAO Yuan, WU Lianwei. A review of sentiment semantic analysis technology and progress[C]//Proceedings of 2017 13th International Conference on Computational Intelligence and Security. Hong Kong, China, 2017: 452–455.
[34] MORRIS J D. Observations: SAM: the self-assessment manikin—an efficient cross-cultural measurement of emotional response[J]. Journal of advertising research, 1995, 35(6): 63–68.
[35] XIA Fan, WANG Hong. Multi-modal affective annotation method and implementation[C]//Proceedings of the 1st Joint Conference on Harmonious Human Machine Environment (HHME2005). Beijing, China, 2005: 1481–1487. (in Chinese)
[36] COWIE R, DOUGLAS-COWIE E, SAVVIDOU S, et al. FEELTRACE: an instrument for recording perceived emotion in real time[C]//Proceedings of the 2000 ISCA Tutorial and Research Workshop on Speech and Emotion. Newcastle, United Kingdom, 2000: 19–24.
[37] CHEN Weiliang, SUN Xiao. Mandarin speech emotion recognition based on MFCCG-PCA[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2015, 51(2): 269–274. (in Chinese)
[38] SUN Linhui, FU Sheng, WANG Fu. Decision tree SVM model with Fisher feature selection for speech emotion recognition[J]. EURASIP journal on audio, speech, and music processing, 2019, 2019: 2.
[39] NASSIF A B, SHAHIN I, ATTILI I, et al. Speech recognition using deep neural networks: a systematic review[J]. IEEE access, 2019, 7: 19143–19165.
[40] ZHANG Shiqing, ZHANG Shiliang, HUANG Tiejun, et al. Speech emotion recognition using deep convolutional neural network and discriminant temporal pyramid matching[J]. IEEE transactions on multimedia, 2018, 20(6): 1576–1590.
[41] CHEN Yiling, CHENG Yanfen, CHEN Xianqiao, et al. Speech emotion estimation in PAD 3D emotion space[J]. Journal of Harbin Institute of Technology, 2018, 50(11): 160–166. (in Chinese)
[42] WANG Weiwei, ZHANG Xiuzai. Speech emotion recognition based on variational mode decomposition[J]. Journal of applied acoustics, 2019, 38(2): 237–244. (in Chinese)
[43] WANG Zhongmin, LIU Ge, SONG Hui. Feature fusion based on multiple kernel learning for speech emotion recognition[J]. Computer engineering, 2019, 45(8): 248–254. (in Chinese)
[44] LU Guanming, YUAN Liang, YANG Wenjuan, et al. Speech emotion recognition based on long short-term memory and convolutional neural networks[J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2018, 38(5): 63–69. (in Chinese)
[45] ÖZSEVEN T. Investigation of the effect of spectrogram images and different texture analysis methods on speech emotion recognition[J]. Applied acoustics, 2018, 142: 70–77.
[46] JIANG Wei, WANG Zheng, JIN J S, et al. Speech emotion recognition with heterogeneous feature unification of deep neural network[J]. Sensors, 2019, 19(12): 2730.
[47] TORRES-BOZA D, OVENEKE M C, WANG Fengna, et al. Hierarchical sparse coding framework for speech emotion recognition[J]. Speech communication, 2018, 99: 80–89.
[48] MAO Qirong, XUE Wentao, RAO Qiru, et al. Domain adaptation for speech emotion recognition by sharing priors between related source and target classes[C]//Proceedings of 2016 IEEE International Conference on Acoustics, Speech and Signal Processing. Shanghai, China, 2016: 2608–2612.
[49] JIN Qin, LI Chengxin, CHEN Shizhe, et al. Speech emotion recognition with acoustic and lexical features[C]//Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. Brisbane, QLD, Australia, 2015: 4749–4753.
[50] SCHULLER B. Recognizing affect from linguistic information in 3D continuous space[J]. IEEE transactions on affective computing, 2011, 2(4): 192–205.
[51] DIMOULAS C A, KALLIRIS G M. Investigation of wavelet approaches for joint temporal, spectral and cepstral features in audio semantics[C]//Audio Engineering Society Convention. New York, USA, 2013.
[52] TAWARI A, TRIVEDI M M. Speech emotion analysis: exploring the role of context[J]. IEEE transactions on multimedia, 2010, 12(6): 502–509.
[53] QUIROS-RAMIREZ M A, ONISAWA T. Considering cross-cultural context in the automatic recognition of emotions[J]. International journal of machine learning and cybernetics, 2015, 6(1): 119–127.
[54] WU Xixin, LIU Songxiang, CAO Yuewen, et al. Speech emotion recognition using capsule networks[C]//Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing. Brighton, United Kingdom, 2019: 6695–6699.
[55] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks [C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Red Hook, United States, 2012: 1097–1105.
[56] ZHAO Jianfeng, MAO Xia, CHEN Lijiang. Speech emotion recognition using deep 1D & 2D CNN LSTM networks[J]. Biomedical signal processing and control, 2019, 47: 312–323.
[57] ZHANG Li, LV Jun, QIANG Yan, et al. Emotion recognition based on deep belief network[J]. Journal of Taiyuan University of Technology, 2019, 50(1): 101–107. (in Chinese)
[58] ABDELWAHAB M, BUSSO C. Domain adversarial for acoustic emotion recognition[J]. IEEE/ACM transactions on audio, speech, and language processing, 2018, 26(12): 2423–2435.
[59] MENG Zhong, LI Jinyu, CHEN Zhuo, et al. Speaker-invariant training via adversarial learning[C]//Proceedings of 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Calgary, AB, Canada, 2018: 5969–5973.
[60] VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of machine learning research, 2008, 9: 2579–2605.
[61] BOERSMA P, WEENINK D. Praat, a system for doing phonetics by computer[J]. Glot international, 2002, 5(9/10): 341–345.
[62] EYBEN F, WÖLLMER M, SCHULLER B. Opensmile: the Munich versatile and fast open-source audio feature extractor[C]//Proceedings of the 18th ACM International Conference on Multimedia. Firenze, Italy, 2010: 1459–1462.
[63] ÖZSEVEN T, DÜĞENCİ M. SPeech ACoustic (SPAC): a novel tool for speech feature extraction and classification[J]. Applied acoustics, 2018, 136: 1–8.
[64] SUN Lingyun, HE Bowei, LIU Zheng, et al. Speech emotion recognition based on information cell[J]. Journal of Zhejiang University (Engineering Science), 2015, 49(6): 1001–1008. (in Chinese)
[65] SCHULLER B, BATLINER A, STEIDL S, et al. Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge[J]. Speech communication, 2011, 53(9/10): 1062–1087.
[66] GHARAVIAN D, BEJANI M, SHEIKHAN M. Audio-visual emotion recognition using FCBF feature selection method and particle swarm optimization for fuzzy ARTMAP neural networks[J]. Multimedia tools and applications, 2016, 76(2): 2331–2352.
[67] WANG Yan, HU Weiping. Speech emotion recognition based on BP feature selection[J]. Microelectronics & computer, 2019, 36(5): 14–18. (in Chinese)
[68] SUN Ying, YAO Hui, ZHANG Xueying, et al. Feature extraction of emotional speech based on chaotic characteristics[J]. Journal of Tianjin University (Science and Technology), 2015, 48(8): 681–685. (in Chinese)
[69] SONG Peng, ZHENG Wenming, ZHAO Li. Joint subspace learning and feature selection method for speech emotion recognition[J]. Journal of Tsinghua University (Science and Technology), 2018, 58(4): 347–351. (in Chinese)
[70] EYBEN F, SCHERER K R, SCHULLER B W, et al. The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for voice research and affective computing[J]. IEEE transactions on affective computing, 2016, 7(2): 190–202.
[71] ÖZSEVEN T. A novel feature selection method for speech emotion recognition[J]. Applied acoustics, 2019, 146: 320–326.
[72] JIANG Xiaoqing, XIA Kewen, XIA Xinyuan, et al. Speech emotion recognition using semi-definite programming multiple-kernel SVM[J]. Journal of Beijing University of Posts and Telecommunications, 2015, 38(S1): 67–71. (in Chinese)
[73] ZHENG Weiqiao, YU Jiasheng, ZOU Yuexian. An experimental study of speech emotion recognition based on deep convolutional neural networks[C]//Proceedings of 2015 International Conference on Affective Computing and Intelligent Interaction. Xi’an, China, 2015: 827–831.
[74] SHAHIN I, NASSIF A B, HAMSA S. Emotion recognition using hybrid Gaussian mixture model and deep neural network[J]. IEEE access, 2019, 7: 26777–26787.
[75] SAGHA H, DENG Jun, GAVRYUKOVA M, et al. Cross lingual speech emotion recognition using canonical correlation analysis on principal component subspace[C]//Proceedings of 2016 IEEE International Conference on Acoustics, Speech and Signal Processing. Shanghai, China, 2016: 5800–5804.
[76] CHEN Shizhe, WANG Shuai, JIN Qin. Multimodal emotion recognition in multi-cultural conditions[J]. Journal of software, 2018, 29(4): 1060–1070. (in Chinese)
[77] LIU Ying, HE Cong, ZHANG Qingfang. Emotion recognition model based on kernel correlation analysis algorithm[J]. Journal of Jilin University (Science Edition), 2017, 55(6): 1539–1544. (in Chinese)
[78] MA Yaxiong, HAO Yixue, CHEN Min, et al. Audio-Visual Emotion Fusion (AVEF): a deep efficient weighted approach[J]. Information fusion, 2019, 46: 184–192.
[79] HUANG Yongming, TIAN Kexin, WU Ao, et al. Feature fusion methods research based on deep belief networks for speech emotion recognition under noise condition[J]. Journal of ambient intelligence and humanized computing, 2019, 10(5): 1787–1798.
[80] MANNEPALLI K, SASTRY P N, SUMAN M. A novel adaptive fractional deep belief networks for speaker emotion recognition[J]. Alexandria engineering journal, 2017, 56(4): 485–497.
[81] XU Xinzhou, DENG Jun, COUTINHO E, et al. Connecting subspace learning and extreme learning machine in speech emotion recognition[J]. IEEE transactions on multimedia, 2019, 21(3): 795–808.
[82] TON-THAT A H, CAO N T. Speech emotion recognition using a fuzzy approach[J]. Journal of intelligent & fuzzy systems, 2019, 36(2): 1587–1597.
[83] ZHANG Biqiao, PROVOST E M, ESSL G. Cross-corpus acoustic emotion recognition with multi-task learning: seeking common ground while preserving differences[J]. IEEE transactions on affective computing, 2019, 10(1): 85–99.
[84] HUANG Kunyi, WU C H, SU M H. Attention-based convolutional neural network and long short-term memory for short-term detection of mood disorders based on elicited speech responses[J]. Pattern recognition, 2019, 88: 668–678.
[85] YOON S A, SON G, KWON S. Fear emotion classification in speech by acoustic and behavioral cues[J]. Multimedia tools and applications, 2019, 78(2): 2345–2366.
[86] YUAN Feiniu, ZHANG Lin, SHI Jinting, et al. Theories and applications of auto-encoder neural networks: a literature survey[J]. Chinese journal of computers, 2019, 42(1): 203–230. (in Chinese)
[87] LIN Yilun, DAI Xingyuan, LI Li, et al. The new frontier of AI research: generative adversarial networks[J]. Acta automatica sinica, 2018, 44(5): 775–792. (in Chinese)
[88] ZHOU Jie, HUANG J X, CHEN Qin, et al. Deep learning for aspect-level sentiment classification: survey, vision, and challenges[J]. IEEE access, 2019, 7: 78454–78483.
[89] O’SHAUGHNESSY D. Recognition and processing of speech signals using neural networks[J]. Circuits, systems, and signal processing, 2019, 38(8): 3454–3481.
[90] XIE Yue, LIANG Ruiyu, LIANG Zhenlin, et al. Attention-based dense LSTM for speech emotion recognition[J]. IEICE transactions on information and systems, 2019, E102-D(7): 1426–1429.
[91] PEI Jing, DENG Lei, SONG Sen, et al. Towards artificial general intelligence with hybrid Tianjic chip architecture[J]. Nature, 2019, 572: 106–111.
[92] QIAN Yongfeng, LU Jiayi, MIAO Yiming, et al. AIEM: AI-enabled affective experience management[J]. Future generation computer systems, 2018, 89: 438–445.
[93] LADO-CODESIDO M, PÉREZ C M, MATEOS R, et al. Improving emotion recognition in schizophrenia with “VOICES”: an on-line prosodic self-training[J]. PLoS one, 2019, 14(1): e0210816.
[94] CUMMINS N, BAIRD A, SCHULLER B W. Speech analysis for health: current state-of-the-art and the increasing impact of deep learning[J]. Methods, 2018, 151: 41–54.
[95] LIU Zhentao, XIE Qiao, WU Min, et al. Speech emotion recognition based on an improved brain emotion learning model[J]. Neurocomputing, 2018, 309: 145–156.