
[1] ZHAO Xiaoming, TANG Zhiwei, ZHANG Shiqing. Research advance of multimodal personality recognition based on audio and visual cues[J]. CAAI Transactions on Intelligent Systems, 2021, 16(2): 189-201. [doi:10.11992/tis.202101034]


Research advance of multimodal personality recognition based on audio and visual cues


CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 16 Issue: No. 2, 2021 Pages: 189-201 Section: Review Publication date: 2021-03-05

Title:: Research advance of multimodal personality recognition based on audio and visual cues

Author(s):: ZHAO Xiaoming¹, TANG Zhiwei¹, ZHANG Shiqing²; 1. Faculty of Mechanical Engineering and Automation, Zhejiang Sci-Tech University, Hangzhou 310018, China;
2. Institute of Intelligent Information Processing, Taizhou University, Taizhou 318000, China

Keywords:: personality recognition; personality computing; personality types; audio-visual cues; feature extraction; hand-crafted features; deep features; multimodal fusion

CLC number:: TP391

DOI:: 10.11992/tis.202101034

Abstract:: Personality recognition is an important topic in personality computing, with applications in human behavior analysis, artificial intelligence, human-computer interaction, and personalized recommendation. In recent years it has become an active interdisciplinary research topic spanning psychology, cognitive science, and computer science. This paper introduces the personality representation theories and databases relevant to personality recognition, and reviews audio-visual feature extraction techniques for personality recognition, including hand-crafted features and deep features. On this basis, multimodal fusion methods that integrate audio and visual cues for personality recognition are classified and summarized in detail. Finally, development trends in multimodal personality recognition based on audio and visual cues are summarized and an outlook is given.


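Memo

Received: 2021-01-28.
Funding: National Natural Science Foundation of China (61976149); Zhejiang Provincial Natural Science Foundation (LZ20F020002).
Author biographies: ZHAO Xiaoming, professor; research interests include audio and image processing, machine learning, and pattern recognition. TANG Zhiwei, master's student; research interests include personality computing and pattern recognition. ZHANG Shiqing, professor, Ph.D.; research interests include affective computing and pattern recognition. Has published more than 40 academic papers.
Corresponding author: ZHAO Xiaoming. E-mail: tzxyzxm@163.com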

Last Update: 2021-04-25
