[1] HUANG Hongkeng, LI Ying. Identifying low-SNR animal sounds based on Bark spectral projection[J]. CAAI Transactions on Intelligent Systems, 2018, 13(04): 610-618. [doi:10.11992/tis.201703008]

Identifying low-SNR animal sounds based on Bark spectral projection

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
13
Issue:
2018, No. 04
Pages:
610-618
Column:
Publication date:
2018-07-05

Article Info

Title:
Identifying low-SNR animal sounds based on Bark spectral projection
Author(s):
HUANG Hongkeng (黄鸿铿), LI Ying (李应)
College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, Fujian, China
Keywords:
sound signal; automatic recognition; wavelet packet transform; random forests; environment sound
Classification number:
TP391
DOI:
10.11992/tis.201703008
Abstract:
Complex environmental sounds degrade the automatic recognition of animal sounds with low signal-to-noise ratios (SNRs). To address this, we propose a method for identifying low-SNR animal sounds in different acoustic scenes. The sound signal is decomposed with a Bark-scale wavelet packet transform, the decomposition coefficients are used to generate the spectrum of the reconstructed signal, and this spectrum is projected to produce the Bark spectral projection (BSP) feature. A random forest (RF) classifier then identifies the low-SNR animal sounds. Recognition experiments were run on 40 kinds of animal sounds at different SNRs in four noise environments: flowing water, highway, wind, and noisy speech. The results show that combining short-time spectrum estimation, the BSP feature, and RF gives a mean recognition rate of up to 80.5% for animal sounds across the environment sounds and SNRs tested, and that the mean recognition rate remains above 60% even at -10 dB.
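
The abstract reports results at several SNRs, down to -10 dB, where the background noise carries ten times the power of the animal sound. The page does not describe how the test mixtures were built, so the following is only a minimal sketch of one common way to mix a clean signal with noise at a target SNR. The function names (`mean_power`, `mix_at_snr`, `measure_snr_db`) and the synthetic tone and Gaussian noise are illustrative assumptions, not the authors' data or code.

```python
import math
import random


def mean_power(samples):
    """Average power (mean of squared amplitudes) of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that signal-to-noise ratio equals `snr_db`, then add it to `signal`.

    Returns the mixture and the scaled noise. Hypothetical helper for illustration.
    """
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    p_signal = mean_power(signal)
    p_noise = mean_power(noise)
    if p_signal == 0 or p_noise == 0:
        raise ValueError("signal and noise must both have non-zero power")
    # SNR_dB = 10 * log10(P_signal / P_noise_scaled); solve for the required noise power.
    target_noise_power = p_signal / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(signal, scaled_noise)]
    return mixture, scaled_noise


def measure_snr_db(signal, noise):
    """SNR in dB computed from the mean powers of the two sequences."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))


# Assumed stand-ins: a 440 Hz tone plays the role of the animal sound,
# Gaussian noise plays the role of the environment sound.
rng = random.Random(0)
sample_rate = 8000
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
background = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

noisy, scaled_background = mix_at_snr(clean, background, snr_db=-10.0)
achieved = measure_snr_db(clean, scaled_background)
```

Here `achieved` should equal the requested -10 dB up to floating-point rounding, which corresponds to a noise-to-signal power ratio of 10.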

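Memo

Received: 2017-03-08.
Funding: National Natural Science Foundation of China (61075022); Natural Science Foundation of Fujian Province (2018J01793).
About the authors: HUANG Hongkeng, male, born in 1993, master's student; main research interests: sound event detection and information security. LI Ying, male, born in 1964, professor, Ph.D.; main research interests: multimedia data retrieval, sound event detection, and information security. He holds 10 granted invention patents and has published more than 20 academic papers.
Corresponding author: LI Ying. E-mail: fj_liying@fzu.edu.cn.
Last Update: 2018-08-25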