[1] LI Yixi, WANG Lei, XUE Yu, et al. Research on window function analysis in STFT-based intelligent music generation system[J]. CAAI Transactions on Intelligent Systems, 2025, 20(3): 750-760. [doi:10.11992/tis.202405043]

Research on window function analysis in STFT-based intelligent music generation system

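CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]; Volume: 20; Issue: 2025, No. 3; Pages: 750-760; Column: Artificial Intelligence Presidents' Forum; Publication date: 2025-05-05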

Title:: Research on window function analysis in STFT-based intelligent music generation system

Author(s):: LI Yixi¹, WANG Lei¹, XUE Yu², WU Qidi¹
1. College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China;
2. Taizhou High School, Taizhou, Jiangsu 225300, China

Keywords:: short-time Fourier transform (STFT); artificial intelligence; music generation; window function; Mel-frequency cepstral coefficients (MFCC); spectral leakage; main-lobe gain; mixed function

CLC number:: TP391

DOI:: 10.11992/tis.202405043

Abstract:: In an intelligent music generation system based on the short-time Fourier transform (STFT), Mel-frequency cepstral coefficients (MFCC) are introduced as input features and the STFT loss function is redesigned to improve the quality of the generated music. Applying the STFT to a note input signal requires truncating the time-domain signal and multiplying it by a window function; windowing in the time domain is equivalent to convolution in the frequency domain. Truncation introduces spectral analysis error: the spectrum spreads to both sides of the actual frequency, taking the shape of the window function's own spectrum, which produces spectral leakage. The choice of window function therefore has a significant impact on the quality of the generated music. To address this, a window function analysis and selection method is proposed based on the energy correction factor, the maximum frequency-domain sidelobe, and the main-lobe gain, and corresponding script tools are developed to design a mixed window function for symbolic-domain music. Experimental results show that, on different MIDI (musical instrument digital interface) datasets, the mixed window function effectively reduces the effect of the spectral leakage introduced by signal truncation and shows good adaptability and flexibility, making it better suited to STFT-based intelligent music generation systems.
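The three selection criteria named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' script tool: the abstract does not give the exact definitions, so the sketch assumes the common ones (main-lobe gain taken as the coherent gain `sum(w)/N`, energy correction factor taken as `sqrt(N / sum(w^2))`, maximum sidelobe measured in dB relative to the main-lobe peak). The helper names `window_metrics` and `blend_windows`, the blend weight `alpha`, and the choice of Hann and Blackman windows are hypothetical and serve only to show how a mixed window could be scored.

```python
import numpy as np


def window_metrics(w, pad_factor=64):
    """Return (main-lobe gain, energy correction factor, max sidelobe in dB).

    Assumed definitions (not taken from the paper):
      main-lobe gain      = sum(w) / N         (coherent gain)
      energy correction   = sqrt(N / sum(w^2))
      max sidelobe        = highest spectral peak outside the main lobe,
                            in dB relative to the main-lobe peak
    """
    w = np.asarray(w, dtype=float)
    n = w.size
    main_lobe_gain = w.sum() / n
    energy_correction = np.sqrt(n / np.sum(w ** 2))

    # Zero-pad so the window spectrum is sampled finely enough to see sidelobes.
    spectrum = np.abs(np.fft.rfft(w, n * pad_factor))
    mag_db = 20.0 * np.log10(np.maximum(spectrum / spectrum[0], 1e-12))

    # The main lobe ends at the first local minimum: the first bin
    # after which the magnitude starts rising again.
    first_null = np.flatnonzero(np.diff(mag_db) > 0)[0]
    max_sidelobe_db = mag_db[first_null:].max()
    return main_lobe_gain, energy_correction, max_sidelobe_db


def blend_windows(w_a, w_b, alpha):
    """Hypothetical mixed window: a convex combination of two windows."""
    return alpha * np.asarray(w_a, dtype=float) + (1.0 - alpha) * np.asarray(w_b, dtype=float)


if __name__ == "__main__":
    n = 1024
    candidates = {
        "rectangular": np.ones(n),
        "hann": np.hanning(n),
        "hamming": np.hamming(n),
        "blackman": np.blackman(n),
        "hann/blackman blend": blend_windows(np.hanning(n), np.blackman(n), 0.5),
    }
    for name, w in candidates.items():
        gain, ecf, sidelobe = window_metrics(w)
        print(f"{name:20s} gain={gain:.4f} ecf={ecf:.4f} max sidelobe={sidelobe:.1f} dB")
```

A lower maximum sidelobe means less leakage into distant frequency bins, usually at the cost of a wider main lobe and a lower main-lobe gain, which is the trade-off a mixed window tries to balance.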



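Memo

Received: 2024-05-31.
About the authors: LI Yixi, master's student; main research interests are deep learning and intelligent music generation. E-mail: 1941702@tongji.edu.cn. WANG Lei, professor and doctoral supervisor; formerly a committee member of the Shanghai Association for Science and Technology and vice chair of the IEEE Shanghai Section; main research interests are intelligent control and intelligent computing. Has co-authored 8 monographs and translated works and published more than 100 academic papers. E-mail: wanglei@tongji.edu.cn. WU Qidi, professor and doctoral supervisor; formerly President of Tongji University and Vice Minister of Education; main research interests are control theory and applications, computer integrated manufacturing systems, and intelligent automation theory and applications. Recipient of multiple science and technology progress awards at the national, Ministry of Education, and Shanghai municipal levels. Has published more than 10 academic monographs and more than 200 academic papers. E-mail: wuqidi@moe.edu.cn.
Corresponding author: WANG Lei. E-mail: wanglei@tongji.edu.cn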

