LI Yixi, WANG Lei, XUE Yu, et al. Research on window function analysis in STFT-based intelligent music generation system[J]. CAAI Transactions on Intelligent Systems, 2025, 20(3): 750-760. [doi:10.11992/tis.202405043]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
20
Issue:
No. 3, 2025
Pages:
750-760
Section:
AI Presidents' Forum
Publication date:
2025-05-05
- Title:
Research on window function analysis in STFT-based intelligent music generation system
- Author(s):
LI Yixi¹, WANG Lei¹, XUE Yu², WU Qidi¹
1. College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China;
2. Taizhou High School, Taizhou, Jiangsu 225300, China
- Keywords:
STFT; artificial intelligence; music generation; window function; MFCC; spectral leakage; main-lobe gain; mixed functions
- CLC number:
TP391
- DOI:
10.11992/tis.202405043
- Abstract:
In an intelligent music generation system based on the short-time Fourier transform (STFT), Mel-frequency cepstral coefficients (MFCCs) are introduced as input features and the STFT loss function is optimized to improve the quality of the generated music. When the STFT is applied to a note input signal, the time-domain signal must be truncated and multiplied by a window function; windowing in the time domain is equivalent to convolution in the frequency domain. Truncation introduces spectral analysis errors: the spectrum spreads to both sides of the true frequency, taking on the shape of the window function's spectrum, which produces spectral leakage. The choice of window function therefore has a significant impact on the quality of the final generated music. To address this, a window function analysis and selection method based on the energy correction factor, the maximum frequency-domain sidelobe, and the main-lobe gain is proposed, and corresponding script tools are developed to design a mixed window function for symbolic-domain music. Experimental results on several MIDI (musical instrument digital interface) datasets show that the mixed window function effectively reduces the spectral leakage caused by signal truncation and offers good adaptability and flexibility, making it well suited to STFT-based intelligent music generation systems.
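The abstract names three criteria for choosing a window (energy correction factor, maximum sidelobe, main-lobe gain) and attributes leakage to the window's spectrum being convolved onto each partial. The Python sketch below is not the authors' script tool; it computes common textbook versions of these quantities for a few standard windows, using a naive DFT so that it needs only the standard library. Assumptions: "main-lobe gain" is taken to be the coherent gain sum(w)/N; the energy correction factor is taken to be sqrt(N / sum(w²)); the sidelobe level is read from a zero-padded DFT; `leakage_fraction` and its test-tone parameters are illustrative; and `blend` is a hypothetical mixed window (a pointwise convex combination), not the paper's mixed window design.

```python
import cmath
import math


# Standard windows in periodic form, as typically used for STFT frames.
def rectangular(n):
    return [1.0] * n


def hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def hamming(n):
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / n) for i in range(n)]


def blackman(n):
    return [0.42 - 0.5 * math.cos(2 * math.pi * i / n)
            + 0.08 * math.cos(4 * math.pi * i / n) for i in range(n)]


def blend(w1, w2, alpha):
    """Hypothetical mixed window: pointwise alpha*w1 + (1 - alpha)*w2."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(w1, w2)]


def dft(x, m):
    """Naive DFT of x zero-padded to length m; returns bins 0..m//2."""
    return [sum(v * cmath.exp(-2j * math.pi * k * i / m) for i, v in enumerate(x))
            for k in range(m // 2 + 1)]


# Window metrics (assumed textbook definitions, see the lead-in).
def coherent_gain(w):
    """Main-lobe (DC) amplitude gain relative to a rectangular window."""
    return sum(w) / len(w)


def energy_correction_factor(w):
    """Factor that restores the RMS level of a windowed signal."""
    return math.sqrt(len(w) / sum(v * v for v in w))


def max_sidelobe_db(w, pad=16):
    """Highest sidelobe level in dB relative to the main-lobe peak."""
    mags = [abs(c) for c in dft(w, len(w) * pad)]
    db = [20 * math.log10(max(v / mags[0], 1e-12)) for v in mags]
    k = 1
    while k < len(db) - 1 and db[k + 1] < db[k]:
        k += 1  # walk down the main lobe to its first null
    return max(db[k:])


def leakage_fraction(w, cycles=10.5, guard=3):
    """Share of spectral energy more than `guard` bins away from a test tone
    that completes a non-integer number of cycles per frame (truncation)."""
    n = len(w)
    frame = [w[i] * math.sin(2 * math.pi * cycles * i / n) for i in range(n)]
    power = [abs(c) ** 2 for c in dft(frame, n)]
    centre = round(cycles)
    near = sum(p for k, p in enumerate(power) if abs(k - centre) <= guard)
    return 1 - near / sum(power)


if __name__ == "__main__":
    n = 64
    windows = {
        "rectangular": rectangular(n),
        "hann": hann(n),
        "hamming": hamming(n),
        "blackman": blackman(n),
        "blend(rect, hann, 0.3)": blend(rectangular(n), hann(n), 0.3),
    }
    for name, w in windows.items():
        print(f"{name:24s} gain={coherent_gain(w):.3f} "
              f"ECF={energy_correction_factor(w):.3f} "
              f"sidelobe={max_sidelobe_db(w):6.1f} dB "
              f"leakage={leakage_fraction(w):.2e}")
```

Running the script prints one row per window. The rectangular window keeps the full main-lobe gain but has the highest sidelobe (about -13 dB) and the most leakage, while Hann and Blackman trade gain for much lower sidelobes. The Hamming window is exactly `blend(rectangular(n), hann(n), 0.08)`, which shows in miniature how mixing two windows moves along this trade-off.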
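Memo
Received: 2024-05-31.
Author biographies:
LI Yixi, master's student; research interests: deep learning and intelligent music generation. E-mail: 1941702@tongji.edu.cn.
WANG Lei, professor and doctoral supervisor; formerly a member of the committee of the Shanghai Association for Science and Technology, and formerly also vice chair of the IEEE (Institute of Electrical and Electronics Engineers) Shanghai Section; research interests: intelligent control and intelligent computing. Co-author or co-translator of 8 books and author of more than 100 academic papers. E-mail: wanglei@tongji.edu.cn.
WU Qidi, professor and doctoral supervisor; former president of Tongji University and former Vice Minister of Education; research interests: control theory and applications, computer integrated manufacturing systems, and intelligent automation theory and applications. Recipient of multiple science and technology progress awards at the national, Ministry of Education, and Shanghai municipal levels; author of more than 10 academic monographs and more than 200 academic papers. E-mail: wuqidi@moe.edu.cn.
Corresponding author: WANG Lei. E-mail: wanglei@tongji.edu.cn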