[1] WANG Jianzong, ZHANG Xulong, JIANG Guilin, et al. Research on audio model generation technology based on a hierarchical federated framework[J]. CAAI Transactions on Intelligent Systems, 2024, 19(5): 1331-1339. [doi:10.11992/tis.202306054]

Research on audio model generation technology based on a hierarchical federated framework

References:
[1] LIU Pengfei, YUAN Weizhe, FU Jinlan, et al. Pretrain, prompt, and predict: A systematic survey of prompting methods in natural language processing[J]. ACM computing surveys, 2023, 55(9): 1-35.
[2] TRUMMER I. From BERT to GPT-3 Codex[J]. Proceedings of the VLDB endowment, 2022, 15(12): 3770-3773.
[3] GHOSAL D, MAJUMDER N, MEHRISH A, et al. Text-to-audio generation using instruction-tuned LLM and latent diffusion model[EB/OL]. (2023–04–24)[2023–06–30]. http://arxiv.org/abs/2304.13731v2.
[4] HSU W N, BOLTE B, TSAI Y H H, et al. HuBERT: self-supervised speech representation learning by masked prediction of hidden units[J]. IEEE/ACM transactions on audio, speech, and language processing, 2021, 29: 3451-3460.
[5] ZEGHIDOUR N, LUEBS A, OMRAN A, et al. SoundStream: an end-to-end neural audio codec[J]. IEEE/ACM transactions on audio, speech, and language processing, 2021, 30: 495-507.
[6] HAYASHI T, WATANABE S. DiscreTalk: text-to-speech as a machine translation problem[EB/OL]. (2020–05–12)[2023–06–30]. http://arxiv.org/abs/2005.05525v1.
[7] BORSOS Z, MARINIER R, VINCENT D, et al. AudioLM: a language modeling approach to audio generation[EB/OL]. (2022–09–07)[2023–06–30]. https://arxiv.org/abs/2209.03143.
[8] AGOSTINELLI A, DENK T I, BORSOS Z, et al. MusicLM: generating music from text[EB/OL]. (2023–01–26)[2023–06–30]. https://arxiv.org/abs/2301.11325.
[9] NGUYEN T A, KHARITONOV E, COPET J, et al. Generative spoken dialogue language modeling[J]. Transactions of the association for computational linguistics, 2023, 11: 250-266.
[10] CUI Xiaodong, LU Songtao, KINGSBURY B. Federated acoustic modeling for automatic speech recognition[C]//ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. Toronto: IEEE, 2021: 6748–6752.
[11] HONG Zhenhou, WANG Jianzong, QU Xiaoyang, et al. Federated learning with dynamic transformer for text to speech[C]//Interspeech 2021. Brno: ISCA, 2021: 3590–3594.
[12] WU Yusong, CHEN Ke, ZHANG Tianyu, et al. Large-scale contrastive language-audio pretraining with feature fusion and keyword-to-caption augmentation[EB/OL]. (2022–11–12)[2023–06–30]. http://arxiv.org/abs/2211.06687v4.
[13] WU Shangda, YU Dingyao, TAN Xu, et al. CLaMP: contrastive language-music pre-training for cross-modal symbolic music information retrieval[EB/OL]. (2023–04–21)[2023–06–30]. http://arxiv.org/abs/2304.11029v4.
[14] WU Junru, LIANG Yi, HAN Feng, et al. Scaling multimodal pre-training via cross-modality gradient harmonization[J]. Advances in neural information processing systems, 2022, 35: 36161-36173.
[15] WANG Chengyi, CHEN Sanyuan, WU Yu, et al. Neural codec language models are zero-shot text to speech synthesizers[EB/OL]. (2023–01–05)[2023–06–30]. http://arxiv.org/abs/2301.02111v1.
[16] XIE Xukang, CHEN Ge, SUN Jun, et al. TCN-Transformer-CTC for end-to-end speech recognition[J]. Application research of computers, 2022, 39(3): 699-703.
[17] XIE Yuan, ZOU Tao, SUN Weijun, et al. Algorithm of underdetermined convolutive blind source separation for high reverberation environment[J]. Journal on communications, 2023, 44(2): 82-93.
[18] FANG Xin, HUANG Zexin, ZHANG Yuhan, et al. Semi-supervised end-to-end fake speech detection method based on time-domain waveforms[J]. Journal of computer applications, 2023, 43(1): 227-231.
[19] ZHONG Jialin, WU Yahui, DENG Su, et al. Multi-objective federated learning evolutionary algorithm based on improved NSGA-III[J]. Computer science, 2023, 50(4): 333-342.
[20] CHEN Yang, LIAO Canhui, ZHANG Kun, et al. A signal modulation identification algorithm based on self-supervised contrast learning[J]. Systems engineering and electronics, 2023, 45(4): 1200-1206.
[21] LUO Xianchang, XUE Yinxing. Accurately classify software requirements using prompt learning on BERT[J]. Information technology and network security, 2022, 41(2): 39-45.
[22] WANG Chengyi, WU Yu, QIAN Yao, et al. UniSpeech: unified speech representation learning with labeled and unlabeled data[EB/OL]. (2021–01–19)[2023–06–30]. http://arxiv.org/abs/2101.07597v2.
[23] TAN Yue, LONG Guodong, MA Jie, et al. Federated learning from pre-trained models: a contrastive learning approach[EB/OL]. (2022–09–21)[2023–06–30]. http://arxiv.org/abs/2209.10083v1.
[24] ITO K. The LJ speech dataset[EB/OL]. [2023–06–30]. https://keithito.com/LJ-Speech-Dataset/.
[25] QIAN Kaizhi, ZHANG Yang, CHANG Shiyu, et al. AutoVC: zero-shot voice style transfer with only autoencoder loss[C]//36th International Conference on Machine Learning. Long Beach: PMLR, 2019: 5210–5219.
[26] KANEKO T, KAMEOKA H, TANAKA K, et al. CycleGAN-VC3: examining and improving CycleGAN-VCs for mel-spectrogram conversion[EB/OL]. (2020–10–22)[2023–06–30]. http://arxiv.org/abs/2010.11672v1.
[27] QIAN Kaizhi, ZHANG Yang, CHANG Shiyu, et al. Unsupervised speech decomposition via triple information bottleneck[C]//Proceedings of the 37th International Conference on Machine Learning. Virtual: PMLR, 2020: 7836–7846.
[28] SHEN Kai, JU Zeqian, TAN Xu, et al. NaturalSpeech 2: latent diffusion models are natural and zero-shot speech and singing synthesizers[EB/OL]. (2023–04–18)[2023–06–30]. http://arxiv.org/abs/2304.09116v3.

Last Update: 2024-09-05

Copyright © CAAI Transactions on Intelligent Systems