[1] WANG Jianzong, ZHANG Xulong, JIANG Guilin, et al. Research on audio model generation technology based on a hierarchical federated framework[J]. CAAI Transactions on Intelligent Systems, 2024, 19(5): 1331-1339. [doi:10.11992/tis.202306054]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 19
Issue: 2024, No. 5
Pages: 1331-1339
Column: Wu Wenjun Artificial Intelligence Science and Technology Award Forum
Publication date: 2024-09-05
- Title:
Research on audio model generation technology based on a hierarchical federated framework
- Author(s):
WANG Jianzong¹; ZHANG Xulong¹; JIANG Guilin²; CHENG Ning¹; XIAO Jing¹
1. Ping An Technology (Shenzhen) Co., Ltd., Shenzhen 518046, China;
2. Hunan Chasing Financial Holdings Co., Ltd., Changsha 410035, China
- Keywords:
audio model; federated learning framework; audio representation learning; data heterogeneity; privacy protection; contrastive learning; prompt learning; model compression
- CLC:
TP391
- DOI:
10.11992/tis.202306054
- Abstract:
This study focuses on next-generation audio generation techniques, specifically on the construction of a federated training framework for audio models. The goal is to enable efficient and robust audio representation learning on massive-scale data, providing high-performance solutions for a range of downstream audio tasks. The key scientific challenges addressed in this research, and the corresponding methods, are as follows: 1) a federated learning framework tailored to audio models that addresses data heterogeneity, communication efficiency, and privacy protection; 2) a contrastive-learning-based pretraining method that uses <audio, text description> data pairs to learn semantic features and improve the model’s generalization and diversity; 3) a prompt-learning-based fine-tuning method that uses a small amount of annotated data to improve the model’s adaptability and customizability; and 4) a distributed optimization algorithm that compresses audio models, reducing model complexity and resource consumption and thereby improving deployment and operational efficiency. In an experimental evaluation on the downstream task of sound effect conversion, the proposed method achieved a mean opinion score (MOS) of 3.81, indicating good performance on this task.
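The abstract does not state the exact contrastive objective used for pretraining on <audio, text description> pairs. A common choice for paired audio and text, used in CLIP/CLAP-style models, is a symmetric InfoNCE loss: within a batch, each audio clip should score highest against its own description, and each description against its own clip. The sketch below assumes that objective; the function names, the default temperature of 0.07, and the use of plain Python lists in place of real encoder outputs are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List

# Illustrative sketch only: a generic CLIP/CLAP-style symmetric InfoNCE loss,
# assumed here as a stand-in for the paper's (unspecified) contrastive objective.

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(audio_emb: List[Vector], text_emb: List[Vector],
                       temperature: float = 0.07) -> float:
    """Symmetric contrastive loss over a batch of <audio, text description> pairs.

    audio_emb[i] and text_emb[i] come from the same pair (the positive);
    every other audio/text combination in the batch is treated as a negative.
    The temperature value is a hypothetical default, not taken from the paper.
    """
    a = [l2_normalize(v) for v in audio_emb]
    t = [l2_normalize(v) for v in text_emb]
    # logits[i][j] = cosine(audio_i, text_j) / temperature
    logits = [[sum(x * y for x, y in zip(ai, tj)) / temperature for tj in t]
              for ai in a]
    transposed = [list(col) for col in zip(*logits)]
    # Average the audio-to-text and text-to-audio directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0]]
    aligned_text = [[0.9, 0.1], [0.1, 0.9]]
    swapped_text = [[0.1, 0.9], [0.9, 0.1]]
    # Correctly matched pairs should give a lower loss than mismatched ones.
    print(symmetric_info_nce(audio, aligned_text) < symmetric_info_nce(audio, swapped_text))  # True
```

Averaging both directions keeps the objective symmetric, so neither the audio encoder nor the text encoder is favored; in a federated setting, each client would compute such a loss on its local pairs before any aggregation step, though how the paper combines the two is not described in the abstract.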