[1] WANG Zhongmei, AO Wenxiu, LIU Jianhua, et al. An audio-visual multimodal balanced learning method based on adaptive gradient modulation[J]. CAAI Transactions on Intelligent Systems, 2025, 20(5): 1217-1226. [doi:10.11992/tis.202412009]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 20
Issue: 2025, No. 5
Pages: 1217-1226
Column: Academic Papers: Natural Language Processing and Understanding
Publication date: 2025-09-05
- Title: An audio-visual multimodal balanced learning method based on adaptive gradient modulation
- Author(s): WANG Zhongmei¹; AO Wenxiu¹; LIU Jianhua¹; JIA Lin¹; ZHANG Changfan¹; PENG Shen’ao¹; LIU Jinping²
  1. School of Railway Transportation, Hunan University of Technology, Zhuzhou 412007, China
  2. College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China
- Keywords: balanced learning; multimodal learning; gradient modulation; adaptive learning; multimodal gradient balancing; learning rate; audio-visual multimodal; collaborative decision-making
- CLC: TP391
- DOI: 10.11992/tis.202412009
- Abstract:
In audio-visual multimodal learning, modalities learn at different rates, so one modality can come to dominate and suppress the others, weakening multimodal collaborative decision-making. To address this problem, a multimodal balanced learning method based on adaptive gradient modulation (AGM-CR) is proposed. The method uses modulation coefficients that dynamically adjust the learning rate of each modality according to its gradient variations. It also incorporates a gradient balancing strategy that adds modality-specific gradient losses to the total loss as a regularization term. Together, these two mechanisms reduce the gradient disparities between modalities and yield a more balanced and effective learning process. Experiments on the CREMA-D and RAVDESS datasets show that AGM-CR improves classification accuracy by 2.5 and 3.3 percentage points, respectively. AGM-CR also stabilizes training by reducing gradient fluctuations across iterations, which accelerates convergence. Finally, AGM-CR is plug-and-play, offering greater flexibility and generalizability than existing balancing methods.
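The abstract names two mechanisms (per-modality modulation coefficients and a gradient-balancing regularization term) but does not state their formulas. The sketch below illustrates one plausible shape for each under assumptions that are ours, not the paper's: each modality is summarized by a scalar gradient norm, a modality whose norm exceeds the cross-modality mean is treated as dominant and has its learning rate scaled by a hypothetical coefficient 1 - tanh(alpha * excess), and the regularizer is taken to be the variance of the per-modality gradient norms weighted by a hypothetical `lam`. The function names `modulation_coefficients`, `modulated_learning_rates`, and `balanced_total_loss` are illustrative only.

```python
import math


def modulation_coefficients(grad_norms, alpha=1.0):
    """Hypothetical per-modality modulation coefficients (not AGM-CR's formula).

    grad_norms maps a modality name to a scalar gradient norm for its encoder.
    A modality whose norm exceeds the mean across modalities is treated as
    dominant and gets a coefficient below 1; the others keep coefficient 1.
    """
    mean = sum(grad_norms.values()) / len(grad_norms)
    if mean == 0.0:
        # No gradient signal at all: leave every modality unmodulated.
        return {m: 1.0 for m in grad_norms}
    coeffs = {}
    for m, g in grad_norms.items():
        excess = max(0.0, g / mean - 1.0)  # relative dominance; 0 if not dominant
        coeffs[m] = 1.0 - math.tanh(alpha * excess)
    return coeffs


def modulated_learning_rates(base_lr, coeffs):
    """Scale a shared base learning rate by each modality's coefficient."""
    return {m: base_lr * k for m, k in coeffs.items()}


def balanced_total_loss(task_loss, grad_norms, lam=0.1):
    """Task loss plus an assumed gradient-balancing penalty.

    The penalty is the (population) variance of the per-modality gradient
    norms, so it is zero when all modalities have equal gradient norms.
    """
    mean = sum(grad_norms.values()) / len(grad_norms)
    var = sum((g - mean) ** 2 for g in grad_norms.values()) / len(grad_norms)
    return task_loss + lam * var


if __name__ == "__main__":
    norms = {"audio": 3.0, "visual": 1.0}  # audio dominates in this toy example
    coeffs = modulation_coefficients(norms)
    print(coeffs)
    print(modulated_learning_rates(0.01, coeffs))
    print(balanced_total_loss(0.8, norms))
```

With the toy norms above, the dominant audio branch receives a coefficient of about 0.54 while the visual branch keeps 1.0, so the weaker modality trains at the full rate. In an autograd framework, a penalty computed from gradient norms only influences training if those norms are themselves differentiable (for example, in PyTorch, by computing them with `torch.autograd.grad(..., create_graph=True)`); this plain-Python sketch only evaluates the quantities and does not model that step.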