[1] WANG Zhongmei, AO Wenxiu, LIU Jianhua, et al. An audio-visual multimodal balanced learning method based on adaptive gradient modulation[J]. CAAI Transactions on Intelligent Systems, 2025, 20(5): 1217-1226. [doi:10.11992/tis.202412009]
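CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
20
Issue:
No. 5, 2025
Pages:
1217-1226
Section:
Academic Papers: Natural Language Processing and Understanding
Publication date:
2025-09-05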
- Title:
-
An audio-visual multimodal balanced learning method based on adaptive gradient modulation
- Author(s):
-
WANG Zhongmei1, AO Wenxiu1, LIU Jianhua1, JIA Lin1, ZHANG Changfan1, PENG Shen’ao1, LIU Jinping2
-
1. School of Railway Transportation, Hunan University of Technology, Zhuzhou 412007, China;
2. College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China
- Keywords:
-
balanced learning; multimodal learning; gradient modulation; adaptive learning; multimodal gradient balancing; learning rate; audio-visual multimodal; collaborative decision-making
- Classification number (CLC):
-
TP391
- DOI:
-
10.11992/tis.202412009
- Abstract:
-
In audio-visual multimodal learning, heterogeneous learning rates across modalities let a single modality dominate training and suppress the others, which weakens multimodal collaborative decision-making. To address this problem, a multimodal balanced learning method based on adaptive gradient modulation is proposed (adaptive gradient modulation based compensation and regularization, AGM-CR). First, modulation coefficients derived from the differences between the modalities' learning gradients adaptively adjust the learning rate of each modality. Second, a gradient balancing strategy adds the gradient loss of each individual modality to the total loss as a regularization term, which constrains the gradient differences between modalities and further balances their learning. Experiments on the CREMA-D and RAVDESS datasets show that AGM-CR improves classification accuracy by 2.5 and 3.3 percentage points, respectively, and reduces gradient fluctuations over training iterations, yielding more stable training and faster convergence. Compared with existing balancing methods, AGM-CR is plug-and-play and therefore more flexible and general.
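The two mechanisms named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration and is not the authors' AGM-CR implementation: the function names, the `1 - tanh` form of the coefficient, the choice to damp the modality with the larger gradient norm, the squared-difference penalty, and the parameters `alpha` and `lam` are all hypothetical.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of gradient values."""
    return math.sqrt(sum(x * x for x in vec))


def modulation_coefficients(grad_norm_audio, grad_norm_visual, alpha=1.0, eps=1e-12):
    """Return (k_audio, k_visual), each in [0, 1].

    Assumption: the modality with the larger gradient norm is treated as
    dominant and damped; the other keeps coefficient 1.
    """
    ratio = (grad_norm_audio + eps) / (grad_norm_visual + eps)
    if ratio > 1.0:
        # Audio gradients are larger: shrink the audio update.
        return 1.0 - math.tanh(alpha * (ratio - 1.0)), 1.0
    # Visual gradients are larger (or equal): shrink the visual update.
    return 1.0, 1.0 - math.tanh(alpha * (1.0 / ratio - 1.0))


def balanced_loss(task_loss, grad_norm_audio, grad_norm_visual, lam=0.1):
    """Task loss plus a hypothetical penalty on the gap between gradient norms."""
    return task_loss + lam * (grad_norm_audio - grad_norm_visual) ** 2


def modulated_sgd_step(params, grads, coeff, lr=0.01):
    """Plain SGD update with the gradient scaled by a modulation coefficient."""
    return [p - lr * coeff * g for p, g in zip(params, grads)]


if __name__ == "__main__":
    g_audio, g_visual = [1.2, -1.6], [0.3, 0.4]
    na, nv = l2_norm(g_audio), l2_norm(g_visual)
    k_a, k_v = modulation_coefficients(na, nv)
    print("coefficients:", k_a, k_v)
    print("regularized loss:", balanced_loss(1.0, na, nv))
    print("audio params:", modulated_sgd_step([0.5, 0.5], g_audio, k_a))
```

Scaling a gradient by a coefficient is equivalent to scaling that modality's effective learning rate, which is why the sketch applies the coefficient inside the update step. In an automatic-differentiation framework the penalty term would also have to be differentiable with respect to the model parameters; this sketch only evaluates it as a scalar.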
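Memo
Received: 2024-12-11.
Funding: National Key Research and Development Program of China (2021YFF0501101); National Natural Science Foundation of China (52272347); National Natural Science Foundation of China, Young Scientists Fund (62106074).
Author biographies: WANG Zhongmei, lecturer, member of the Institute of Electrical and Electronics Engineers (IEEE). Main research interests: artificial intelligence, computer vision, and remote sensing information processing. E-mail: wangzhongmei@hut.edu.cn. AO Wenxiu, master's student. Main research interests: modality fusion and multimodal balanced learning. E-mail: m23081100020@stu.hut.edu.cn. LIU Jianhua, professor, doctoral supervisor. Main research interests: electric drive control and intelligent operation and maintenance for rail transit. Has led two National Natural Science Foundation of China projects and one National Key Research and Development Program subproject. E-mail: jhliu@hut.edu.cn.
Corresponding author: WANG Zhongmei. E-mail: wangzhongmei@hut.edu.cn
Last Update: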
2025-09-05