HUANG Jianzhi, DING Chengcheng, TAO Wei, et al. Optimal individual convergence rate of Adam-type algorithms in nonsmooth convex optimization[J]. CAAI Transactions on Intelligent Systems, 2020, 15(6): 1140-1146. [doi:10.11992/tis.202006046]
Journal: CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 15
Issue: No. 6, 2020
Pages: 1140-1146
Section: Academic Papers - Machine Learning
Publication date: 2020-11-05
Title: Optimal individual convergence rate of Adam-type algorithms in nonsmooth convex optimization
Authors: HUANG Jianzhi (黄鉴之)1, DING Chengcheng (丁成诚)1, TAO Wei (陶蔚)2, TAO Qing (陶卿)1
1. Department of Information Engineering, Army Academy of Artillery and Air Defense of PLA, Hefei 230031, China;
2. College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China
Keywords: machine learning; AdaGrad algorithm; RMSProp algorithm; momentum methods; Adam algorithm; AMSGrad algorithm; individual convergence rate; sparsity
CLC number: TP181
DOI: 10.11992/tis.202006046
Abstract: Adam is an optimization framework widely used for training deep neural networks. It combines adaptive step sizes with momentum to overcome some inherent drawbacks of SGD. However, even for convex problems, Adam has so far only been shown to attain the same regret bound as gradient descent in the online-learning setting, so the acceleration effect of momentum is not reflected. For nonsmooth convex problems, this paper proves that, with suitably chosen time-varying momentum and step-size parameters, a modified Adam attains the optimal individual convergence rate, showing that Adam enjoys the advantages of both adaptation and acceleration. Experiments on the hinge-loss problem under an ${l_1}$-norm ball constraint verify the correctness of the theoretical analysis and the good performance of the algorithm in preserving sparsity.
Last update: 2020-12-25
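As an illustration of the setting described in the abstract, the following is a minimal sketch (Python with NumPy) of an AMSGrad-style update with time-varying momentum and step size, applied to a hinge-loss problem under an ${l_1}$-norm ball constraint. The parameter schedules, mini-batch size, synthetic data, and the Euclidean projection step are illustrative assumptions; they are not the paper's exact algorithm or parameter choices.

# Sketch only: projected AMSGrad-style update with time-varying momentum and
# step size on an l1-ball-constrained hinge-loss problem. Schedules, data and
# projection are illustrative assumptions, not the paper's algorithm.
import numpy as np


def project_l1_ball(w, radius=1.0):
    """Euclidean projection of w onto the l1 ball {x : ||x||_1 <= radius}."""
    if np.abs(w).sum() <= radius:
        return w
    u = np.sort(np.abs(w))[::-1]                      # sorted magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (css - radius))[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(w) * np.maximum(np.abs(w) - theta, 0.0)


def hinge_subgradient(w, X, y):
    """Subgradient of the average hinge loss mean(max(0, 1 - y * Xw))."""
    margins = 1.0 - y * (X @ w)
    active = margins > 0
    return -(X[active] * y[active, None]).sum(axis=0) / len(y)


def amsgrad_projected(X, y, radius=1.0, T=2000, alpha=0.1,
                      beta2=0.999, eps=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    m = np.zeros(d)          # first moment (momentum)
    v = np.zeros(d)          # second moment (adaptive scaling)
    v_hat = np.zeros(d)      # AMSGrad: running maximum of v
    for t in range(1, T + 1):
        idx = rng.integers(0, n, size=32)             # mini-batch of examples
        g = hinge_subgradient(w, X[idx], y[idx])
        beta1_t = t / (t + 2.0)                       # time-varying momentum (illustrative)
        alpha_t = alpha / np.sqrt(t)                  # time-varying step size (illustrative)
        m = beta1_t * m + (1.0 - beta1_t) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        v_hat = np.maximum(v_hat, v)
        w = project_l1_ball(w - alpha_t * m / (np.sqrt(v_hat) + eps), radius)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 50))
    w_true = np.zeros(50)
    w_true[:5] = 1.0                                  # sparse ground truth
    y = np.sign(X @ w_true + 0.1 * rng.normal(size=500))
    w = amsgrad_projected(X, y, radius=5.0)
    print("nonzero coordinates:", int((np.abs(w) > 1e-6).sum()))

In this sketch the sparsity of the iterates comes from the projection onto the ${l_1}$ ball; the printed nonzero count gives a rough check of that behavior on the synthetic data.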