[1] HUANG Jianzhi, DING Chengcheng, TAO Wei, et al. Optimal individual convergence rate of Adam-type algorithms in nonsmooth convex optimization[J]. CAAI Transactions on Intelligent Systems, 2020, 15(6): 1140-1146. [doi:10.11992/tis.202006046]

Optimal individual convergence rate of Adam-type algorithms in nonsmooth convex optimization

References:
[1] KINGMA D P, BA J L. Adam: a method for stochastic optimization[C]//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA, 2015.
[2] DUCHI J, HAZAN E, SINGER Y. Adaptive subgradient methods for online learning and stochastic optimization[J]. The journal of machine learning research, 2011, 12: 2121-2159.
[3] ZINKEVICH M. Online convex programming and generalized infinitesimal gradient ascent[C]//Proceedings of the 20th International Conference on Machine Learning. Washington, USA, 2003: 928-935.
[4] TIELEMAN T, HINTON G. Lecture 6.5-RMSProp: divide the gradient by a running average of its recent magnitude[R]. Toronto: University of Toronto, 2012.
[5] ZEILER M D. ADADELTA: an adaptive learning rate method[EB/OL]. (2012-12-22)[2020-04-20]. https://arxiv.org/abs/1212.5701
[6] POLYAK B T. Some methods of speeding up the convergence of iteration methods[J]. USSR computational mathematics and mathematical physics, 1964, 4(5): 1-17.
[7] NESTEROV Y E. A method of solving a convex programming problem with convergence rate O(1/k²)[J]. Soviet mathematics doklady, 1983, 27(2): 372-376.
[8] GHADIMI E, FEYZMAHDAVIAN H R, JOHANSSON M. Global convergence of the Heavy-ball method for convex optimization[C]//Proceedings of 2015 European Control Conference. Linz, Austria, 2015: 310-315.
[9] SHAMIR O, ZHANG Tong. Stochastic gradient descent for non-smooth optimization: convergence results and optimal averaging schemes[C]//Proceedings of the 30th International Conference on Machine Learning. Atlanta, USA, 2013: I-71-I-79.
[10] 陶蔚, 潘志松, 储德军, 等. 使用Nesterov步长策略投影次梯度方法的个体收敛性[J]. 计算机学报, 2018, 41(1): 164-176.
TAO Wei, PAN Zhisong, CHU Dejun, et al. The individual convergence of projected subgradient methods using the Nesterov’s step-size strategy[J]. Chinese journal of computers, 2018, 41(1): 164-176.
[11] TAO Wei, PAN Zhisong, WU Gaowei, et al. The strength of Nesterov’s extrapolation in the individual convergence of nonsmooth optimization[J]. IEEE transactions on neural networks and learning systems, 2020, 31(7): 2557-2568.
[12] 程禹嘉, 陶蔚, 刘宇翔, 等. Heavy-Ball型动量方法的最优个体收敛速率[J]. 计算机研究与发展, 2019, 56(8): 1686-1694.
CHENG Yujia, TAO Wei, LIU Yuxiang, et al. Optimal individual convergence rate of the Heavy-ball-based momentum methods[J]. Journal of computer research and development, 2019, 56(8): 1686-1694.
[13] KIROS R, ZEMEL R S, SALAKHUTDINOV R, et al. A multiplicative model for learning distributed text-based attribute representations[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada, 2014: 2348-2356.
[14] BAHAR P, ALKHOULI T, PETER J T, et al. Empirical investigation of optimization algorithms in neural machine translation[J]. The Prague bulletin of mathematical linguistics, 2017, 108(1): 13-25.
[15] REDDI S J, KALE S, KUMAR S. On the convergence of Adam and beyond[C]//Proceedings of the 6th International Conference on Learning Representations. Vancouver, Canada, 2018.
[16] WANG Guanghui, LU Shiyin, TU Weiwei, et al. SAdam: a variant of Adam for strongly convex functions[C]//Proceedings of the 8th International Conference on Learning Representations. Addis Ababa, Ethiopia, 2020.
[17] CHEN Xiangyi, LIU Sijia, SUN Ruoyu, et al. On the convergence of a class of Adam-type algorithms for non-convex optimization[C]//Proceedings of the 7th International Conference on Learning Representations. New Orleans, USA, 2019.
[18] DUCHI J, SHALEV-SHWARTZ S, SINGER Y, et al. Efficient projections onto the l1-ball for learning in high dimensions[C]//Proceedings of the 25th International Conference on Machine Learning. Helsinki, Finland, 2008: 272-279.
[19] AGARWAL A, BARTLETT P L, RAVIKUMAR P, et al. Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization[J]. IEEE transactions on information theory, 2012, 58(5): 3235-3249.
[20] SHALEV-SHWARTZ S, SINGER Y, SREBRO N, et al. Pegasos: primal estimated sub-gradient solver for SVM[J]. Mathematical programming, 2011, 127(1): 3-30.
[21] RAKHLIN A, SHAMIR O, SRIDHARAN K. Making gradient descent optimal for strongly convex stochastic optimization[C]//Proceedings of the 29th International Conference on Machine Learning. Edinburgh, Scotland, 2012: 1571-1578.
