[1] HUANG Jianzhi, DING Chengcheng, TAO Wei, et al. Optimal individual convergence rate of Adam-type algorithms in nonsmooth convex optimization[J]. CAAI Transactions on Intelligent Systems, 2020, 15(6): 1140-1146. [doi:10.11992/tis.202006046]

Optimal individual convergence rate of Adam-type algorithms in nonsmooth convex optimization

References:
[1] KINGMA D P, BA J L. Adam: a method for stochastic optimization[C]//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA, 2015.
[2] DUCHI J, HAZAN E, SINGER Y. Adaptive subgradient methods for online learning and stochastic optimization[J]. The journal of machine learning research, 2011, 12: 2121-2159.
[3] ZINKEVICH M. Online convex programming and generalized infinitesimal gradient ascent[C]//Proceedings of the 20th International Conference on Machine Learning. Washington, USA, 2003: 928-935.
[4] TIELEMAN T, HINTON G. Lecture 6.5-RMSProp: divide the gradient by a running average of its recent magnitude[R]. Toronto: University of Toronto, 2012.
[5] ZEILER M D. ADADELTA: an adaptive learning rate method[EB/OL]. (2012-12-22)[2020-04-20]. https://arxiv.org/abs/1212.5701
[6] POLYAK B T. Some methods of speeding up the convergence of iteration methods[J]. USSR computational mathematics and mathematical physics, 1964, 4(5): 1-17.
[7] NESTEROV Y E. A method of solving a convex programming problem with convergence rate O(1/k²)[J]. Soviet mathematics doklady, 1983, 27(2): 372-376.
[8] GHADIMI E, FEYZMAHDAVIAN H R, JOHANSSON M. Global convergence of the Heavy-ball method for convex optimization[C]//Proceedings of 2015 European Control Conference. Linz, Austria, 2015: 310-315.
[9] SHAMIR O, ZHANG Tong. Stochastic gradient descent for non-smooth optimization: convergence results and optimal averaging schemes[C]//Proceedings of the 30th International Conference on Machine Learning. Atlanta, USA, 2013: I-71-I-79.
[10] 陶蔚, 潘志松, 储德军, 等. 使用Nesterov步长策略投影次梯度方法的个体收敛性[J]. 计算机学报, 2018, 41(1): 164-176.
TAO Wei, PAN Zhisong, CHU Dejun, et al. The individual convergence of projected subgradient methods using the Nesterov’s step-size strategy[J]. Chinese journal of computers, 2018, 41(1): 164-176.
[11] TAO Wei, PAN Zhisong, WU Gaowei, et al. The strength of Nesterov’s extrapolation in the individual convergence of nonsmooth optimization[J]. IEEE transactions on neural networks and learning systems, 2020, 31(7): 2557-2568.
[12] 程禹嘉, 陶蔚, 刘宇翔, 等. Heavy-Ball型动量方法的最优个体收敛速率[J]. 计算机研究与发展, 2019, 56(8): 1686-1694.
CHENG Yujia, TAO Wei, LIU Yuxiang, et al. Optimal individual convergence rate of the Heavy-ball-based momentum methods[J]. Journal of computer research and development, 2019, 56(8): 1686-1694.
[13] KIROS R, ZEMEL R S, SALAKHUTDINOV R, et al. A multiplicative model for learning distributed text-based attribute representations[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada, 2014: 2348-2356.
[14] BAHAR P, ALKHOULI T, PETER J T, et al. Empirical investigation of optimization algorithms in neural machine translation[J]. The Prague bulletin of mathematical linguistics, 2017, 108(1): 13-25.
[15] REDDI S J, KALE S, KUMAR S. On the convergence of Adam and beyond[C]//Proceedings of the 6th International Conference on Learning Representations. Vancouver, Canada, 2018.
[16] WANG Guanghui, LU Shiyin, TU Weiwei, et al. SAdam: a variant of Adam for strongly convex functions[C]//Proceedings of the 8th International Conference on Learning Representations. Addis Ababa, Ethiopia, 2020.
[17] CHEN Xiangyi, LIU Sijia, SUN Ruoyu, et al. On the convergence of a class of Adam-type algorithms for non-convex optimization[C]//Proceedings of the 7th International Conference on Learning Representations. New Orleans, USA, 2019.
[18] DUCHI J, SHALEV-SHWARTZ S, SINGER Y, et al. Efficient projections onto the l1-ball for learning in high dimensions[C]//Proceedings of the 25th International Conference on Machine Learning. Helsinki, Finland, 2008: 272-279.
[19] AGARWAL A, BARTLETT P L, RAVIKUMAR P, et al. Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization[J]. IEEE transactions on information theory, 2012, 58(5): 3235-3249.
[20] SHALEV-SHWARTZ S, SINGER Y, SREBRO N, et al. Pegasos: primal estimated sub-gradient solver for SVM[J]. Mathematical programming, 2011, 127(1): 3-30.
[21] RAKHLIN A, SHAMIR O, SRIDHARAN K. Making gradient descent optimal for strongly convex stochastic optimization[C]//Proceedings of the 29th International Conference on Machine Learning. Edinburgh, Scotland, 2012: 1571-1578.
