[1]窦勇敢,袁晓彤.基于隐式随机梯度下降优化的联邦学习[J].智能系统学报,2022,17(3):488-495.[doi:10.11992/tis.202106029]
DOU Yonggan,YUAN Xiaotong.Federated learning with implicit stochastic gradient descent optimization[J].CAAI Transactions on Intelligent Systems,2022,17(3):488-495.[doi:10.11992/tis.202106029]
CAAI Transactions on Intelligent Systems (智能系统学报) [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 17
Issue: 2022, No. 3
Pages: 488-495
Section: Academic Papers - Machine Learning
Publication date: 2022-05-05
Title: Federated learning with implicit stochastic gradient descent optimization
Author(s): DOU Yonggan (窦勇敢)1,2, YUAN Xiaotong (袁晓彤)1,2
1. School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China;
2. Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing 210044, China
Keywords: federated learning; distributed machine learning; central server; global model; implicit stochastic gradient descent; statistical heterogeneity; systems heterogeneity; optimization algorithm; faster convergence
CLC number: TP8
DOI: 10.11992/tis.202106029
Abstract: Federated learning is a distributed machine learning paradigm in which a central server trains an optimal global model by coordinating a large number of remote devices. Two key challenges currently faced by federated learning are system heterogeneity and statistical (data) heterogeneity. This paper addresses the slow convergence, or even failure to converge, of the global model caused by such heterogeneity and proposes a federated learning algorithm based on implicit stochastic gradient descent optimization. Unlike the traditional update scheme in federated learning, the server uses the locally uploaded model parameters to approximate the average global gradient, thereby avoiding the explicit computation of first-order derivatives, and updates the global model parameters by gradient descent. As a result, the global model reaches a faster and more stable convergence result within fewer communication rounds. In experiments simulating different levels of heterogeneity, the proposed algorithm converges faster and more stably than both FedAvg and FedProx. For the same convergence target, the proposed method reduces the number of communication rounds by nearly 50% compared with FedProx on a highly heterogeneous synthetic dataset, considerably improving the stability and robustness of federated learning.
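The server-side update described in the abstract can be pictured with a minimal sketch (an illustration only, not the paper's released code). The function and parameter names (server_update, client_lr, server_lr) are hypothetical, and approximating the average global gradient as the scaled difference between the current global model and the mean of the uploaded client models is one plausible reading of the description above.

```python
import numpy as np

def server_update(global_w, client_ws, client_lr=0.1, server_lr=1.0):
    """One communication round of the server-side step sketched in the abstract.

    global_w  : current global model parameters (1-D numpy array)
    client_ws : list of parameter arrays uploaded by the selected clients
    client_lr : step size the clients are assumed to have used locally
    server_lr : step size for the server-side gradient descent step
    """
    # Average the locally uploaded model parameters.
    avg_w = np.mean(client_ws, axis=0)

    # Approximate the average global gradient from the parameter shift,
    # instead of evaluating first-order derivatives on the server.
    approx_grad = (global_w - avg_w) / client_lr

    # Update the global model by gradient descent with the approximate gradient.
    return global_w - server_lr * approx_grad
```

Note that when server_lr equals client_lr the step reduces to plain parameter averaging (the FedAvg update); decoupling the two step sizes is what allows a sketch like this to take a differently scaled gradient descent step on the server, in the spirit of the faster and more stable convergence the abstract reports.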
Memo
Received: 2021-06-18.
Foundation items: National Natural Science Foundation of China (61876090, 61936005); Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0100400).
Author biographies: DOU Yonggan, master's student; his main research interests are federated learning and semantic segmentation. YUAN Xiaotong, professor and doctoral supervisor, is a member of the Technical Committee on Computer Vision of the China Computer Federation, a member of the Technical Committee on Pattern Recognition and Machine Intelligence of the Chinese Association of Automation, and an IEEE member; his main research interests are machine learning and computer vision. He was selected for the Jiangsu Provincial "Shuangchuang" (Innovation and Entrepreneurship) Talent Program and has published more than 80 academic papers.
Corresponding author: YUAN Xiaotong. E-mail: xtyuan1980@gmail.com