[1]MA Xiang,SHEN Guowei,GUO Chun,et al.Dynamic adaptive parallel acceleration method for heterogeneous distributed machine learning[J].CAAI Transactions on Intelligent Systems,2023,18(5):1099-1107.[doi:10.11992/tis.202209024]

Dynamic adaptive parallel acceleration method for heterogeneous distributed machine learning

References:
[1] SZEGEDY C, LIU Wei, JIA Yangqing, et al. Going deeper with convolutions[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 1-9.
[2] SZEGEDY C, TOSHEV A, ERHAN D. Deep neural networks for object detection[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2. New York: ACM, 2013: 2553-2561.
[3] YE Zhengzhe, CANG Yan. A pedestrian detection method based on convolutional neural network[J]. Applied science and technology, 2022, 49(2): 55-62.
[4] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of NAACL-HLT. Minneapolis: Association for Computational Linguistics, 2019: 4171-4186.
[5] DOU Yonggan, YUAN Xiaotong. Federated learning with implicit stochastic gradient descent optimization[J]. CAAI transactions on intelligent systems, 2022, 17(3): 488-495.
[6] PASZKE A, GROSS S, MASSA F, et al. PyTorch: an imperative style, high-performance deep learning library[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. New York: Curran Associates Inc, 2019: 8026-8037.
[7] ABADI M, BARHAM P, CHEN Jianmin, et al. TensorFlow: a system for large-scale machine learning[C]//Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation. New York: ACM, 2016: 265-283.
[8] CAO Ronghui, TANG Zhuo, ZUO Zhiwei, et al. Key technologies and applications of distributed parallel computing for machine learning[J]. CAAI transactions on intelligent systems, 2021, 16(5): 919-930.
[9] WANG Shuai, LI Dan. Research progress on network performance optimization of distributed machine learning system[J]. Chinese journal of computers, 2022, 45(7): 1384-1411.
[10] MIAO Xupeng, NIE Xiaonan, SHAO Yingxia, et al. Heterogeneity-aware distributed machine learning training via partial reduce[C]//Proceedings of the 2021 International Conference on Management of Data. New York: ACM, 2021: 2262-2270.
[11] SHU Na, LIU Bo, LIN Weiwei, et al. Survey of distributed machine learning platforms and algorithms[J]. Computer science, 2019, 46(3): 9-18.
[12] FAN Wenfei, HE Kun, LI Qian, et al. Graph algorithms: parallelization and scalability[J]. Science China information sciences, 2020, 63(10): 203101.
[13] JIANG Jiawei, CUI Bin, ZHANG Ce, et al. Heterogeneity-aware distributed parameter servers[C]//Proceedings of the 2017 ACM International Conference on Management of Data. New York: ACM, 2017: 463-478.
[14] ZHU Hongrui, YUAN Guojun, YAO Chengji, et al. Survey on network of distributed deep learning training[J]. Journal of computer research and development, 2021, 58(1): 98-115.
[15] XU Ning, CUI Bin, CHEN Lei, et al. Heterogeneous environment aware streaming graph partitioning[J]. IEEE transactions on knowledge and data engineering, 2015, 27(6): 1560-1572.
[16] HO Q, CIPAR J, CUI Henggang, et al. More effective distributed ML via a stale synchronous parallel parameter server[J]. Advances in neural information processing systems, 2013: 1223-1231.
[17] LI M, ANDERSEN D G, SMOLA A, et al. Communication efficient distributed machine learning with the parameter server[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1. Massachusetts: MIT Press, 2014: 19-27.
[18] ZHAO Xing, AN Aijun, LIU Junfeng, et al. Dynamic stale synchronous parallel distributed training for deep learning[C]//2019 IEEE 39th International Conference on Distributed Computing Systems. Dallas: IEEE, 2019: 1507-1517.
[19] FAN Wenfei, LU Ping, YU Wenyuan, et al. Adaptive asynchronous parallelization of graph algorithms[J]. ACM transactions on database systems, 2020, 45(2): 1-45.
[20] WANG Endong, YAN Ruidong, GUO Zhenhua, et al. A survey of distributed training system and its optimization algorithms[J/OL]. Chinese journal of computers, 2023: 1-29. (2023-04-06)[2023-05-01]. https://kns.cnki.net/kcms/detail/11.1826.tp.20230404.1510.002.html.
[21] CHEN Jianmin, PAN Xinghao, MONGA R, et al. Revisiting distributed synchronous SGD[EB/OL]. (2017-03-21)[2022-07-11]. https://arxiv.org/abs/1604.00981.
[22] TENG M, WOOD F. Bayesian distributed stochastic gradient descent[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. New York: ACM, 2018: 6380-6390.
[23] SUN Haifeng, GUI Zhiyi, GUO Song, et al. GSSP: eliminating stragglers through grouping synchronous for distributed deep learning in heterogeneous cluster[J]. IEEE transactions on cloud computing, 2022, 10(4): 2637-2648.
[24] HARLAP A, CUI Henggang, DAI Wei, et al. Addressing the straggler problem for iterative convergent parallel ML[C]//Proceedings of the Seventh ACM Symposium on Cloud Computing. New York: ACM, 2016: 98-111.
[25] XU Hongfei, VAN GENABITH J, XIONG Deyi, et al. Dynamically adjusting transformer batch size by monitoring gradient direction change[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: Association for Computational Linguistics, 2020: 3519-3524.
[26] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
[27] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10)[2022-07-11]. https://arxiv.org/abs/1409.1556.

Copyright © CAAI Transactions on Intelligent Systems