[1] CAO Ronghui, TANG Zhuo, ZUO Zhiwei, et al. Key technologies and applications of distributed parallel computing for machine learning[J]. CAAI Transactions on Intelligent Systems, 2021, 16(5): 919-930. [doi:10.11992/tis.202108010]

Key technologies and applications of distributed parallel computing for machine learning

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Volume 16
Issue:
Issue 5, 2021
Pages:
919-930
Column:
First Prize, Wu Wenjun AI Science and Technology Progress Award
Publication date:
2021-09-05

Article Info

Title:
Key technologies and applications of distributed parallel computing for machine learning
Author(s):
CAO Ronghui (1, 2); TANG Zhuo (1, 2); ZUO Zhiwei (1, 2); ZHANG Xuedong (1, 2)
1. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China;
2. National Supercomputer Center in Changsha, Changsha 410082, China
Keywords:
machine learning; distributed computing; skewed data; task space-time scheduling; resource management; energy-saving scheduling; cross-domain resource migration; parallel optimization; graph iteration algorithm; intelligent analysis system
CLC number:
TP18
DOI:
10.11992/tis.202108010
Abstract:
The computation and iteration involved in machine learning and similar algorithms are growing increasingly complex, and sufficient computing power is key to ensuring that artificial intelligence applications are effective in practice. This paper first proposes a task space-time scheduling algorithm for distributed heterogeneous environments that adapts to skewed data, which effectively improves the average efficiency of tasks such as machine learning model training. Second, it proposes an efficient resource management system and an energy-saving scheduling algorithm for distributed heterogeneous environments, which achieve cross-domain migration of computing resources based on dynamic prediction together with dynamic voltage/frequency adjustment, reducing the overall energy consumption of the system. Third, it constructs a distributed heterogeneous optimization environment suited to the iteration of machine learning and deep learning algorithms, and proposes basic methods of distributed parallel optimization for machine learning and graph iteration algorithms. Finally, it develops an intelligent analysis system for domain applications, which has been deployed in manufacturing, transportation, education, healthcare, and other fields, and which resolves performance bottlenecks that are common in efficient data collection, storage, cleaning, fusion, and intelligent analysis.
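To make the idea of skew-aware scheduling on heterogeneous workers more concrete, the following is a minimal illustrative sketch. It is not the algorithm proposed in the paper, whose details are not given on this page. It assumes a simple greedy heuristic (largest partition first, placed on the worker with the earliest estimated finish time), and the names `assign_partitions`, `partition_sizes`, and `worker_speeds` are hypothetical.

```python
def assign_partitions(partition_sizes, worker_speeds):
    """Greedy skew-aware assignment (illustrative assumption, not the paper's method).

    partition_sizes: work units per data partition (may be highly skewed).
    worker_speeds:   relative processing speed of each heterogeneous worker.
    Returns (assignment, makespan), where assignment maps a worker index
    to the list of partition indices placed on it.
    """
    # Estimated busy time accumulated on each worker so far.
    finish_time = [0.0] * len(worker_speeds)
    assignment = {w: [] for w in range(len(worker_speeds))}

    # Handle the largest partitions first so that skewed partitions are
    # placed while there is still freedom to balance around them.
    order = sorted(range(len(partition_sizes)),
                   key=lambda p: partition_sizes[p], reverse=True)

    for p in order:
        # Pick the worker that would finish this partition earliest,
        # accounting for its speed and its current load.
        best = min(range(len(worker_speeds)),
                   key=lambda w: finish_time[w] + partition_sizes[p] / worker_speeds[w])
        finish_time[best] += partition_sizes[p] / worker_speeds[best]
        assignment[best].append(p)

    return assignment, max(finish_time)
```

With one large partition and several small ones, this heuristic tends to place the large partition on the fastest worker and pack the small ones onto the remaining workers. A production scheduler would also need to account for data locality, transfer cost, and runtime estimation error.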

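Memo:
Received: 2021-08-11.
Funding: National Key R&D Program of China (2018YFB1701400); National Natural Science Foundation of China (92055213, 61873090, L1924056, 62002114); R&D and Industrialization Project of a Smart Park Cloud Platform Driven by Financial and Industrial Data (XMHT20190205007); Key-Area R&D Program of Guangdong Province (XMHT20190205007); Shenzhen Science and Technology Program (JSGG20180507183023239).
Author biographies:
CAO Ronghui, associate research fellow and postdoctoral researcher. Main research interests: distributed computing, cloud computing, and parallel processing architectures. Core member of the OpenStack open-source cloud computing community, of the Engineering Research Center of High-Performance Computing Application Software Technology of the Ministry of Education, and of the Hunan Provincial Innovation Team for High-Performance Data Processing and Intelligent Analysis. Recipient of the First Prize of the Wu Wenjun AI Science and Technology Progress Award (ranked fifth). Principal investigator of 2 sub-projects of the National Key R&D Program, 1 National Natural Science Foundation of China (NSFC) project, and 1 Hunan Provincial Natural Science Foundation project; co-author of 1 Hunan Provincial Xinchuang (IT application innovation) cloud standard; participant in 2 National Key R&D Program projects, 1 NSFC Key project, 2 NSFC General projects, and 1 Hunan Provincial Key R&D Program project. Has applied for 16 patents (7 granted), co-authored 1 monograph, and published numerous academic papers.
TANG Zhuo, professor and doctoral supervisor. Main research interests: distributed computing and cloud computing. Young Changjiang Scholar of the Ministry of Education and chief engineer of the National Supercomputer Center in Changsha; guest editor for several SCI-indexed journals. Recipient of the Second Prize of the National Science and Technology Progress Award (ranked third), the First Prize of the Wu Wenjun AI Science and Technology Progress Award (ranked first), the First Prize of the China Industry-University-Research Cooperation Innovation Achievement Award (ranked first), and the First Prize of the Hunan Provincial Technological Invention Award (ranked second). Principal investigator of 1 National Key R&D Program project of the Ministry of Science and Technology, 1 NSFC Key project, 2 NSFC General projects, 3 NSFC emergency projects, and 1 NSFC Young Scientists Fund project, as well as more than 10 other projects, including Guangdong Economic and Information Commission projects, industry-university-research cooperation projects, and China Postdoctoral Science Foundation projects. Has published more than 100 academic papers.
ZUO Zhiwei, doctoral student. Main research interest: distributed machine learning.
Corresponding author: TANG Zhuo. E-mail: ztang@hnu.edu.cn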