DING Ke, TAN Ying. A review on general purpose computing on GPUs and its applications in computational intelligence[J]. CAAI Transactions on Intelligent Systems, 2015, 10(01): 1-11. [doi:10.3969/j.issn.1673-4785.201403072]





A review on general purpose computing on GPUs and its applications in computational intelligence
DING Ke (1,2), TAN Ying (1,2)
1. Key Laboratory of Machine Perception (MOE), Peking University, Beijing 100871, China;
2. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
computational intelligence; swarm intelligence; evolutionary algorithms; machine learning; deep learning; graphics processing unit (GPU); general purpose computing on GPUs; heterogeneous computing; high performance computing (HPC)
GPUs offer high parallelism, low energy consumption, and low cost. Compared with traditional CPU platforms, they are especially well suited to tasks with high data parallelism. With the emergence of development platforms such as CUDA and OpenCL, GPU computing has entered the mainstream of high performance computing (HPC). The GPU's enormous computational power has greatly advanced computational intelligence: great success has been achieved in fields such as deep learning and swarm intelligence optimization, and GPUs have enabled several breakthroughs in image and speech recognition. Despite some drawbacks, GPUs give individuals and small institutions access to computing power that previously only expensive supercomputers could provide, which has changed the landscape of scientific computing and its programming models. To help researchers in computational intelligence make better use of GPUs, this paper gives a detailed survey of general purpose computing on GPUs (GPGPU). First, the characteristics and advantages of GPUs relative to CPUs are presented. We then briefly review the development of GPU hardware, followed by a survey of the evolution of GPGPU development tools, with special attention to the two major platforms, CUDA and OpenCL. We end with our perspectives on the challenges and trends of GPGPU. We point out that embedded systems and clusters are the two major trends for GPGPU and that, as both academia and industry continue to make progress in artificial intelligence, GPUs will be used in a growing range of domains.






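About the authors: DING Ke, male, born in 1989, is a Ph.D. candidate. His main research interests are swarm intelligence, general purpose computing on GPUs, parallel programming, and machine learning. TAN Ying, male, born in 1964, is a professor and doctoral supervisor. His main research interests are computational intelligence, swarm intelligence, machine learning, artificial immune systems, intelligent information processing, and information security applications. He serves as Editor-in-Chief of IJCIPT, Associate Editor of IJSIR, and Associate Editor of IEEE Trans on Cybernetics, among other roles; he is an IEEE Senior Member, a member of the IEEE CIS-ETTC, and General Chair of the ICSI conference series. He has led more than 30 research projects, including projects funded by the National "863" Program, the National Natural Science Foundation of China, and international cooperation and exchange programs. He received the Second Prize of the 2009 National Natural Science Award and was selected for the Hundred Talents Program of the Chinese Academy of Sciences. He holds 3 granted national invention patents, has published more than 260 academic papers, and has authored 5 monographs.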
Last Update: 2015-06-16