[1]YU Chengyu,LI Zhiyuan,MAO Wenyu,et al.Design and implementation of an efficient accelerator for sparse convolutional neural network[J].CAAI Transactions on Intelligent Systems,2020,15(2):323-333.[doi:10.11992/tis.201902007]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 15
Issue: 2020(2)
Pages: 323-333
Column: Academic Papers - Machine Learning
Publication date: 2020-03-05
- Title:
Design and implementation of an efficient accelerator for sparse convolutional neural network
- Author(s):
YU Chengyu1,2; LI Zhiyuan1,2; MAO Wenyu1; LU Huaxiang1,2,3,4
- Affiliation(s):
1. Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China;
2. University of Chinese Academy of Sciences, Beijing 100089, China;
3. Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, China
- Keywords:
convolutional neural network; sparsity; embedded FPGA; ReLU; hardware acceleration; parallel computing; deep learning
- CLC:
TN4
- DOI:
10.11992/tis.201902007
- Abstract:
Convolutional neural networks (CNNs) are difficult to implement efficiently in hardware. Most previous CNN accelerator designs have focused on easing computation and bandwidth bottlenecks while ignoring the importance of CNN sparsity to accelerator design, and the few accelerator designs that do exploit sparsity have struggled to balance computational flexibility, parallel efficiency, and resource overhead. In this paper, we first analyze how different parallel expansion methods affect the exploitation of sparsity, compare existing approaches to utilizing it, and then propose a parallel expansion method that accelerates CNNs by exploiting activation sparsity, achieving higher parallel efficiency and lower additional resource cost than other designs. Finally, we complete the design of this CNN accelerator and implement it on an FPGA. The results show that, compared with a dense-network design on the same device, acceleration performance on the VGG-16 network is increased by 108.8% and overall performance is improved by 164.6%, a clear performance advantage.
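The core observation behind the accelerator described above is that ReLU leaves many activations at exactly zero, so the multiply-accumulates those activations would generate can be skipped. The following minimal Python sketch is not the authors' hardware design; the function name conv2d_skip_zeros and the scatter-style loop are illustrative assumptions used only to show the zero-skipping idea in software form.

# Minimal software sketch of activation-sparsity zero-skipping (illustrative only,
# not the paper's FPGA architecture). A post-ReLU feature map contains many zeros;
# each zero activation contributes nothing to any output, so all of its
# multiply-accumulates can be skipped.

def conv2d_skip_zeros(activations, kernel):
    """2-D valid convolution that skips multiply-accumulates for zero activations."""
    H, W = len(activations), len(activations[0])
    K = len(kernel)
    out_h, out_w = H - K + 1, W - K + 1
    output = [[0.0] * out_w for _ in range(out_h)]
    skipped = 0
    for i in range(H):
        for j in range(W):
            a = activations[i][j]
            if a == 0.0:          # zero activation: skip all of its MACs
                skipped += 1
                continue
            # scatter this activation's contribution to every output it touches
            for ki in range(K):
                for kj in range(K):
                    oi, oj = i - ki, j - kj
                    if 0 <= oi < out_h and 0 <= oj < out_w:
                        output[oi][oj] += a * kernel[ki][kj]
    return output, skipped

if __name__ == "__main__":
    # hypothetical post-ReLU feature map: roughly half the entries are zero
    act = [[0, 2, 0, 1],
           [3, 0, 0, 0],
           [0, 1, 4, 0],
           [0, 0, 0, 2]]
    ker = [[1, 0], [0, 1]]
    out, skipped = conv2d_skip_zeros(act, ker)
    print("output:", out)
    print("zero activations skipped:", skipped)

In software the branch saves little, but in a hardware datapath the same condition lets a processing element retire or reassign work for zero activations, which is the kind of gain the abstract reports over a dense design.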