[1]YU Chengyu,LI Zhiyuan,MAO Wenyu,et al.Design and implementation of an efficient accelerator for sparse convolutional neural network[J].CAAI Transactions on Intelligent Systems,2020,15(2):323-333.[doi:10.11992/tis.201902007]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 15
Issue: 2020(2)
Pages: 323-333
Column: Academic Papers - Machine Learning
Publication date: 2020-03-05
- Title:
Design and implementation of an efficient accelerator for sparse convolutional neural network
- Author(s):
YU Chengyu1,2; LI Zhiyuan1,2; MAO Wenyu1; LU Huaxiang1,2,3,4
- Affiliation(s):
1. Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China;
2. University of Chinese Academy of Sciences, Beijing 100089, China;
3. Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, China
- Keywords:
convolutional neural network; sparsity; embedded FPGA; ReLU; hardware acceleration; parallel computing; deep learning
- CLC:
TN4
- DOI:
10.11992/tis.201902007
- Abstract:
Convolutional neural networks (CNNs) are difficult to implement efficiently in hardware. Most previous CNN accelerator designs have focused on easing computation and bandwidth bottlenecks while ignoring the importance of CNN sparsity to accelerator design, and the few accelerator designs that do exploit sparsity have struggled to balance computational flexibility, parallel efficiency, and resource overhead. In this paper, we first analyze how different parallel expansion methods affect the exploitation of sparsity, compare existing approaches to utilizing it, and then propose a parallel expansion method that accelerates CNNs by exploiting activation sparsity, achieving higher parallel efficiency and lower additional resource cost than other designs. Finally, we complete the design of this CNN accelerator and implement it on an FPGA. The results show that, compared with a dense-network design on the same device, acceleration performance on the VGG-16 network is increased by 108.8% and overall performance is improved by 164.6%, a clear performance advantage.
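The core observation behind the accelerator described above is that ReLU leaves many activations at exactly zero, so the multiply-accumulates those activations would generate can be skipped. The following minimal Python sketch is not the authors' hardware design; the function name conv2d_skip_zeros and the scatter-style loop are illustrative assumptions used only to show the zero-skipping idea in software form.

# Minimal software sketch of activation-sparsity zero-skipping (illustrative only,
# not the paper's FPGA architecture). A post-ReLU feature map contains many zeros;
# each zero activation contributes nothing to any output, so all of its
# multiply-accumulates can be skipped.

def conv2d_skip_zeros(activations, kernel):
    """2-D valid convolution that skips multiply-accumulates for zero activations."""
    H, W = len(activations), len(activations[0])
    K = len(kernel)
    out_h, out_w = H - K + 1, W - K + 1
    output = [[0.0] * out_w for _ in range(out_h)]
    skipped = 0
    for i in range(H):
        for j in range(W):
            a = activations[i][j]
            if a == 0.0:          # zero activation: skip all of its MACs
                skipped += 1
                continue
            # scatter this activation's contribution to every output it touches
            for ki in range(K):
                for kj in range(K):
                    oi, oj = i - ki, j - kj
                    if 0 <= oi < out_h and 0 <= oj < out_w:
                        output[oi][oj] += a * kernel[ki][kj]
    return output, skipped

if __name__ == "__main__":
    # hypothetical post-ReLU feature map: roughly half the entries are zero
    act = [[0, 2, 0, 1],
           [3, 0, 0, 0],
           [0, 1, 4, 0],
           [0, 0, 0, 2]]
    ker = [[1, 0], [0, 1]]
    out, skipped = conv2d_skip_zeros(act, ker)
    print("output:", out)
    print("zero activations skipped:", skipped)

In software the branch saves little, but in a hardware datapath the same condition lets a processing element retire or reassign work for zero activations, which is the kind of gain the abstract reports over a dense design.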