[1]NING Xin,ZHAO Wenyao,ZONG Yixin,et al.An overview of the joint optimization method for neural network compression[J].CAAI Transactions on Intelligent Systems,2024,19(1):36-57.[doi:10.11992/tis.202306042]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 19
Issue: 2024(1)
Pages: 36-57
Column: Review
Publication date: 2024-01-05
- Title: An overview of the joint optimization method for neural network compression
- Author(s): NING Xin1; ZHAO Wenyao2; ZONG Yixin3; ZHANG Yugui1; CHEN Hao4; ZHOU Qi1; MA Junxiao1
1. Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China;
2. School of Microelectronics, Hefei University of Technology, Hefei 230009, China;
3. Bureau of Frontier Sciences and Education, Chinese Academy of Sciences, Beijing 100864, China;
4. College of Artificial Intelligence, Nankai University, Tianjin 300071, China
- Keywords: neural network; compression; pruning; quantization; knowledge distillation; model compression; deep learning
- CLC: TP181
- DOI: 10.11992/tis.202306042
- Abstract: With the increasing demands for real-time performance, privacy, and security in AI applications, deploying high-performance neural networks on edge computing platforms has become a research hotspot. Because common edge computing platforms are limited in storage, computing power, and power consumption, the edge deployment of deep neural networks remains a major challenge. One current approach to overcoming these challenges is to compress an existing neural network so that it fits the deployment constraints of the device. Commonly used model compression algorithms include pruning, quantization, and knowledge distillation. By exploiting the complementarity of multiple methods, joint compression can achieve a better compression and acceleration effect, and it is becoming a focus of research. This paper first gives a brief overview of commonly used model compression algorithms, and then summarizes three common joint compression schemes: "knowledge distillation + pruning", "knowledge distillation + quantization", and "pruning + quantization", focusing on analyzing and discussing the basic ideas and methods of joint compression. Finally, key future development directions for the joint optimization of neural network compression are proposed.
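To make one of the joint schemes named above concrete, the following is a minimal NumPy sketch (not taken from the paper) of "pruning + quantization": magnitude-based pruning zeroes out the smallest-magnitude weights, and symmetric uniform quantization then maps the surviving weights to signed 8-bit integer levels. The function names, the 50% sparsity target, and the bit width are illustrative assumptions, not the authors' method.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude entries until `sparsity` fraction is zero."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # Threshold at the k-th smallest absolute value.
    threshold = np.sort(np.abs(w).ravel())[k - 1]
    pruned = w.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def uniform_quantize(w, num_bits=8):
    """Symmetric uniform quantization to signed `num_bits` levels, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    if scale == 0:
        return w.copy()
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale  # dequantized values; zeros from pruning stay zero

# Joint compression: prune first, then quantize the surviving weights.
rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64)).astype(np.float32)
compressed = uniform_quantize(magnitude_prune(weights, sparsity=0.5), num_bits=8)
print(f"sparsity after joint compression: {np.mean(compressed == 0.0):.2f}")
```

Applying pruning before quantization is the common ordering in such pipelines, since the quantization scale is then fitted only to the weights that actually survive.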