NING Xin, ZHAO Wenyao, ZONG Yixin, et al. An overview of the joint optimization method for neural network compression[J]. CAAI Transactions on Intelligent Systems, 2024, 19(1): 36–57. doi: 10.11992/tis.202306042.

An overview of the joint optimization method for neural network compression

References:
[1] HUANG Zhenhua, YANG Shunzhi, LIN Wei, et al. Knowledge distillation: a survey[J]. Chinese journal of computers, 2022, 45(3): 624–653. (in Chinese)
[2] GAO Han, TIAN Yulong, XU Fengyuan, et al. Survey of deep learning model compression and acceleration[J]. Journal of software, 2021, 32(1): 68–92. (in Chinese)
[3] SHAO Renrong, LIU Yuang, ZHANG Wei, et al. A survey of knowledge distillation in deep learning[J]. Chinese journal of computers, 2022, 45(8): 1638–1673. (in Chinese)
[4] HOWARD A G, ZHU Menglong, CHEN Bo, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. (2017-04-17)[2023-06-21]. https://arxiv.org/abs/1704.04861.pdf.
[5] SANDLER M, HOWARD A, ZHU Menglong, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4510–4520.
[6] ZHANG Xiangyu, ZHOU Xinyu, LIN Mengxiao, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6848–6856.
[7] MA Ningning, ZHANG Xiangyu, ZHENG Haitao, et al. ShuffleNet V2: practical guidelines for efficient CNN architecture design[C]//European Conference on Computer Vision. Cham: Springer, 2018: 122–138.
[8] IANDOLA F N, HAN Song, MOSKEWICZ M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size[EB/OL]. (2016-11-04)[2023-06-21]. https://arxiv.org/abs/1602.07360.pdf.
[9] HOWARD A, SANDLER M, CHEN Bo, et al. Searching for MobileNetV3[C]//2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2020: 1314–1324.
[10] ZOPH B, VASUDEVAN V, SHLENS J, et al. Learning transferable architectures for scalable image recognition[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8697–8710.
[11] TAN Mingxing, CHEN Bo, PANG Ruoming, et al. MnasNet: platform-aware neural architecture search for mobile[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2020: 2815–2823.
[12] LECUN Y, DENKER J, SOLLA S. Optimal brain damage[J]. Advances in neural information processing systems, 1989, 2: 598–605.
[13] HASSIBI B, STORK D G. Second order derivatives for network pruning: optimal brain surgeon[C]//Advances in Neural Information Processing Systems 5. New York: ACM, 1992: 164–171.
[14] LI Hao, KADAV A, DURDANOVIC I, et al. Pruning filters for efficient ConvNets[EB/OL]. (2017-03-10)[2023-06-21]. https://arxiv.org/abs/1608.08710.pdf.
[15] HE Yang, KANG Guoliang, DONG Xuanyi, et al. Soft filter pruning for accelerating deep convolutional neural networks[EB/OL]. (2018-08-21)[2023-06-21]. https://arxiv.org/abs/1808.06866.pdf.
[16] LIU Zhuang, LI Jianguo, SHEN Zhiqiang, et al. Learning efficient convolutional networks through network slimming[C]//2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2755–2763.
[17] LUO Jianhao, WU Jianxin, LIN Weiyao. ThiNet: a filter level pruning method for deep neural network compression[C]//2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 5068–5076.
[18] WEN Wei, WU Chunpeng, WANG Yandan, et al. Learning structured sparsity in deep neural networks[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates, 2016: 2082–2090.
[19] LEBEDEV V, LEMPITSKY V. Fast ConvNets using group-wise brain damage[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 2554–2564.
[20] YE Xucheng, DAI Pengcheng, LUO Junyu, et al. Accelerating CNN training by pruning activation gradients[C]//Computer Vision–ECCV 2020: 16th European Conference, Proceedings, Part XXV. Berlin, Heidelberg: Springer-Verlag, 2020: 322–338.
[21] HAN Song, POOL J, TRAN J, et al. Learning both weights and connections for efficient neural networks[EB/OL]. (2015-10-31)[2023-06-21]. https://arxiv.org/abs/1506.02626.pdf.
[22] DETTMERS T. 8-bit approximations for parallelism in deep learning[EB/OL]. (2016-02-19)[2023-06-21]. https://arxiv.org/abs/1511.04561.pdf.
[23] JACOB B, KLIGYS S, CHEN Bo, et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 2704–2713.
[24] LIN Xiaofan, ZHAO Cong, PAN Wei. Towards accurate binary convolutional neural network[EB/OL]. (2016-02-19)[2023-06-21]. https://arxiv.org/abs/1711.11294.pdf.
[25] COURBARIAUX M, BENGIO Y, DAVID J P. BinaryConnect: training deep neural networks with binary weights during propagations[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2. New York: ACM, 2015: 3123–3131.
[26] HOU Lu, YAO Quanming, KWOK J T. Loss-aware binarization of deep networks[EB/OL]. (2018-05-10)[2023-06-21]. https://arxiv.org/abs/1611.01600v1.pdf.
[27] JUEFEI-XU F, BODDETI V N, SAVVIDES M. Local binary convolutional neural networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2017: 4284–4293.
[28] ZHU Chenzhuo, HAN Song, MAO Huizi, et al. Trained ternary quantization[EB/OL]. (2017-02-23)[2023-06-21]. https://arxiv.org/abs/1612.01064v2.pdf.
[29] ACHTERHOLD J, KOEHLER J M, SCHMEINK A, et al. Variational network quantization[C]//International Conference on Learning Representations. Ithaca: ICLR, 2018: 1–18.
[30] MELLEMPUDI N, KUNDU A, MUDIGERE D, et al. Ternary neural networks with fine-grained quantization[EB/OL]. (2017-05-30)[2023-06-21]. https://arxiv.org/abs/1705.01462.pdf.
[31] BOROUMAND A, GHOSE S, KIM Y, et al. Google workloads for consumer devices: mitigating data movement bottlenecks[C]//Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2018: 316–331.
[32] MISHRA A, COOK J, NURVITADHI E, et al. WRPN: training and inference using wide reduced-precision networks[EB/OL]. (2017-04-10)[2023-06-21]. https://arxiv.org/abs/1704.03079.pdf.
[33] WANG Kuan, LIU Zhijian, LIN Yujun, et al. HAQ: hardware-aware automated quantization with mixed precision[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 8604–8612.
[34] ZHANG Dongqing, YANG Jiaolong, YE D, et al. LQ-nets: learned quantization for highly accurate and compact deep neural networks[C]//European Conference on Computer Vision. Cham: Springer, 2018: 373–390.
[35] GONG Yunchao, LIU Liu, YANG Ming, et al. Compressing deep convolutional networks using vector quantization[EB/OL]. (2014-12-18)[2023-06-21]. https://arxiv.org/abs/1412.6115.pdf.
[36] TAILOR S A, FERNANDEZ-MARQUES J, LANE N D. Degree-quant: quantization-aware training for graph neural networks[EB/OL]. (2021-03-15)[2023-06-21]. https://arxiv.org/abs/2008.05000.pdf.
[37] SEIDE F, FU Hao, DROPPO J, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs[C]//Interspeech 2014. ISCA: ISCA, 2014: 1–5.
[38] LI Conglong, AHMAD AWAN A, TANG Hanlin, et al. 1-bit LAMB: communication efficient large-scale large-batch training with LAMB’s convergence speed[C]//2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC). Piscataway: IEEE, 2023: 272–281.
[39] ALISTARH D, GRUBIC D, LI J Z, et al. QSGD: communication-efficient SGD via gradient quantization and encoding[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 1707–1718.
[40] GOU Jianping, YU Baosheng, MAYBANK S J, et al. Knowledge distillation: a survey[J]. International journal of computer vision, 2021, 129(6): 1789–1819.
[41] BUCILUǍ C, CARUANA R, NICULESCU-MIZIL A. Model compression[C]//Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2006: 535–541.
[42] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[EB/OL]. (2015-03-09)[2023-06-21]. https://arxiv.org/abs/1503.02531.pdf.
[43] MIRZADEH S I, FARAJTABAR M, LI Ang, et al. Improved knowledge distillation via teacher assistant[J]. Proceedings of the AAAI conference on artificial intelligence, 2020, 34(4): 5191–5198.
[44] ROMERO A, BALLAS N, EBRAHIMI K, et al. FitNets: hints for thin deep nets[EB/OL]. (2015-03-27)[2023-06-21]. https://arxiv.org/abs/1412.6550.pdf.
[45] ZAGORUYKO S, KOMODAKIS N. Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer[EB/OL]. (2017-02-12)[2023-06-21]. https://arxiv.org/abs/1612.03928.pdf.
[46] YIM J, JOO D, BAE J, et al. A gift from knowledge distillation: fast optimization, network minimization and transfer learning[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 7130–7138.
[47] XU Xixia, ZOU Qi, LIN Xue, et al. Integral knowledge distillation for multi-person pose estimation[J]. IEEE signal processing letters, 2020, 27: 436–440.
[48] ZHANG Linfeng, SONG Jiebo, GAO Anni, et al. Be your own teacher: improve the performance of convolutional neural networks via self distillation[C]//2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2020: 3712–3721.
[49] ZHANG Feng, HU Hong, DAI Hanbin, et al. Self-evolutionary pose distillation[C]//2019 16th International Computer Conference on Wavelet Active Media Technology and Information Processing. Piscataway: IEEE, 2020: 240–244.
[50] WU Min, MA Weihua, LI Yue, et al. Automatic optimization of super parameters based on model pruning and knowledge distillation[C]//2020 International Conference on Computer Engineering and Intelligent Control. Piscataway: IEEE, 2021: 111–116.
[51] WU Min, MA Weihua, LI Yue, et al. The optimization method of knowledge distillation based on model pruning[C]//2020 Chinese Automation Congress. Piscataway: IEEE, 2021: 1386–1390.
[52] AFLALO Y, NOY A, LIN Ming, et al. Knapsack pruning with inner distillation[EB/OL]. (2020-06-03)[2023-06-21]. https://arxiv.org/abs/2002.08258.pdf.
[53] ZHOU Zhaojing, ZHOU Yun, JIANG Zhuqing, et al. An efficient method for model pruning using knowledge distillation with few samples[C]//ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2022: 2515–2519.
[54] WANG Bo, JIANG Qingji, SONG Dawei, et al. SAR vehicle recognition via scale-coupled Incep_Dense Network (IDNet)[J]. International journal of remote sensing, 2021, 42(23): 9109–9134.
[55] CHEN Shiqi, ZHAN Ronghui, WANG Wei, et al. Learning slimming SAR ship object detector through network pruning and knowledge distillation[J]. IEEE journal of selected topics in applied earth observations and remote sensing, 2020, 14: 1267–1282.
[56] WANG Zhen, DU Lan, LI Yi. Boosting lightweight CNNs through network pruning and knowledge distillation for SAR target recognition[J]. IEEE journal of selected topics in applied earth observations and remote sensing, 2021, 14: 8386–8397.
[57] AGHLI N, RIBEIRO E. Combining weight pruning and knowledge distillation for CNN compression[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2021: 3185–3192.
[58] PARK J, NO A. Prune your model before distill it[M]. Cham: Springer Nature Switzerland, 2022: 120–136.
[59] CHE Hongle, SHI Qirui, CHEN Juan, et al. HKDP: a hybrid approach on knowledge distillation and pruning for neural network compression[C]//2021 18th International Computer Conference on Wavelet Active Media Technology and Information Processing. Piscataway: IEEE, 2022: 188–193.
[60] CHEN Liyang, CHEN Yongquan, XI Juntong, et al. Knowledge from the original network: restore a better pruned network with knowledge distillation[J]. Complex & intelligent systems, 2022, 8(2): 709–718.
[61] YAO Weiwei, ZHANG Jie, LI Chen, et al. Semantic segmentation optimization algorithm based on knowledge distillation and model pruning[C]//2019 2nd International Conference on Artificial Intelligence and Big Data. Piscataway: IEEE, 2019: 261–265.
[62] CUI Baiyun, LI Yingming, ZHANG Zhongfei. Joint structured pruning and dense knowledge distillation for efficient transformer model compression[J]. Neurocomputing, 2021, 458: 56–69.
[63] XU Bangguo, ZHANG Tiankui, WANG Yapeng, et al. A knowledge-distillation-integrated pruning method for vision transformer[C]//2022 21st International Symposium on Communications and Information Technologies. Piscataway: IEEE, 2022: 210–215.
[64] WANG Tianzhe, WANG Kuan, CAI Han, et al. APQ: joint search for network architecture, pruning and quantization policy[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 2075–2084.
[65] XU Rui, LUAN Siyu, GU Zonghua, et al. LRP-based policy pruning and distillation of reinforcement learning agents for embedded systems[C]//2022 IEEE 25th International Symposium on Real-Time Distributed Computing. Piscataway: IEEE, 2022: 1–8.
[66] LIU Yu, JIA Xuhui, TAN Mingxing, et al. Search to distill: pearls are everywhere but not the eyes[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 7536–7545.
[67] SATTLER F, MARBAN A, RISCHKE R, et al. CFD: communication-efficient federated distillation via soft-label quantization and delta coding[J]. IEEE transactions on network science and engineering, 2021, 9(4): 2025–2038.
[68] RUSU A A, COLMENAREJO S G, GULCEHRE C, et al. Policy distillation[EB/OL]. (2016-01-07)[2023-06-21]. https://arxiv.org/abs/1511.06295.pdf.
[69] CHO J H, HARIHARAN B. On the efficacy of knowledge distillation[C]//2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2020: 4793–4801.
[70] LI Chenxing, ZHU Lei, XU Shuang, et al. Compression of acoustic model via knowledge distillation and pruning[C]//2018 24th International Conference on Pattern Recognition. Piscataway: IEEE, 2018: 2785–2790.
[71] PRAKOSA S W, LEU J S, CHEN Zhaohong. Improving the accuracy of pruned network using knowledge distillation[J]. Pattern analysis & applications, 2021, 24(2): 819–830.
[72] SHIN S, BOO Y, SUNG W. Knowledge distillation for optimization of quantized deep neural networks[C]//2020 IEEE Workshop on Signal Processing Systems (SiPS). Piscataway: IEEE, 2020: 1–6.
[73] MISHRA A, MARR D. Apprentice: using knowledge distillation techniques to improve low-precision network accuracy[EB/OL]. (2017-11-15)[2023-06-21]. https://arxiv.org/abs/1711.05852.pdf.
[74] MIN Rui, LAN Hai, CAO Zongjie, et al. A gradually distilled CNN for SAR target recognition[J]. IEEE access, 2019, 7: 42190–42200.
[75] KIM J, BHALGAT Y, LEE J, et al. QKD: quantization-aware knowledge distillation[EB/OL]. (2019-11-28)[2023-06-21]. https://arxiv.org/abs/1911.12491.pdf.
[76] OKUNO T, NAKATA Y, ISHII Y, et al. Lossless AI: toward guaranteeing consistency between inferences before and after quantization via knowledge distillation[C]//2021 17th International Conference on Machine Vision and Applications (MVA). Piscataway: IEEE, 2021: 1–5.
[77] SI Liang, LI Yuhai, ZHOU Hengyi, et al. Explore a novel knowledge distillation framework for network learning and low-bit quantization[C]//2021 China Automation Congress. Piscataway: IEEE, 2022: 3002–3007.
[78] KIM M, LEE S, HONG S J, et al. Understanding and improving knowledge distillation for quantization aware training of large transformer encoders[C]//Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2022: 6713–6725.
[79] POLINO A, PASCANU R, ALISTARH D. Model compression via distillation and quantization[EB/OL]. (2018-02-15)[2023-06-21]. https://arxiv.org/abs/1802.05668.pdf.
[80] CHOI Y, CHOI J, EL-KHAMY M, et al. Data-free network quantization with adversarial knowledge distillation[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Piscataway: IEEE, 2020: 3047–3057.
[81] WEI Yi, PAN Xinyu, QIN Hongwei, et al. Quantization mimic: towards very tiny CNN for object detection[C]//European Conference on Computer Vision. Cham: Springer, 2018: 274–290.
[82] PAUPAMAH K, JAMES S, KLEIN R. Quantisation and pruning for neural network compression and regularisation[C]//2020 International SAUPEC/RobMech/PRASA Conference. Piscataway: IEEE, 2020: 1–6.
[83] LIBERATORI B, MAMI C A, SANTACATTERINA G, et al. YOLO-based face mask detection on low-end devices using pruning and quantization[C]//2022 45th Jubilee International Convention on Information, Communication and Electronic Technology (MIPRO). Piscataway: IEEE, 2022: 900–905.
[84] CHANG Wanting, KUO C H, FANG Lichun. Variational channel distribution pruning and mixed-precision quantization for neural network model compression[C]//2022 International Symposium on VLSI Design, Automation and Test. Piscataway: IEEE, 2022: 1–3.
[85] ZHENG Yong, YANG Haigang, HUANG Zhihong, et al. A high energy-efficiency FPGA-based LSTM accelerator architecture design by structured pruning and normalized linear quantization[C]//2019 International Conference on Field-Programmable Technology. Piscataway: IEEE, 2020: 271–274.
[86] TUNG F, MORI G. CLIP-Q: deep network compression learning by in-parallel pruning-quantization[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7873–7882.
[87] QI Qi, LU Yan, LI Jiashi, et al. Learning low resource consumption CNN through pruning and quantization[J]. IEEE transactions on emerging topics in computing, 2022, 10(2): 886–903.
[88] WU J Y, YU Cheng, FU S W, et al. Increasing compactness of deep learning based speech enhancement models with parameter pruning and quantization techniques[J]. IEEE signal processing letters, 2019, 26(12): 1887–1891.
[89] HAN Song, MAO Huizi, DALLY W J. Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding[EB/OL]. (2016-02-15)[2023-06-21]. https://arxiv.org/abs/1510.00149.pdf.
[90] DENG Chunhua, LIAO Siyu, XIE Yi, et al. PermDNN: efficient compressed DNN architecture with permuted diagonal matrices[C]//2018 51st Annual IEEE/ACM International Symposium on Microarchitecture. Piscataway: IEEE, 2018: 189–202.
[91] KRISHNAMOORTHI R. Quantizing deep convolutional networks for efficient inference: a whitepaper[EB/OL]. (2018-06-21)[2023-06-21]. https://arxiv.org/abs/1806.08342.pdf.
[92] LIANG Tailin, GLOSSNER J, WANG Lei, et al. Pruning and quantization for deep neural network acceleration: a survey[J]. Neurocomputing, 2021, 461: 370–403.
[93] GIL Y, PARK J H, BAEK J, et al. Quantization-aware pruning criterion for industrial applications[J]. IEEE transactions on industrial electronics, 2022, 69(3): 3203–3213.