[1]丁贵广,陈辉,王澳,等.视觉深度学习模型压缩加速综述[J].智能系统学报,2024,19(5):1072-1081.[doi:10.11992/tis.202311011]
 DING Guiguang,CHEN Hui,WANG Ao,et al.Review of model compression and acceleration for visual deep learning[J].CAAI Transactions on Intelligent Systems,2024,19(5):1072-1081.[doi:10.11992/tis.202311011]

Review of model compression and acceleration for visual deep learning

References:
[1] DENG Jia, DONG Wei, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami: IEEE, 2009: 248-255.
[2] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[3] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2014-09-04) [2023-11-10]. https://arxiv.org/abs/1409.1556.
[4] SZEGEDY C, LIU Wei, JIA Yangqing, et al. Going deeper with convolutions[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 1-9.
[5] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
[6] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[EB/OL]. (2020-10-22) [2023-11-10]. https://arxiv.org/abs/2010.11929.
[7] TOUVRON H, CORD M, DOUZE M, et al. Training data-efficient image Transformers & distillation through attention[C]//International Conference on Machine Learning. Virtual Event: PMLR, 2021: 10347-10357.
[8] LIU Ze, LIN Yutong, CAO Yue, et al. Swin Transformer: hierarchical vision Transformer using shifted windows[C]//2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 9992-10002.
[9] DEHGHANI M, DJOLONGA J, MUSTAFA B, et al. Scaling vision Transformers to 22 billion parameters[C]//International Conference on Machine Learning. Honolulu: PMLR, 2023: 7480-7512.
[10] GOU Jianping, YU Baosheng, MAYBANK S J, et al. Knowledge distillation: a survey[J]. International Journal of Computer Vision, 2021, 129(6): 1789-1819.
[11] CHENG Yu, WANG Duo, ZHOU Pan, et al. A survey of model compression and acceleration for deep neural networks[J]. IEEE Signal Processing Magazine, 2018, 35(1): 126-136.
[12] LIANG Tailin, GLOSSNER J, WANG Lei, et al. Pruning and quantization for deep neural network acceleration: a survey[J]. Neurocomputing, 2021, 461: 370-403.
[13] IANDOLA F N, HAN Song, MOSKEWICZ M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5 MB model size[EB/OL]. (2016-11-04) [2023-11-10]. https://arxiv.org/abs/1602.07360.
[14] HOWARD A G, ZHU Menglong, CHEN Bo, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. (2017-04-17) [2023-11-10]. https://arxiv.org/abs/1704.04861.
[15] SANDLER M, HOWARD A, ZHU Menglong, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4510-4520.
[16] HOWARD A, SANDLER M, CHEN Bo, et al. Searching for MobileNetV3[C]//2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 1314-1324.
[17] HU Jie, SHEN Li, SUN Gang. Squeeze-and-excitation networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132-7141.
[18] ZHANG Xiangyu, ZHOU Xinyu, LIN Mengxiao, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 6848-6856.
[19] MA Ningning, ZHANG Xiangyu, ZHENG Haitao, et al. ShuffleNet V2: practical guidelines for efficient CNN architecture design[C]//European Conference on Computer Vision. Cham: Springer, 2018: 122-138.
[20] LI Dawei, WANG Xiaolong, KONG Deguang. DeepRebirth: accelerating deep neural network execution on mobile devices[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New Orleans: AAAI, 2018: 2322-2330.
[21] PRABHU A, VARMA G, NAMBOODIRI A. Deep expander networks: efficient deep networks from graph theory[C]//European Conference on Computer Vision. Cham: Springer, 2018: 20-36.
[22] JEON Y, KIM J. Constructing fast network through deconstruction of convolution[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal: ACM, 2018: 5955-5965.
[23] KIM E, AHN C, OH S. NestedNet: learning nested sparse structures in deep neural networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 8669-8678.
[24] JIN Xiaojie, YANG Yingzhen, XU Ning, et al. WSNet: compact and efficient networks through weight sampling[C]//International Conference on Machine Learning. Stockholm: PMLR, 2018: 2352-2361.
[25] LECUN Y, DENKER J, SOLLA S. Optimal brain damage[C]//Advances in Neural Information Processing Systems. Denver: ACM, 1990: 598-605.
[26] SRINIVAS S, SUBRAMANYA A, BABU R V. Training sparse neural networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 455-462.
[27] HAN Song, POOL J, TRAN J, et al. Learning both weights and connections for efficient neural networks[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal: ACM, 2015: 1135-1143.
[28] GUO Yiwen, YAO Anbang, CHEN Yurong. Dynamic network surgery for efficient DNNs[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona: ACM, 2016: 1387-1395.
[29] MOLCHANOV D, ASHUKHA A, VETROV D. Variational dropout sparsifies deep neural networks[C]//Proceedings of the 34th International Conference on Machine Learning. Sydney: JMLR, 2017: 2498-2507.
[30] WEN Wei, WU Chunpeng, WANG Yandan, et al. Learning structured sparsity in deep neural networks[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona: ACM, 2016: 2082-2090.
[31] LI Hao, KADAV A, DURDANOVIC I, et al. Pruning filters for efficient ConvNets[EB/OL]. (2016-08-31) [2023-11-10]. https://arxiv.org/abs/1608.08710.
[32] YU Ruichi, LI Ang, CHEN Chunfu, et al. NISP: pruning networks using neuron importance score propagation[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 9194-9203.
[33] ZHUANG Zhuangwei, TAN Mingkui, ZHUANG Bohan, et al. Discrimination-aware channel pruning for deep neural networks[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal: ACM, 2018: 875-886.
[34] LIN Shaohui, JI Rongrong, LI Yuchao, et al. Accelerating convolutional networks via global & dynamic filter pruning[C]//Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm: AAAI, 2018: 2425-2432.
[35] COURBARIAUX M, BENGIO Y, DAVID J P. BinaryConnect: training deep neural networks with binary weights during propagations[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal: ACM, 2015: 3123-3131.
[36] LIU Zechun, WU Baoyuan, LUO Wenhan, et al. Bi-Real Net: enhancing the performance of 1-bit CNNs with improved representational capability and advanced training algorithm[C]//European Conference on Computer Vision. Cham: Springer, 2018: 747-763.
[37] ZHU Chenzhuo, HAN Song, MAO Huizi, et al. Trained ternary quantization[EB/OL]. (2016-12-04) [2023-11-10]. https://arxiv.org/abs/1612.01064.
[38] ACHTERHOLD J, KOEHLER J, SCHMEINK A, et al. Variational network quantization[C]//International Conference on Learning Representations. Vancouver: ICLR, 2018: 1-18.
[39] ZHOU Shuchang, WU Yuxin, NI Zekun, et al. DoReFa-net: training low bitwidth convolutional neural networks with low bitwidth gradients[EB/OL]. (2016-06-20) [2023-11-10]. https://arxiv.org/abs/1606.06160.
[40] MISHRA A, NURVITADHI E, COOK J, et al. WRPN: wide reduced-precision networks[EB/OL]. (2017-09-04) [2023-11-10]. https://arxiv.org/abs/1709.01134.
[41] ZHOU Aojun, YAO Anbang, GUO Yiwen, et al. Incremental network quantization: towards lossless CNNs with low-precision weights[EB/OL]. (2017-02-10) [2023-11-10]. https://arxiv.org/abs/1702.03044.
[42] BOYD S, PARIKH N, CHU E, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers[J]. Foundations and Trends in Machine Learning, 2010, 3(1): 1-122.
[43] HINTON G E, VINYALS O, DEAN J. Distilling the knowledge in a neural network[EB/OL]. (2015-03-09) [2023-11-10]. https://arxiv.org/abs/1503.02531.
[44] ROMERO A, BALLAS N, KAHOU S E, et al. FitNets: hints for thin deep nets[EB/OL]. (2014-12-19) [2023-11-10]. https://arxiv.org/abs/1412.6550.
[45] CHEN Tianqi, GOODFELLOW I, SHLENS J. Net2Net: accelerating learning via knowledge transfer[EB/OL]. (2015-11-18) [2023-11-10]. https://arxiv.org/abs/1511.05641.
[46] LAN Xu, ZHU Xiatian, GONG Shaogang. Knowledge distillation by on-the-fly native ensemble[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal: ACM, 2018: 7528-7538.
[47] LASSANCE C, BONTONOU M, HACENE G B, et al. Deep geometric knowledge distillation with graphs[C]//2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Barcelona: IEEE, 2020: 8484-8488.
[48] YOU Shan, XU Chang, XU Chao, et al. Learning from multiple teacher networks[C]//Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Halifax: ACM, 2017: 1285-1294.
[49] WANG Yunhe, XU Chang, XU Chao, et al. Adversarial learning of portable student networks[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New Orleans: AAAI, 2018: 4260-4267.
[50] MIRZADEH S I, FARAJTABAR M, LI Ang, et al. Improved knowledge distillation via teacher assistant[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New York: AAAI, 2020: 5191-5198.
[51] HE Yang, KANG Guoliang, DONG Xuanyi, et al. Soft filter pruning for accelerating deep convolutional neural networks[EB/OL]. (2018-08-21) [2023-11-10]. https://arxiv.org/abs/1808.06866.
[52] JACOB B, KLIGYS S, CHEN Bo, et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 2704-2713.
[53] ZHAO Borui, CUI Quan, SONG Renjie, et al. Decoupled knowledge distillation[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 11943-11952.
[54] ZHU Mingjian, TANG Yehui, HAN Kai. Vision Transformer pruning[EB/OL]. (2021-04-17) [2023-11-10]. https://arxiv.org/abs/2104.08500.
[55] SONG Zhuoran, XU Yihong, HE Zhezhi, et al. CP-ViT: cascade vision Transformer pruning via progressive sparsity prediction[EB/OL]. (2022-03-09) [2023-11-10]. https://arxiv.org/abs/2203.04570.
[56] YU Fang, HUANG Kun, WANG Meng, et al. Width & depth pruning for vision Transformers[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver: AAAI, 2022: 3143-3151.
[57] YANG Huanrui, YIN Hongxu, SHEN Maying, et al. Global vision Transformer pruning with Hessian-aware saliency[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 18547-18557.
[58] YU Shixing, CHEN Tianlong, SHEN Jiayi, et al. Unified visual Transformer compression[EB/OL]. (2022-03-15) [2023-11-10]. https://arxiv.org/abs/2203.08243.
[59] ZHENG Chuanyang, ZHANG Kai, YANG Zhi, et al. SAViT: structure-aware vision Transformer pruning via collaborative optimization[C]//Proceedings of the 36th International Conference on Neural Information Processing Systems. New Orleans: ACM, 2022: 9010-9023.
[60] YIN Miao, UZKENT B, SHEN Yilin, et al. GOHSP: a unified framework of graph and optimization-based heterogeneous structured pruning for vision Transformer[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Washington: AAAI, 2023: 10954-10962.
[61] HOU Zejiang, KUNG S Y. Multi-dimensional model compression of vision Transformer[C]//2022 IEEE International Conference on Multimedia and Expo. Taipei: IEEE, 2022: 1-6.
[62] HE Haoyu, LIU Jing, PAN Zizheng, et al. Pruning self-attentions into convolutional layers in single path[EB/OL]. (2021-11-23) [2023-11-10]. https://arxiv.org/abs/2111.11802.
[63] CHEN Tianlong, CHENG Yu, GAN Zhe, et al. Chasing sparsity in vision Transformers: an end-to-end exploration[C]//Proceedings of the 35th International Conference on Neural Information Processing Systems. Online: ACM, 2021: 19974-19988.
[64] WANG Zhenyu, LUO Haowen, WANG Pichao, et al. VTC-LFC: vision Transformer compression with low-frequency components[C]//Proceedings of the 36th International Conference on Neural Information Processing Systems. New Orleans: ACM, 2022: 13974-13988.
[65] RAO Yongming, ZHAO Wenliang, LIU Benlin, et al. DynamicViT: efficient vision Transformers with dynamic token sparsification[C]//Proceedings of the 35th International Conference on Neural Information Processing Systems. Online: ACM, 2021: 13937-13949.
[66] KONG Zhenglun, DONG Peiyan, MA Xiaolong, et al. SPViT: enabling faster vision Transformers via latency-aware soft token pruning[C]//European Conference on Computer Vision. Cham: Springer, 2022: 620-640.
[67] XU Yifan, ZHANG Zhijie, ZHANG Mengdan, et al. Evo-ViT: slow-fast token evolution for dynamic vision Transformer[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver: AAAI, 2022: 2964-2972.
[68] ZONG Zhuofan, LI Kunchang, SONG Guanglu, et al. Self-slimmed vision Transformer[C]//European Conference on Computer Vision. Cham: Springer, 2022: 432-448.
[69] BOLYA D, FU Chengyang, DAI Xiaoliang, et al. Token merging: your ViT but faster[EB/OL]. (2022-10-17) [2023-11-10]. https://arxiv.org/abs/2210.09461.
[70] WEI Siyuan, YE Tianzhu, ZHANG Shen, et al. Joint token pruning and squeezing towards more aggressive compression of vision Transformers[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 2092-2101.
[71] CHEN Mengzhao, SHAO Wenqi, XU Peng, et al. DiffRate: differentiable compression rate for efficient vision Transformers[EB/OL]. (2023-05-29) [2023-11-10]. https://arxiv.org/abs/2305.17997.
Similar References:
[1]张欣培,周尧,章毅.用于胎儿超声切面识别的知识蒸馏方法[J].智能系统学报,2022,17(1):181.[doi:10.11992/tis.202105007]
 ZHANG Xinpei,ZHOU Yao,ZHANG Yi.Knowledge distillation method for fetal ultrasound section identification[J].CAAI Transactions on Intelligent Systems,2022,17(1):181.[doi:10.11992/tis.202105007]
[2]宁欣,赵文尧,宗易昕,等.神经网络压缩联合优化方法的研究综述[J].智能系统学报,2024,19(1):36.[doi:10.11992/tis.202306042]
 NING Xin,ZHAO Wenyao,ZONG Yixin,et al.An overview of the joint optimization method for neural network compression[J].CAAI Transactions on Intelligent Systems,2024,19(1):36.[doi:10.11992/tis.202306042]
[3]林孙旗,徐家梦,郑瑜杰,等.面向掌纹掌静脉识别网络轻量化的非对称双模态融合方法[J].智能系统学报,2024,19(5):1190.[doi:10.11992/tis.202212031]
 LIN Sunqi,XU Jiameng,ZHENG Yujie,et al.An asymmetric bimodal fusion method for lightweight palm print and palm vein recognition network[J].CAAI Transactions on Intelligent Systems,2024,19(5):1190.[doi:10.11992/tis.202212031]
[4]王健宗,张旭龙,姜桂林,等.基于分层联邦框架的音频模型生成技术研究[J].智能系统学报,2024,19(5):1331.[doi:10.11992/tis.202306054]
 WANG Jianzong,ZHANG Xulong,JIANG Guilin,et al.Research on audio model generation technology based on a hierarchical federated framework[J].CAAI Transactions on Intelligent Systems,2024,19(5):1331.[doi:10.11992/tis.202306054]

Memo

Received: 2023-11-10.
Funding: National Natural Science Foundation of China (61925107, 62271281); Zhejiang Provincial Natural Science Foundation (LDT23F01013F01).
About the authors: DING Guiguang, professor and Ph.D., whose main research interests are multimedia information processing and computer vision perception. He has led or participated in dozens of national-level projects, including general programs of the National Natural Science Foundation of China, and has received the Second Prize of the National Science and Technology Progress Award, the First Prize of the Wu Wenjun Artificial Intelligence Science and Technology Progress Award, and the First Prize of the Technology Invention Award of the Chinese Institute of Electronics. He has published nearly 100 academic papers, cited more than 17,000 times. E-mail: dinggg@tsinghua.edu.cn. CHEN Hui, assistant researcher, whose main research interests are computer vision and multimedia information processing. He leads one general program of the National Natural Science Foundation of China and one subtopic of the Ministry of Science and Technology "New Generation Artificial Intelligence 2030" major project. E-mail: jichenhui2012@gmail.com. WANG Ao, Ph.D. candidate, whose main research interests are the design and optimization of deep learning models. E-mail: wa22@mails.tsinghua.edu.cn.
Corresponding author: CHEN Hui. E-mail: jichenhui2012@gmail.com.

Last Update: 2024-09-05