
[1]JIANG Wentao,YOU Zhuocheng,ZHANG Shengchong.Dynamic mask convolution for image classification networks[J].CAAI Transactions on Intelligent Systems,2026,21(2):423-434.[doi:10.11992/tis.202503019]


Dynamic mask convolution for image classification networks


CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 21 Issue: 2 (2026) Pages: 423-434 Column: Academic Papers - Machine Perception and Pattern Recognition Publication date: 2026-03-05

Title:: Dynamic mask convolution for image classification networks

Author(s):: JIANG Wentao¹; YOU Zhuocheng¹; ZHANG Shengchong²
1. College of Software, Liaoning Technology University, Huludao 125105, China
2. Science and Technology on Electro-Optical Information Security Control Laboratory, Tianjin 300308, China

Keywords:: image classification; masking mechanism; residual networks; dynamic mask convolution; dilated convolution; attention mechanism; feature fusion; feature extraction

CLC:: TP391

DOI:: 10.11992/tis.202503019

Abstract:: Traditional image classification methods struggle in complex scenes: their features adapt poorly, they capture limited multi-scale information, and they express fine details insufficiently. To address these problems, an image classification network based on dynamic mask convolution is proposed. First, a multi-branch mask convolution fusion module is designed. It combines a multi-branch structure with a dynamic mask mechanism to fuse information at different scales, and it dynamically selects and strengthens key features according to the context of the input image, improving the network's feature extraction ability. Second, an adaptive enhancement module is introduced into residual learning. It adaptively adjusts feature weights by integrating pixel-level and channel-level attention mechanisms, so that important image details are captured accurately. In experiments on the CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof datasets, the network achieves classification accuracies of 96.85%, 82.39%, 97.88%, 93.35%, and 85.93%, respectively, significantly better than traditional image classification methods. The network shows excellent and stable classification performance across diverse image features and complex scenes, and offers a new approach to applying deep learning to image classification.
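The abstract names the two modules but gives no formulas, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that the branches differ by dilation rate (suggested by the "dilated convolution" keyword), that the dynamic mask is a sigmoid of each response minus the branch mean, that branches are fused by averaging, and that channel-level and pixel-level attention are sigmoid gates on pooled values. All function names and these specific formulas are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dilated_conv2d(feature, kernel, dilation):
    """3x3 convolution with zero padding ('same' size) and a dilation rate."""
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for ki in range(3):
                for kj in range(3):
                    y = i + (ki - 1) * dilation
                    x = j + (kj - 1) * dilation
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[ki][kj] * feature[y][x]
            out[i][j] = acc
    return out


def dynamic_mask(feature):
    """Assumed mask: sigmoid of each value minus the map's global mean."""
    h, w = len(feature), len(feature[0])
    mean = sum(sum(row) for row in feature) / (h * w)
    return [[sigmoid(v - mean) for v in row] for row in feature]


def multi_branch_masked_fusion(feature, kernel, dilations=(1, 2, 3)):
    """Average the masked responses of branches with different dilations."""
    h, w = len(feature), len(feature[0])
    fused = [[0.0] * w for _ in range(h)]
    for d in dilations:
        branch = dilated_conv2d(feature, kernel, d)
        mask = dynamic_mask(branch)  # mask depends on the input content
        for i in range(h):
            for j in range(w):
                fused[i][j] += mask[i][j] * branch[i][j] / len(dilations)
    return fused


def adaptive_enhancement(channels):
    """Residual reweighting: x + x * channel_weight * pixel_weight."""
    c, h, w = len(channels), len(channels[0]), len(channels[0][0])
    # Channel-level gate from global average pooling of each channel.
    channel_w = [sigmoid(sum(sum(row) for row in ch) / (h * w))
                 for ch in channels]
    # Pixel-level gate from the cross-channel mean at each position.
    pixel_w = [[sigmoid(sum(channels[k][i][j] for k in range(c)) / c)
                for j in range(w)] for i in range(h)]
    return [[[channels[k][i][j] * (1.0 + channel_w[k] * pixel_w[i][j])
              for j in range(w)] for i in range(h)] for k in range(c)]


# Usage: an identity kernel on a constant 5x5 map, then a two-channel input.
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
ones = [[1.0] * 5 for _ in range(5)]
fused = multi_branch_masked_fusion(ones, identity)
enhanced = adaptive_enhancement([[[1.0, 1.0], [1.0, 1.0]],
                                 [[0.0, 0.0], [0.0, 0.0]]])
```

In a real network the mask and attention weights would come from learned layers rather than fixed formulas. The sketch only shows where input-dependent gating enters the multi-branch fusion and the residual path.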

