[1]JIANG Wentao,WANG Xinjie,ZHANG Shengchong.Spatially constrained attention mechanism for image classification network[J].CAAI Transactions on Intelligent Systems,2025,20(6):1444-1460.[doi:10.11992/tis.202505025]

Spatially constrained attention mechanism for image classification network

References:
[1] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
[2] DING Xiaohan, ZHANG Xiangyu, HAN Jungong, et al. Scaling up your kernels to 31×31: revisiting large kernel design in CNNs[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 11953-11965.
[3] TAN Mingxing, LE Q V. EfficientNet: rethinking model scaling for convolutional neural networks[EB/OL]. (2019-05-28)[2020-09-11]. http://arxiv.org/pdf/1905.11946.pdf.
[4] LIU Xinyu, PENG Houwen, ZHENG Ningxin, et al. EfficientViT: memory efficient vision Transformer with cascaded group attention[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 14420-14430.
[5] 姜文涛, 张大鹏. 优化分类的弱目标孪生网络跟踪研究[J]. 智能系统学报, 2023, 18(5): 984-993.
JIANG Wentao, ZHANG Dapeng. Research on weak object tracking based on Siamese network with optimized classification[J]. CAAI transactions on intelligent systems, 2023, 18(5): 984-993.
[6] 刘晓敏, 余梦君, 乔振壮, 等. 面向多源遥感数据分类的尺度自适应融合网络[J]. 电子与信息学报, 2024, 46(9): 3693-3702.
LIU Xiaomin, YU Mengjun, QIAO Zhenzhuang, et al. Scale adaptive fusion network for multimodal remote sensing data classification[J]. Journal of electronics & information technology, 2024, 46(9): 3693-3702.
[7] 刘佳, 宋泓, 陈大鹏, 等. 非语言信息增强和对比学习的多模态情感分析模型[J]. 电子与信息学报, 2024, 46(8): 3372-3381.
LIU Jia, SONG Hong, CHEN Dapeng, et al. A multimodal sentiment analysis model enhanced with non-verbal information and contrastive learning[J]. Journal of electronics & information technology, 2024, 46(8): 3372-3381.
[8] 王柳, 梁铭炬. 融合深度信息的室内场景分割算法[J]. 计算机系统应用, 2024, 33(3): 111-117.
WANG Liu, LIANG Mingju. Indoor scene segmentation algorithm based on fusion of deep information[J]. Computer systems and applications, 2024, 33(3): 111-117.
[9] ZHAO Youpeng, TANG Huadong, JIANG Yingying, et al. Parameter-efficient vision Transformer with linear attention[C]//2023 IEEE International Conference on Image Processing. Kuala Lumpur: IEEE, 2023: 1275-1279.
[10] SARKAR R, LIANG Hanxue, FAN Zhiwen, et al. Edge-MoE: memory-efficient multi-task vision Transformer architecture with task-level sparsity via mixture-of-experts[C]//2023 IEEE/ACM International Conference on Computer Aided Design. San Francisco: IEEE, 2023: 1-9.
[11] WANG Wenxiao, CHEN Wei, QIU Qibo, et al. CrossFormer: a versatile vision Transformer hinging on cross-scale attention[J]. IEEE transactions on pattern analysis and machine intelligence, 2024, 46(5): 3123-3136.
[12] 姜文涛, 孟庆姣. 自适应时空正则化的相关滤波目标跟踪[J]. 智能系统学报, 2023, 18(4): 754-763.
JIANG Wentao, MENG Qingjiao. Correlation filter tracking for adaptive spatiotemporal regularization[J]. CAAI transactions on intelligent systems, 2023, 18(4): 754-763.
[13] YANG Jian, LI Chen, LI Xuelong. Underwater image restoration with light-aware progressive network[C]//ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing. Rhodes Island: IEEE, 2023: 1-5.
[14] LI Zixuan, WANG Yuangen. Optimizing Transformer for large-hole image inpainting[C]//2023 IEEE International Conference on Image Processing. Kuala Lumpur: IEEE, 2023: 1180-1184.
[15] CHEN Xiangyu, WANG Xintao, ZHANG Wenlong, et al. HAT: hybrid attention Transformer for image restoration[EB/OL]. (2023-09-11)[2025-10-01]. https://arxiv.org/abs/2309.05239.
[16] JI Jiahuan, ZHONG Baojiang, SONG Weigang, et al. Learning multi-scale features for JPEG image artifacts removal[C]//2023 IEEE International Conference on Image Processing. Kuala Lumpur: IEEE, 2023: 1565-1569.
[17] LIU Yifeng, TIAN Jing. Probabilistic attention map: a probabilistic attention mechanism for convolutional neural networks[J]. Sensors, 2024, 24(24): 8187.
[18] POLANSKY M G, HERRMANN C, HUR J, et al. Boundary attention: learning curves, corners, junctions and grouping[EB/OL]. (2024-01-01)[2025-10-01]. https://arxiv.org/abs/2401.00935.
[19] XIAO Da, MENG Qingye, LI Shengping, et al. Improving Transformers with dynamically composable multi-head attention[EB/OL]. (2024-05-17)[2025-10-01]. https://arxiv.org/abs/2405.08553.
[20] YU Xiang, GUO Hongbo, YUAN Ying, et al. An improved medical image segmentation framework with Channel-Height-Width-Spatial attention module[J]. Engineering applications of artificial intelligence, 2024, 136: 108751.
[21] ZAGORUYKO S, KOMODAKIS N. Wide residual networks[EB/OL]. (2016-05-23)[2025-10-01]. https://arxiv.org/abs/1605.07146.
[22] WANG Qilong, WU Banggu, ZHU Pengfei, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11531-11539.
[23] PARMAR N, VASWANI A, USZKOREIT J, et al. Image Transformer[C]//International Conference on Machine Learning. Stockholm: PMLR, 2018: 4055-4064.
[24] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[J]. Advances in neural information processing systems, 2017, 30: 5998-6008.
[25] CHIEN Y. Pattern classification and scene analysis[J]. IEEE transactions on automatic control, 1974, 19(4): 462-463.
[26] SHARMA N, JAIN V, MISHRA A. An analysis of convolutional neural networks for image classification[J]. Procedia computer science, 2018, 132: 377-384.
[27] NETZER Y, WANG T, COATES A, et al. The street view house numbers (SVHN) dataset[EB/OL]. (2011-12-12)[2023-05-04]. http://ufldl.stanford.edu/housenumbers/.
[28] STALLKAMP J, SCHLIPSING M, SALMEN J, et al. The German traffic sign recognition benchmark[EB/OL]. (2012-03-16)[2023-05-04]. http://benchmark.ini.rub.de/?section=gtsrb&subsection=news.
[29] HU Jie, SHEN Li, ALBANIE S, et al. Squeeze-and-excitation networks[J]. IEEE transactions on pattern analysis and machine intelligence, 2020, 42(8): 2011-2023.
[30] LI Xiang, WANG Wenhai, HU Xiaolin, et al. Selective kernel networks[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 510-519.
[31] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich: Springer, 2018: 3-19.
[32] CHOLLET F. Xception: deep learning with depthwise separable convolutions[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 1800-1807.
[33] DAI Jifeng, QI Haozhi, XIONG Yuwen, et al. Deformable convolutional networks[C]//2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 764-773.
[34] YANG B, BENDER G, LE Q V, et al. CondConv: conditionally parameterized convolutions for efficient inference[EB/OL]. (2019-04-10)[2024-10-12]. https://arxiv.org/abs/1904.04971.
[35] LUU M L, HUANG Zeyi, XING E P, et al. Expeditious saliency-guided mix-up through random gradient thresholding[EB/OL]. (2022-12-09)[2024-10-12]. https://arxiv.org/abs/2212.04875.
[36] 郭玉荣, 张珂, 王新胜, 等. 端到端双通道特征重标定DenseNet图像分类[J]. 中国图象图形学报, 2020, 25(3): 486-497.
GUO Yurong, ZHANG Ke, WANG Xinsheng, et al. Image classification method based on end-to-end dual feature reweight DenseNet[J]. Journal of image and graphics, 2020, 25(3): 486-497.
[37] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10)[2024-10-12]. https://arxiv.org/abs/1409.1556.
[38] HASSANI A, WALTON S, SHAH N, et al. Escaping the big data paradigm with compact Transformers[EB/OL]. (2022-06-07)[2024-10-12]. https://arxiv.org/abs/2104.05704.
[39] ZHOU C L, ZHANG H, ZHOU Z K, et al. QKFormer: hierarchical spiking Transformer using Q-K attention[EB/OL]. (2024-03-25)[2024-10-08]. https://arxiv.org/abs/2403.16552.
[40] CHOROMANSKI K, LIKHOSHERSTOV V, DOHAN D, et al. Rethinking attention with performers[EB/OL]. (2020-09-30)[2024-01-05]. https://arxiv.org/pdf/2009.14794.pdf.
[41] LAN Hai, WANG Xihao, SHEN Hao, et al. Couplformer: rethinking vision Transformer with coupling attention[C]//2023 IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2023: 6464-6473.
[42] 谢奕涛, 苏鹭梅, 杨帆, 等. 面向目标类别分类的无数据知识蒸馏方法[J]. 中国图象图形学报, 2024, 29(11): 3401-3416.
XIE Yitao, SU Lumei, YANG Fan, et al. Data-free knowledge distillation for target class classification[J]. Journal of image and graphics, 2024, 29(11): 3401-3416.
[43] 柴智, 丁春涛, 郭慧, 等. CN2Conv: 面向物联网设备的强鲁棒CNN设计方法[J]. 计算机应用研究, 2025, 42(7): 2154-2160.
CHAI Zhi, DING Chuntao, GUO Hui, et al. Combined non-linearity convolution kernel generation: strong robust CNN design method based on IoT[J]. Application research of computers, 2025, 42(7): 2154-2160.
[44] 宫智宇, 王士同. 面向重尾噪声图像分类的残差网络学习方法[J/OL]. 计算机应用. [2025-10-02]. https://doi.org/10.11772/j.issn.1001-9081.2024101407.
GONG Zhiyu, WANG Shitong. Residual network learning method for image classification under heavy-tail noise[J/OL]. Journal of computer applications. [2025-10-02]. https://doi.org/10.11772/j.issn.1001-9081.2024101407.
[45] 杨育婷, 李玲玲, 刘旭, 等. 基于多尺度-多方向Transformer的图像识别[J]. 计算机学报, 2025, 48(2): 249-265.
YANG Yuting, LI Lingling, LIU Xu, et al. Multi-scale and multi-directional Transformer-based image recognition[J]. Chinese journal of computers, 2025, 48(2): 249-265.
[46] 朱秋慧, 杨靖, 黄若愚, 等. 基于部分卷积的多尺度特征卷积神经网络模型[J/OL]. 无线电通信技术. [2025-05-21]. http://kns.cnki.net/kcms/detail/13.1099.TN.20250310.1707.012.html.
ZHU Qiuhui, YANG Jing, HUANG Ruoyu, et al. Partial convolution-based multi-scale feature convolutional neural network model[J/OL]. Radio communications technology. [2025-05-21]. http://kns.cnki.net/kcms/detail/13.1099.TN.20250310.1707.012.html.