[1]姜文涛,王鑫杰,张晟翀.空间约束注意力机制的图像分类网络[J].智能系统学报,2025,20(6):1444-1460.[doi:10.11992/tis.202505025]
 JIANG Wentao,WANG Xinjie,ZHANG Shengchong.Spatially constrained attention mechanism for image classification network[J].CAAI Transactions on Intelligent Systems,2025,20(6):1444-1460.[doi:10.11992/tis.202505025]

空间约束注意力机制的图像分类网络 (Spatially constrained attention mechanism for image classification network)

参考文献/References:
[1] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
[2] DING Xiaohan, ZHANG Xiangyu, HAN Jungong, et al. Scaling up your kernels to 31×31: revisiting large kernel design in CNNs[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 11953-11965.
[3] TAN Mingxing, LE Q V. EfficientNet: rethinking model scaling for convolutional neural networks[EB/OL]. (2019-05-28)[2020-09-11]. http://arxiv.org/pdf/1905.11946.pdf.
[4] LIU Xinyu, PENG Houwen, ZHENG Ningxin, et al. EfficientViT: memory efficient vision Transformer with cascaded group attention[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 14420-14430.
[5] 姜文涛, 张大鹏. 优化分类的弱目标孪生网络跟踪研究[J]. 智能系统学报, 2023, 18(5): 984-993.
JIANG Wentao, ZHANG Dapeng. Research on weak object tracking based on Siamese network with optimized classification[J]. CAAI transactions on intelligent systems, 2023, 18(5): 984-993.
[6] 刘晓敏, 余梦君, 乔振壮, 等. 面向多源遥感数据分类的尺度自适应融合网络[J]. 电子与信息学报, 2024, 46(9): 3693-3702.
LIU Xiaomin, YU Mengjun, QIAO Zhenzhuang, et al. Scale adaptive fusion network for multimodal remote sensing data classification[J]. Journal of electronics & information technology, 2024, 46(9): 3693-3702.
[7] 刘佳, 宋泓, 陈大鹏, 等. 非语言信息增强和对比学习的多模态情感分析模型[J]. 电子与信息学报, 2024, 46(8): 3372-3381.
LIU Jia, SONG Hong, CHEN Dapeng, et al. A multimodal sentiment analysis model enhanced with non-verbal information and contrastive learning[J]. Journal of electronics & information technology, 2024, 46(8): 3372-3381.
[8] 王柳, 梁铭炬. 融合深度信息的室内场景分割算法[J]. 计算机系统应用, 2024, 33(3): 111-117.
WANG Liu, LIANG Mingju. Indoor scene segmentation algorithm based on fusion of deep information[J]. Computer systems and applications, 2024, 33(3): 111-117.
[9] ZHAO Youpeng, TANG Huadong, JIANG Yingying, et al. Parameter-efficient vision Transformer with linear attention[C]//2023 IEEE International Conference on Image Processing. Kuala Lumpur: IEEE, 2023: 1275-1279.
[10] SARKAR R, LIANG Hanxue, FAN Zhiwen, et al. Edge-MoE: memory-efficient multi-task vision Transformer architecture with task-level sparsity via mixture-of-experts[C]//2023 IEEE/ACM International Conference on Computer Aided Design. San Francisco: IEEE, 2023: 1-9.
[11] WANG Wenxiao, CHEN Wei, QIU Qibo, et al. CrossFormer: a versatile vision Transformer hinging on cross-scale attention[J]. IEEE transactions on pattern analysis and machine intelligence, 2024, 46(5): 3123-3136.
[12] 姜文涛, 孟庆姣. 自适应时空正则化的相关滤波目标跟踪[J]. 智能系统学报, 2023, 18(4): 754-763.
JIANG Wentao, MENG Qingjiao. Correlation filter tracking for adaptive spatiotemporal regularization[J]. CAAI transactions on intelligent systems, 2023, 18(4): 754-763.
[13] YANG Jian, LI Chen, LI Xuelong. Underwater image restoration with light-aware progressive network[C]//2023 IEEE International Conference on Acoustics, Speech and Signal Processing. Rhodes Island: IEEE, 2023: 1-5.
[14] LI Zixuan, WANG Yuangen. Optimizing Transformer for large-hole image inpainting[C]//2023 IEEE International Conference on Image Processing. Kuala Lumpur: IEEE, 2023: 1180-1184.
[15] CHEN Xiangyu, WANG Xintao, ZHANG Wenlong, et al. HAT: hybrid attention Transformer for image restoration[EB/OL]. (2023-09-11)[2025-10-01]. https://arxiv.org/abs/2309.05239.
[16] JI Jiahuan, ZHONG Baojiang, SONG Weigang, et al. Learning multi-scale features for JPEG image artifacts removal[C]//2023 IEEE International Conference on Image Processing. Kuala Lumpur: IEEE, 2023: 1565-1569.
[17] LIU Yifeng, TIAN Jing. Probabilistic attention map: a probabilistic attention mechanism for convolutional neural networks[J]. Sensors, 2024, 24(24): 8187.
[18] POLANSKY M G, HERRMANN C, HUR J, et al. Boundary attention: learning curves, corners, junctions and grouping[EB/OL]. (2024-01-01)[2025-10-01]. https://arxiv.org/abs/2401.00935.
[19] XIAO Da, MENG Qingye, LI Shengping, et al. Improving Transformers with dynamically composable multi-head attention[EB/OL]. (2024-05-17)[2025-10-01]. https://arxiv.org/abs/2405.08553.
[20] YU Xiang, GUO Hongbo, YUAN Ying, et al. An improved medical image segmentation framework with Channel-Height-Width-Spatial attention module[J]. Engineering applications of artificial intelligence, 2024, 136: 108751.
[21] ZAGORUYKO S, KOMODAKIS N. Wide residual networks[EB/OL]. (2016-05-23)[2025-10-01]. https://arxiv.org/abs/1605.07146.
[22] WANG Qilong, WU Banggu, ZHU Pengfei, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11531-11539.
[23] PARMAR N, VASWANI A, USZKOREIT J, et al. Image Transformer[C]//International Conference on Machine Learning. Stockholm: PMLR, 2018: 4055-4064.
[24] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[J]. Advances in neural information processing systems, 2017, 30: 5998-6008.
[25] CHIEN Y. Pattern classification and scene analysis[J]. IEEE transactions on automatic control, 1974, 19(4): 462-463.
[26] SHARMA N, JAIN V, MISHRA A. An analysis of convolutional neural networks for image classification[J]. Procedia computer science, 2018, 132: 377-384.
[27] NETZER Y, WANG T, COATES A, et al. The street view house numbers (SVHN) dataset[EB/OL]. (2011-12-12)[2023-05-04]. http://ufldl.stanford.edu/housenumbers/.
[28] STALLKAMP J, SCHLIPSING M, SALMEN J, et al. The German traffic sign recognition benchmark[EB/OL]. (2012-03-16)[2023-05-04]. http://benchmark.ini.rub.de/?section=gtsrb&subsection=news.
[29] HU Jie, SHEN Li, ALBANIE S, et al. Squeeze-and-excitation networks[J]. IEEE transactions on pattern analysis and machine intelligence, 2020, 42(8): 2011-2023.
[30] LI Xiang, WANG Wenhai, HU Xiaolin, et al. Selective kernel networks[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 510-519.
[31] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich: Springer, 2018: 3-19.
[32] CHOLLET F. Xception: deep learning with depthwise separable convolutions[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 1800-1807.
[33] DAI Jifeng, QI Haozhi, XIONG Yuwen, et al. Deformable convolutional networks[C]//2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 764-773.
[34] YANG B, BENDER G, LE Q V, et al. CondConv: conditionally parameterized convolutions for efficient inference[EB/OL]. (2019-04-10)[2024-10-12]. https://arxiv.org/abs/1904.04971.
[35] LUU M L, HUANG Zeyi, XING E P, et al. Expeditious saliency-guided mix-up through random gradient thresholding[EB/OL]. (2022-12-09)[2024-10-12]. https://arxiv.org/abs/2212.04875.
[36] 郭玉荣, 张珂, 王新胜, 等. 端到端双通道特征重标定DenseNet图像分类[J]. 中国图象图形学报, 2020, 25(3): 486-497.
GUO Yurong, ZHANG Ke, WANG Xinsheng, et al. Image classification method based on end-to-end dual feature reweight DenseNet[J]. Journal of image and graphics, 2020, 25(3): 486-497.
[37] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10)[2024-10-12]. https://arxiv.org/abs/1409.1556.
[38] HASSANI A, WALTON S, SHAH N, et al. Escaping the big data paradigm with compact Transformers[EB/OL]. (2022-06-07)[2024-10-12]. https://arxiv.org/abs/2104.05704.
[39] ZHOU C L, ZHANG H, ZHOU Z K, et al. QKFormer: hierarchical spiking Transformer using Q-K attention[EB/OL]. (2024-03-25)[2024-10-08]. https://arxiv.org/abs/2403.16552.
[40] CHOROMANSKI K, LIKHOSHERSTOV V, DOHAN D, et al. Rethinking attention with performers[EB/OL]. (2020-09-30)[2024-01-05]. https://arxiv.org/pdf/2009.14794.pdf.
[41] LAN Hai, WANG Xihao, SHEN Hao, et al. Couplformer: rethinking vision Transformer with coupling attention[C]//2023 IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2023: 6464-6473.
[42] 谢奕涛, 苏鹭梅, 杨帆, 等. 面向目标类别分类的无数据知识蒸馏方法[J]. 中国图象图形学报, 2024, 29(11): 3401-3416.
XIE Yitao, SU Lumei, YANG Fan, et al. Data-free knowledge distillation for target class classification[J]. Journal of image and graphics, 2024, 29(11): 3401-3416.
[43] 柴智, 丁春涛, 郭慧, 等. CN2Conv: 面向物联网设备的强鲁棒CNN设计方法[J]. 计算机应用研究, 2025, 42(7): 2154-2160.
CHAI Zhi, DING Chuntao, GUO Hui, et al. Combined non-linearity convolution kernel generation: strong robust CNN design method based on IoT[J]. Application research of computers, 2025, 42(7): 2154-2160.
[44] 宫智宇, 王士同. 面向重尾噪声图像分类的残差网络学习方法[J/OL]. 计算机应用. [2025-10-02]. https://doi.org/10.11772/j.issn.1001-9081.2024101407.
GONG Zhiyu, WANG Shitong. Residual network learning method for image classification under heavy-tail noise[J/OL]. Journal of computer applications. [2025-10-02]. https://doi.org/10.11772/j.issn.1001-9081.2024101407.
[45] 杨育婷, 李玲玲, 刘旭, 等. 基于多尺度-多方向Transformer的图像识别[J]. 计算机学报, 2025, 48(2): 249-265.
YANG Yuting, LI Lingling, LIU Xu, et al. Multi-scale and multi-directional Transformer-based image recognition[J]. Chinese journal of computers, 2025, 48(2): 249-265.
[46] 朱秋慧, 杨靖, 黄若愚, 等. 基于部分卷积的多尺度特征卷积神经网络模型[J/OL]. 无线电通信技术. [2025-05-21]. http://kns.cnki.net/kcms/detail/13.1099.TN.20250310.1707.012.html.
ZHU Qiuhui, YANG Jing, HUANG Ruoyu, et al. Partial convolution-based multi-scale feature convolutional neural network model[J/OL]. Radio communications technology. [2025-05-21]. http://kns.cnki.net/kcms/detail/13.1099.TN.20250310.1707.012.html.
相似文献/Similar references:
[1]李海峰,杜军平.颜色特征的图像分类技术研究[J].智能系统学报,2008,3(2):65.[doi:CNKI:SUN:ZNXT.0.2008-02-017]
[2]李海峰,杜军平.颜色特征的图像分类技术研究[J].智能系统学报,2008,3(2):155.
 LI Hai-feng,DU Jun-ping.Image classification technology based on color features[J].CAAI Transactions on Intelligent Systems,2008,3(2):155.
[3]姚伏天,钱沄涛.高斯过程及其在高光谱图像分类中的应用[J].智能系统学报,2011,6(5):396.
 YAO Futian,QIAN Yuntao.Gaussian process and its applications in hyperspectral image classification[J].CAAI Transactions on Intelligent Systems,2011,6(5):396.
[4]尤雅萍,成运,苏松志,等.基于谱域-空域结合特征和图割原理的高光谱图像分类[J].智能系统学报,2015,10(2):201.[doi:10.3969/j.issn.1673-4785.201410040]
 YOU Yaping,CHENG Yun,SU Songzhi,et al.Hyperspectral image classification based on spectral-spatial combination features and graph cut[J].CAAI Transactions on Intelligent Systems,2015,10(2):201.[doi:10.3969/j.issn.1673-4785.201410040]
[5]赵骞,李敏,赵晓杰,等.基于感受野学习的特征词袋模型简化算法[J].智能系统学报,2016,11(5):663.[doi:10.11992/tis.201601001]
 ZHAO Qian,LI Min,ZHAO Xiaojie,et al.Learning receptive fields for compact bag-of-feature model[J].CAAI Transactions on Intelligent Systems,2016,11(5):663.[doi:10.11992/tis.201601001]
[6]费宇杰,吴小俊.一种局部聚合描述符和组显著编码相结合的编码方法[J].智能系统学报,2017,12(2):172.[doi:10.11992/tis.201602010]
 FEI Yujie,WU Xiaojun.A new feature coding algorithm based on the combination of group salient coding and VLAD[J].CAAI Transactions on Intelligent Systems,2017,12(2):172.[doi:10.11992/tis.201602010]
[7]杨梦铎,栾咏红,刘文军,等.基于自编码器的特征迁移算法[J].智能系统学报,2017,12(6):894.[doi:10.11992/tis.201706037]
 YANG Mengduo,LUAN Yonghong,LIU Wenjun,et al.Feature transfer algorithm based on an auto-encoder[J].CAAI Transactions on Intelligent Systems,2017,12(6):894.[doi:10.11992/tis.201706037]
[8]马忠丽,刘权勇,武凌羽,等.一种基于联合表示的图像分类方法[J].智能系统学报,2018,13(2):220.[doi:10.11992/tis.201611036]
 MA Zhongli,LIU Quanyong,WU Lingyu,et al.Syncretic representation method for image classification[J].CAAI Transactions on Intelligent Systems,2018,13(2):220.[doi:10.11992/tis.201611036]
[9]魏彩锋,孙永聪,曾宪华.图正则化字典对学习的轻度认知功能障碍预测[J].智能系统学报,2019,14(2):369.[doi:10.11992/tis.201709033]
 WEI Caifeng,SUN Yongcong,ZENG Xianhua.Dictionary pair learning with graph regularization for mild cognitive impairment prediction[J].CAAI Transactions on Intelligent Systems,2019,14(2):369.[doi:10.11992/tis.201709033]
[10]赵玉新,赵廷.海底声呐图像智能底质分类技术研究综述[J].智能系统学报,2020,15(3):587.[doi:10.11992/tis.202004026]
 ZHAO Yuxin,ZHAO Ting.Survey of the intelligent seabed sediment classification technology based on sonar images[J].CAAI Transactions on Intelligent Systems,2020,15(3):587.[doi:10.11992/tis.202004026]

备注/Memo

Received: 2025-05-27.
Foundation: National Natural Science Foundation of China (61601213); Natural Science Foundation of Liaoning Province (20170540426); Key Fund Project of the Department of Education of Liaoning Province (LJYL049).
About the authors: JIANG Wentao, associate professor, Ph.D. His main research interest is image and visual information computing. He has led a national defense pre-research fund project, a science and technology project of the Department of Education of Liaoning Province, and a general project of the Natural Science Foundation of Liaoning Province, and has published more than 35 academic papers. E-mail: lntuwulue@163.com. WANG Xinjie, master's student. His main research interests are deep learning and image processing, pattern recognition, and artificial intelligence. E-mail: 2585178999@qq.com. ZHANG Shengchong, master's student and senior engineer. His main research interest is digital signal processing. He has published more than 10 academic papers. E-mail: zsc417@126.com.
Corresponding author: JIANG Wentao. E-mail: lntuwulue@163.com
