LI Tao, GAO Zhigang, GUAN Shengyuan, et al. Global attention mechanism with real-time semantic segmentation network[J]. CAAI Transactions on Intelligent Systems, 2023, 18(2): 282-292. [doi:10.11992/tis.202208027]

Global attention mechanism with real-time semantic segmentation network

Journal:: CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]; Vol. 18, No. 2 (2023), pp. 282-292; Section: Academic Papers: Intelligent Systems; Published: 2023-05-05

Title:: Global attention mechanism with real-time semantic segmentation network

Author(s):: LI Tao^1,2, GAO Zhigang^3, GUAN Shengyuan^4, XU Jiucheng^1,2, MA Yuanyuan^1
1. College of Computer and Information Engineering, He’nan Normal University, Xinxiang 453007, China;
2. Engineering Lab of He’nan Province for Intelligence Business & Internet of Things, Xinxiang 453007, China;
3. College of Software, He’nan Normal University, Xinxiang 453007, China;
4. National Security Academy, People’s Public Security University of China, Beijing 100038, China

Keywords:: real-time semantic segmentation; global attention mechanism; multiscale feature fusion; hybrid dilated convolution; convolutional neural network; pyramid pooling; receptive field; feature extraction

CLC number:: TP391

DOI:: 10.11992/tis.202208027

Abstract:: Lightweight network structures cannot extract sufficient effective semantic information from feature maps, and poorly designed modules for fusing semantic information with spatial detail information reduce segmentation accuracy. To address these problems, this paper proposes a real-time semantic segmentation network combined with a global attention mechanism (GaSeNet). First, a global attention mechanism is introduced into the semantic branch of the dual-branch structure; it guides the convolutional neural network, along both the channel and spatial dimensions, to focus on the semantic categories relevant to the segmentation task and thus extract more effective semantic information. Second, a hybrid dilated convolution block is designed in the spatial detail branch; it enlarges the receptive field without changing the convolution kernel size, capturing more global spatial detail information and compensating for the loss of key feature information. Then, the feature fusion module is redesigned, and deep aggregation pyramid pooling is introduced to deeply fuse feature maps of different scales, improving the semantic segmentation performance of the network. Finally, the proposed method is evaluated on the CamVid and Vaihingen datasets. Compared with recent semantic segmentation methods, GaSeNet improves segmentation accuracy by 4.29% and 16.06%, respectively. The experimental results verify the effectiveness of the proposed method for real-time semantic segmentation.
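The global attention mechanism in the semantic branch can be summarized with the formulation from Liu et al.'s GAM paper. This is a sketch under the assumption that GaSeNet uses the module unmodified, which the abstract does not state: channel attention is applied first, then spatial attention, each as an element-wise gate on the feature map.

```latex
\begin{aligned}
F_2 &= M_c(F_1) \otimes F_1, \qquad F_3 = M_s(F_2) \otimes F_2,\\
M_c(F) &= \sigma\big(\pi^{-1}(\mathrm{MLP}_r(\pi(F)))\big),\\
M_s(F) &= \sigma\big(\mathrm{Conv}_{7\times7}(\mathrm{Conv}_{7\times7}(F))\big).
\end{aligned}
```

Here $F_1 \in \mathbb{R}^{C\times H\times W}$ is the input feature map, $\otimes$ is element-wise multiplication, $\sigma$ is the sigmoid, $\pi$ permutes $C\times H\times W$ to $W\times H\times C$ so that a two-layer MLP with reduction ratio $r$ mixes channels at every position, and the two $7\times7$ convolutions first reduce and then restore the channel count (batch normalization and ReLU between them are omitted). Both attention maps keep the full $C\times H\times W$ shape, which is how the module attends along channel and spatial dimensions jointly rather than pooling either one away.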
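To see why a hybrid dilated convolution block enlarges the receptive field without changing the kernel size, the plain-Python sketch below computes, in one spatial dimension, the receptive field of stacked stride-1 3×3 convolutions and checks which input positions actually contribute. The dilation rates are assumptions for illustration only: the abstract does not give GaSeNet's rates, so the hybrid choice [1, 2, 5] is contrasted with a repeated rate [2, 2, 2] and with ordinary convolutions [1, 1, 1].

```python
from __future__ import annotations


def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field (1-D) of stacked stride-1 convolutions.

    A layer with kernel k and dilation d widens the field by (k - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def covered_offsets(kernel_size: int, dilations: list[int]) -> set[int]:
    """Input offsets, relative to one output pixel, that actually contribute.

    Each layer spreads every reachable offset o to o + d * t for the kernel
    taps t in [-(k // 2), k // 2] (odd kernel sizes assumed).
    """
    half = kernel_size // 2
    offsets = {0}
    for d in dilations:
        offsets = {o + d * t for o in offsets for t in range(-half, half + 1)}
    return offsets


def has_gaps(kernel_size: int, dilations: list[int]) -> bool:
    """True if some positions inside the receptive field are never sampled ("gridding")."""
    offsets = covered_offsets(kernel_size, dilations)
    return len(offsets) != max(offsets) - min(offsets) + 1


if __name__ == "__main__":
    # Hypothetical rate choices; GaSeNet's actual dilation rates are not given here.
    for rates in ([1, 1, 1], [2, 2, 2], [1, 2, 5]):
        print(rates, receptive_field(3, rates), has_gaps(3, rates))
```

With the same 3×3 kernel, three ordinary layers see 7 pixels, [2, 2, 2] spans 13 pixels but samples only every other one, and [1, 2, 5] covers all 17 pixels in its field. Repeating a single rate leaves a checkerboard of skipped inputs (the gridding effect), which mixing rates such as 1, 2, 5 is meant to avoid.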
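The deep aggregation pyramid pooling module originates in DDRNet (Hong et al.): the input is pooled at several scales, each pooled map is upsampled back to full resolution and added to the previous branch's output before that branch's own convolution, and all branches are then concatenated and fused. The sketch below shows only this hierarchical multi-scale aggregation on a 1-D signal, under simplifications that are assumptions and not GaSeNet's design: non-overlapping average pooling with windows 2, 4, 8, nearest-neighbour upsampling, and identity in place of every learned convolution.

```python
from __future__ import annotations


def avg_pool(x: list[float], window: int) -> list[float]:
    """Non-overlapping 1-D average pooling; the last window may be shorter."""
    return [sum(x[i:i + window]) / len(x[i:i + window]) for i in range(0, len(x), window)]


def upsample_nearest(y: list[float], factor: int, length: int) -> list[float]:
    """Nearest-neighbour upsampling of a pooled signal back to `length` samples."""
    return [y[min(i // factor, len(y) - 1)] for i in range(length)]


def pyramid_aggregate(x: list[float], windows: tuple[int, ...] = (2, 4, 8)) -> list[list[float]]:
    """Hierarchical multi-scale aggregation in the spirit of DAPPM (simplified).

    Branch 0 is the input itself (identity replaces the 1x1 conv). Each later
    branch pools x at a coarser scale, upsamples it, and adds the previous
    branch's output (identity replaces the 3x3 conv after the addition).
    The last branch uses global average pooling.
    """
    n = len(x)
    branches = [list(x)]
    prev = branches[0]
    for w in windows:
        context = upsample_nearest(avg_pool(x, w), w, n)
        prev = [c + p for c, p in zip(context, prev)]
        branches.append(prev)
    global_mean = sum(x) / n
    prev = [global_mean + p for p in prev]
    branches.append(prev)
    # The real module concatenates the branches along the channel axis and
    # fuses them with a 1x1 conv; returning the list stands in for that stack.
    return branches


if __name__ == "__main__":
    for i, branch in enumerate(pyramid_aggregate([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])):
        print(i, branch)
```

With the convolutions removed, each branch is simply a running sum of progressively coarser context, so five branches come out of three pooling scales plus the input and the global branch. In the actual module, the convolution after each addition is what lets coarse context refine the finer branches.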

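Memo

Received: 2022-08-19.
Foundation items: National Natural Science Foundation of China (61976082, 62002103); Key Scientific Research Project of Higher Education Institutions of Henan Province (22B520013); Henan Province Science and Technology Research Program (222102210169).
About the authors: LI Tao, lecturer, Ph.D.; main research interests: intelligent information processing and data mining; has participated in or led 5 projects funded by the National Natural Science Foundation of China, provincial natural science foundations, and provincial science and technology research programs, and has published more than 10 academic papers. GAO Zhigang, undergraduate student; main research interests: deep learning, image semantic segmentation, object detection, and computer vision. GUAN Shengyuan, master's student; main research interests: deep learning, digital watermarking, and computer vision.
Corresponding author: LI Tao. E-mail: litao0116@163.com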
