[1]SHAO Kai,WANG Mingzheng,WANG Guangyu.Transformer-based multiscale remote sensing semantic segmentation network[J].CAAI Transactions on Intelligent Systems,2024,19(4):920-929.[doi:10.11992/tis.202304026]
                                
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 19
Issue: 2024, No. 4
Pages: 920-929
Column: Academic Papers - Intelligent Systems
Publication date: 2024-07-05
                                
- Title:
- Transformer-based multiscale remote sensing semantic segmentation network
                                
                                
- Author(s):
- SHAO Kai1,2,3; WANG Mingzheng1; WANG Guangyu1,2
- 1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
  2. Chongqing Key Laboratory of Mobile Communications Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
  3. Engineering Research Center of Mobile Communications of the Ministry of Education, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
 
                                
- Keywords:
- remote sensing image; semantic segmentation; convolutional neural network; Transformer; global contextual information; multiscale receptive field; encoder; decoder
                                
                                
- CLC:
- TP391
                                
                                
- DOI:
- 10.11992/tis.202304026
                                
                                
                                
- Abstract:
- To improve semantic segmentation of remote sensing images, whose targets exhibit small inter-class variance and large intra-class variance, this paper proposes a multiscale Transformer network (MSTNet) that focuses on two key points: global contextual information and multiscale semantic features. MSTNet consists of an encoder and a decoder. The encoder comprises an improved Transformer-based visual attention network (VAN) backbone and a multiscale semantic feature extraction module (MSFEM), improved from atrous spatial pyramid pooling (ASPP), to extract multiscale semantic features. The decoder pairs a lightweight multilayer perceptron (MLP) with the encoder, taking advantage of the Transformer's inductive properties to fully exploit the extracted global contextual information and multiscale feature representations. The proposed MSTNet was validated on two high-resolution remote sensing semantic segmentation datasets, ISPRS Potsdam and LoveDA, achieving a mean intersection over union (mIoU) of 79.50% and 54.12% and a mean F1-score (mF1) of 87.46% and 69.34%, respectively. The experimental results verify that the proposed method effectively improves the semantic segmentation of remote sensing images.
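The MSFEM described in the abstract builds on atrous spatial pyramid pooling, which applies the same convolution kernel at several dilation rates so that parallel branches see different receptive-field sizes. The following minimal 1-D NumPy sketch illustrates that idea only; it is not the authors' implementation, and the function names (`atrous_conv1d`, `aspp_1d`) and the dilation rates are illustrative assumptions.

```python
import numpy as np

def atrous_conv1d(x, kernel, dilation):
    """Dilated (atrous) 1-D convolution with 'same' zero-padding.

    A kernel of length k with dilation d covers an effective
    receptive field of (k - 1) * d + 1 input samples.
    """
    k = len(kernel)
    eff = (k - 1) * dilation + 1      # effective receptive field
    pad = eff // 2                    # symmetric padding for odd k
    xp = np.pad(np.asarray(x, dtype=float), pad)
    out = np.empty(len(x), dtype=float)
    for i in range(len(x)):
        # Sample the padded input at dilated offsets.
        out[i] = sum(kernel[j] * xp[i + j * dilation] for j in range(k))
    return out

def aspp_1d(x, kernel, rates=(1, 2, 4)):
    """ASPP-style bank: one branch per dilation rate, stacked for fusion."""
    return np.stack([atrous_conv1d(x, kernel, r) for r in rates])
```

With the identity kernel `[0, 1, 0]` any dilation rate reproduces the input, which makes the padding/alignment easy to check; in the real module each branch would be a 2-D convolution whose outputs are concatenated and projected.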