[1] SHAO Kai, WANG Mingzheng, WANG Guangyu. Transformer-based multiscale remote sensing semantic segmentation network[J]. CAAI Transactions on Intelligent Systems, 2024, 19(4): 920-929. [doi:10.11992/tis.202304026]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
19
Issue:
No. 4, 2024
Pages:
920-929
Column:
Academic Papers: Intelligent Systems
Publication date:
2024-07-05
- Title:
-
Transformer-based multiscale remote sensing semantic segmentation network
- Author(s) (Chinese names):
-
邵凯1,2,3, 王明政1, 王光宇1,2
- Author(s):
-
SHAO Kai1,2,3, WANG Mingzheng1, WANG Guangyu1,2
-
1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
2. Chongqing Key Laboratory of Mobile Communications Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
3. Engineering Research Center of Mobile Communications of the Ministry of Education, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Keywords:
-
remote sensing image; semantic segmentation; convolutional neural network; Transformer; global contextual information; multiscale receptive field; encoder; decoder
- CLC number:
-
TP391
- DOI:
-
10.11992/tis.202304026
- Abstract:
-
To improve semantic segmentation of remote sensing images, whose targets show small inter-class variance and large intra-class variance, this paper proposes a Transformer-based multi-scale Transformer network (MSTNet) built around two key points: global contextual information and multi-scale semantic features. MSTNet consists of an encoder and a decoder. The encoder comprises a visual attention network (VAN) backbone improved on the basis of the Transformer and a multi-scale semantic feature extraction module (MSFEM) improved from the atrous spatial pyramid pooling (ASPP) structure. The decoder uses a lightweight multilayer perceptron (MLP) designed to match the encoder, so that it can fully analyze the extracted semantic features, which carry global contextual information and multi-scale representations obtained by exploiting the inductive property of the Transformer. MSTNet was validated on two high-resolution remote sensing semantic segmentation datasets, ISPRS Potsdam and LoveDA, where it achieved a mean intersection over union (mIoU) of 79.50% and 54.12% and a mean F1-score (mF1) of 87.46% and 69.34%, respectively. These results verify that the proposed method effectively improves the semantic segmentation of remote sensing images.
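The two metrics reported in the abstract, mIoU and mF1, are commonly derived from a per-class confusion matrix. The following is a minimal sketch of that common definition, not the authors' evaluation code: the function names, the toy 2-class matrix, and the choice to score a class with no pixels as 0.0 are all assumptions made for illustration, and benchmark protocols may additionally exclude certain classes from the mean.

```python
from typing import List, Tuple


def per_class_iou_f1(conf: List[List[int]]) -> Tuple[List[float], List[float]]:
    """Per-class IoU and F1 from a square confusion matrix.

    conf[i][j] is the number of pixels whose true class is i and whose
    predicted class is j (hypothetical layout chosen for this sketch).
    """
    n = len(conf)
    ious, f1s = [], []
    for c in range(n):
        tp = conf[c][c]                                  # correctly labeled pixels
        fn = sum(conf[c]) - tp                           # true class c, predicted otherwise
        fp = sum(conf[r][c] for r in range(n)) - tp      # predicted c, true class differs
        union = tp + fp + fn
        ious.append(tp / union if union else 0.0)        # assumption: absent class scores 0.0
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return ious, f1s


def mean_scores(conf: List[List[int]]) -> Tuple[float, float]:
    """Unweighted means over classes: (mIoU, mF1)."""
    ious, f1s = per_class_iou_f1(conf)
    return sum(ious) / len(ious), sum(f1s) / len(f1s)


# Toy 2-class confusion matrix, invented for illustration only.
toy = [[8, 2],
       [1, 9]]
miou, mf1 = mean_scores(toy)
print(f"mIoU={miou:.4f}, mF1={mf1:.4f}")
```

Because both metrics average over classes without weighting by pixel count, a rare class that is segmented poorly lowers the score as much as a frequent one, which is one reason they are preferred over plain pixel accuracy for imbalanced land-cover data.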
Last Update:
1900-01-01