[1]SHAO Kai,WANG Mingzheng,WANG Guangyu.Transformer-based multiscale remote sensing semantic segmentation network[J].CAAI Transactions on Intelligent Systems,2024,19(4):920-929.[doi:10.11992/tis.202304026]
                                
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 19
Issue: 2024, No. 4
Pages: 920-929
Column: Academic Papers - Intelligent Systems
Publication date: 2024-07-05
                                
- Title:
- Transformer-based multiscale remote sensing semantic segmentation network
                                
                                
- Author(s):
- SHAO Kai1,2,3; WANG Mingzheng1; WANG Guangyu1,2
- 1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
  2. Chongqing Key Laboratory of Mobile Communications Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
  3. Engineering Research Center of Mobile Communications of the Ministry of Education, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
 
                                
- Keywords:
- remote sensing image; semantic segmentation; convolutional neural network; Transformer; global contextual information; multiscale receptive field; encoder; decoder
                                
                                
- CLC:
- TP391
                                
                                
- DOI:
- 10.11992/tis.202304026
                                
                                
                                
- Abstract:
- To improve semantic segmentation of remote sensing images, whose targets exhibit small inter-class variance and large intra-class variance, this paper proposes a multiscale Transformer network (MSTNet) that focuses on two key points: global contextual information and multiscale semantic features. MSTNet consists of an encoder and a decoder. The encoder comprises an improved Transformer-based visual attention network (VAN) backbone and a multiscale semantic feature extraction module (MSFEM), improved from atrous spatial pyramid pooling (ASPP), to extract multiscale semantic features. The decoder pairs a lightweight multilayer perceptron (MLP) with the encoder, taking advantage of the Transformer's inductive properties to fully exploit the extracted global contextual information and multiscale feature representations. The proposed MSTNet was validated on two high-resolution remote sensing semantic segmentation datasets, ISPRS Potsdam and LoveDA, achieving a mean intersection over union (mIoU) of 79.50% and 54.12% and a mean F1-score (mF1) of 87.46% and 69.34%, respectively. The experimental results verify that the proposed method effectively improves the semantic segmentation of remote sensing images.
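The MSFEM described in the abstract builds on atrous spatial pyramid pooling, which applies the same convolution kernel at several dilation rates so that parallel branches see different receptive-field sizes. The following minimal 1-D NumPy sketch illustrates that idea only; it is not the authors' implementation, and the function names (`atrous_conv1d`, `aspp_1d`) and the dilation rates are illustrative assumptions.

```python
import numpy as np

def atrous_conv1d(x, kernel, dilation):
    """Dilated (atrous) 1-D convolution with 'same' zero-padding.

    A kernel of length k with dilation d covers an effective
    receptive field of (k - 1) * d + 1 input samples.
    """
    k = len(kernel)
    eff = (k - 1) * dilation + 1      # effective receptive field
    pad = eff // 2                    # symmetric padding for odd k
    xp = np.pad(np.asarray(x, dtype=float), pad)
    out = np.empty(len(x), dtype=float)
    for i in range(len(x)):
        # Sample the padded input at dilated offsets.
        out[i] = sum(kernel[j] * xp[i + j * dilation] for j in range(k))
    return out

def aspp_1d(x, kernel, rates=(1, 2, 4)):
    """ASPP-style bank: one branch per dilation rate, stacked for fusion."""
    return np.stack([atrous_conv1d(x, kernel, r) for r in rates])
```

With the identity kernel `[0, 1, 0]` any dilation rate reproduces the input, which makes the padding/alignment easy to check; in the real module each branch would be a 2-D convolution whose outputs are concatenated and projected.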