[1] WANG Xingwu, LEI Tao, WANG Yingbo, et al. Semantic segmentation of remote sensing image based on multimodal complementary feature learning[J]. CAAI Transactions on Intelligent Systems, 2022, 17(6): 1123-1133. [doi:10.11992/tis.202201025]

Semantic segmentation of remote sensing image based on multimodal complementary feature learning

CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 17 Issue: 2022, No. 6 Pages: 1123-1133 Column: Academic Papers, Machine Learning Publication date: 2022-11-05

Title:: Semantic segmentation of remote sensing image based on multimodal complementary feature learning

Author(s):: WANG Xingwu¹,²; LEI Tao¹,²; WANG Yingbo¹,²; GENG Xinzhe¹,²; ZHANG Yue¹,²
1. Shaanxi Joint Laboratory of Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China;
2. School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China

Keywords:: computer vision; remote sensing image; image segmentation; convolutional neural network; semantic segmentation; multimodal feature fusion; deep learning; complementary feature learning

CLC:: TP183

DOI:: 10.11992/tis.202201025

Abstract:: In semantic segmentation of remote sensing images, a digital surface model (DSM) provides a geometric representation that corresponds to the spectral data and can effectively improve segmentation accuracy. However, most existing studies simply add or merge spectral and elevation features at different stages, ignoring the correlation and complementarity between the multimodal data, so the network cannot accurately segment some complex ground features. This paper proposes a multimodal semantic segmentation network based on complementary feature learning. The network uses the multi-kernel maximum mean discrepancy (MK-MMD) as a complementary constraint to extract the similar and complementary features of the two modalities. Before decoding, each modality borrows the other's complementary features, which enhances the feature-sharing capability of the network. The proposed network is verified on the ISPRS Potsdam and Vaihingen datasets and achieves higher segmentation accuracy.
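
The complementary constraint named in the abstract is usually known as the multi-kernel maximum mean discrepancy (MK-MMD), a measure of how far apart two feature distributions are. The sketch below is only an illustration of that general quantity, not the authors' implementation: the Gaussian kernels, the bandwidth values, the equal kernel weights, the biased estimator, and the names `mk_mmd2`, `spectral`, `elevation_near`, and `elevation_far` are all assumptions introduced here, since this page does not give the paper's formulation.

```python
import math
import random


def gaussian_kernel(u, v, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(u, v, bandwidth) for u in xs for v in ys)
    return total / (len(xs) * len(ys))


def mk_mmd2(xs, ys, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Biased estimate of squared MMD, averaged over several bandwidths.

    Assumption: equal weights for every kernel; real systems may learn them.
    """
    total = 0.0
    for bw in bandwidths:
        total += (mean_kernel(xs, xs, bw)
                  + mean_kernel(ys, ys, bw)
                  - 2.0 * mean_kernel(xs, ys, bw))
    return total / len(bandwidths)


def sample(mean, n=32, dim=4, seed=0):
    """Hypothetical stand-in for a batch of feature vectors."""
    rng = random.Random(seed)
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


spectral = sample(0.0, seed=1)        # toy "spectral" features
elevation_near = sample(0.0, seed=2)  # toy "elevation" features, same distribution
elevation_far = sample(1.5, seed=3)   # toy "elevation" features, shifted distribution

print(mk_mmd2(spectral, elevation_near), mk_mmd2(spectral, elevation_far))
```

A small value indicates that the two feature sets are distributed similarly, and a larger value indicates that they differ. Used as a training constraint, a term of this kind could plausibly be minimized for the features meant to be shared and kept large for the features meant to be complementary, though how the paper combines it with the segmentation loss is not stated here.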
