
[1] WANG Xingwu, LEI Tao, WANG Yingbo, et al. Semantic segmentation of remote sensing image based on multimodal complementary feature learning[J]. CAAI Transactions on Intelligent Systems, 2022, 17(6): 1123-1133. [doi:10.11992/tis.202201025]


Semantic segmentation of remote sensing image based on multimodal complementary feature learning


CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 17 Issue: No. 6, 2022 Pages: 1123-1133 Column: Academic Papers, Machine Learning Publication date: 2022-11-05

Title:: Semantic segmentation of remote sensing image based on multimodal complementary feature learning

Author(s):: WANG Xingwu^1,2, LEI Tao^1,2, WANG Yingbo^1,2, GENG Xinzhe^1,2, ZHANG Yue^1,2; 1. Shaanxi Joint Laboratory of Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China;
2. School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China

Keywords:: computer vision; remote sensing image; image segmentation; convolutional neural network; semantic segmentation; multimodal feature fusion; deep learning; complementary feature learning

CLC number:: TP183

DOI:: 10.11992/tis.202201025

Document code:: 2022-10-09

Abstract:: In semantic segmentation of remote sensing images, a digital surface model (DSM) provides a geometric representation that corresponds to the spectral data and can effectively improve segmentation accuracy. However, most existing work simply adds or concatenates spectral and elevation features at different stages and ignores the correlation and complementarity between the modalities, so the network cannot accurately segment some complex ground objects. This paper studies a multimodal semantic segmentation network based on complementary feature learning. The network uses the multi-kernel maximum mean discrepancy (MK-MMD) as a complementary constraint to extract the similar features and the complementary features of the two modalities. Before decoding, the two modalities borrow each other's complementary features, which strengthens the network's ability to share features. The proposed network is validated on the public Potsdam and Vaihingen datasets of the International Society for Photogrammetry and Remote Sensing (ISPRS), where it achieves higher segmentation accuracy.
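The complementary constraint named in the abstract is commonly known as the multi-kernel maximum mean discrepancy (MK-MMD), a kernel-based measure of how far apart two feature distributions are. The following is only a minimal NumPy sketch of the general technique, not the authors' implementation: the Gaussian kernel family, the bandwidth values, the equal kernel weights, and the names `gaussian_kernel` and `mk_mmd2` are all assumptions made for illustration, and this page does not say how the paper weights the kernels or applies the term inside the network.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    # Pairwise squared Euclidean distances via |x|^2 + |y|^2 - 2 x.y
    sq_dist = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * (x @ y.T)
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # clip tiny negatives from round-off
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mk_mmd2(x, y, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Biased estimate of squared MMD under an equal-weight mix of Gaussian kernels.

    x, y: arrays of shape (n_samples, n_features), e.g. flattened spectral
    and elevation feature vectors (hypothetical inputs).
    The bandwidths and equal weights are illustrative assumptions.
    """
    total = 0.0
    for bw in bandwidths:
        k_xx = gaussian_kernel(x, x, bw).mean()
        k_yy = gaussian_kernel(y, y, bw).mean()
        k_xy = gaussian_kernel(x, y, bw).mean()
        total += k_xx + k_yy - 2.0 * k_xy
    return total / len(bandwidths)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spectral = rng.normal(0.0, 1.0, size=(100, 8))   # stand-in features
    elevation = rng.normal(3.0, 1.0, size=(100, 8))  # shifted distribution
    print("MK-MMD^2 between the two stand-in feature sets:",
          mk_mmd2(spectral, elevation))
```

A value near zero indicates that the two feature sets are distributed similarly, and a larger value indicates that they differ, which is what makes such a statistic usable as a constraint when separating similar from complementary features.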



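Memo

Received: 2022-01-16.
Funding: National Natural Science Foundation of China (61871259; 61861024; 62201334); Key Research and Development Program of Shaanxi Province (2021ZDLGY08-07); Shaanxi Joint Laboratory of Artificial Intelligence Project (2020SS-03).
About the authors: WANG Xingwu, master's student; research interests are artificial intelligence and deep learning. LEI Tao, professor and doctoral supervisor, vice dean of the School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, IEEE Senior Member; research interests are computer vision and machine learning; has published more than 90 academic papers. WANG Yingbo, lecturer, PhD; research interests are image restoration and scene perception in scattering environments.
Corresponding author: LEI Tao. E-mail: leitao@sust.edu.cn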

