[1]刘诗怡,刘金平,黄丽娟,等.基于多尺度协调卷积与自适应加权的红外与可见光图像融合[J].智能系统学报,2026,21(1):95-108.[doi:10.11992/tis.202504002]
LIU Shiyi,LIU Jinping,HUANG Lijuan,et al.Infrared and visible image fusion based on multi-scale coordinated convolution and adaptive weighting[J].CAAI Transactions on Intelligent Systems,2026,21(1):95-108.[doi:10.11992/tis.202504002]
CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 21
Issue: 2026, No. 1
Pages: 95-108
Section: Academic Papers: Machine Perception and Pattern Recognition
Publication date: 2026-03-05
- Title:
-
Infrared and visible image fusion based on multi-scale coordinated convolution and adaptive weighting
- Author(s):
-
LIU Shiyi1, LIU Jinping1, HUANG Lijuan2, JIANG Jiahao1, SONG Dianyi3, YANG Guangyi4
-
1. College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China;
2. Hunan Intelligent Rehabilitation Robot and Auxiliary Equipment Engineering Technology Research Center, Changsha 410004, China;
3. Basic Education College, National University of Defense Technology, Changsha 410072, China;
4. Hunan Institute of Metrology and Testing, Changsha 410081, China
-
- Keywords:
-
image fusion; infrared image; visible image; multiscale coordinate convolution; convolutional weighted permute multilayer perceptron; coordinate attention; adaptive weighting
- CLC number:
-
TP391
- DOI:
-
10.11992/tis.202504002
- Abstract:
-
To address the limitations of convolutional neural network-based image fusion models, namely restricted global information perception, weak high-frequency detail preservation, and manually configured loss-function weights, this article proposes a convolution and multilayer perceptron-integrated multiscale coordinate network (CM-MCNet) for high-quality infrared and visible image fusion. In the encoder of CM-MCNet, a convolutional weighted permute multilayer perceptron module is introduced that enhances spatial understanding by simulating feature permutation and integrates an adaptive feature reweighting mechanism to effectively capture global information. Meanwhile, a multiscale coordinate convolution (MsCConv) module is designed that leverages central difference convolution to strengthen the retention and expression of high-frequency details; by incorporating multiscale parallel sub-networks, MsCConv ensures comprehensive preservation of multi-level features. Moreover, an embedded coordinate attention mechanism jointly modulates the channel and spatial dimensions, enhancing complementary information while suppressing redundancy. Furthermore, a data-driven adaptive loss weighting strategy is proposed that dynamically adjusts the contribution of each supervision signal based on image feature statistics, reducing the complexity of hyperparameter tuning while ensuring that the loss function more accurately reflects the characteristics of the source images. Experimental results on the RoadScene, TNO, and M3FD public datasets demonstrate that CM-MCNet generates fused images with sharper edge preservation and more natural texture transitions, and that it outperforms existing state-of-the-art fusion methods on objective metrics including information entropy, standard deviation, spatial frequency, visual information fidelity, and average gradient. This work provides a novel perspective on infrared and visible image fusion and lays a solid foundation for further advances in the field.
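Two mechanisms named in the abstract can be illustrated concretely. The NumPy sketch below is a minimal illustration only, not the authors' implementation: the function names, the valid-padding single-channel convolution, and the choice of information entropy as the driving statistic are assumptions for demonstration. Central difference convolution rewrites a vanilla convolution as y = conv(x) − θ·x_center·Σw, so flat regions are suppressed and gradient-like high-frequency detail is emphasized; a data-driven loss weight can then be obtained by normalizing a per-image statistic so that the statistically richer source contributes more to its loss term.

```python
import numpy as np

def central_difference_conv2d(x, w, theta=0.7):
    """Single-channel 2-D central difference convolution (valid padding).

    y(p0) = sum_n w(pn) * x(p0 + pn)  -  theta * x(p0) * sum_n w(pn)
    With theta = 0 this reduces to a vanilla convolution; larger theta
    subtracts the centre pixel and amplifies high-frequency detail.
    """
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    w_sum = w.sum()
    for i in range(oh):
        for j in range(ow):
            patch = x[i:i + kh, j:j + kw]          # vanilla receptive field
            center = x[i + kh // 2, j + kw // 2]   # centre pixel of the patch
            out[i, j] = (patch * w).sum() - theta * center * w_sum
    return out

def entropy(img, bins=256):
    """Shannon information entropy (bits) of an image with values in [0, 1)."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def adaptive_loss_weights(ir, vis):
    """Data-driven weights for two intensity-loss terms.

    Each source's weight is its entropy normalised over both sources,
    so the more informative image drives the loss harder; the pair
    always sums to 1, avoiding manual rebalancing.
    """
    e_ir, e_vis = entropy(ir), entropy(vis)
    return e_ir / (e_ir + e_vis), e_vis / (e_ir + e_vis)
```

On a constant image of value c the CDC response is (1 − θ)·c·Σw, so with θ = 1 flat regions vanish entirely while edges keep a strong response, which matches the high-frequency-preserving motivation described above.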
Memo
Received: 2025-04-01.
Funding: National Natural Science Foundation of China (62371187); Natural Science Foundation of Hunan Province (2024JJ8309).
About the authors: LIU Shiyi, master's student. Her research interests include machine learning, computer vision, and image processing. E-mail: liushiyi@hunnu.edu.cn. LIU Jinping, professor and doctoral supervisor. His research interests include machine learning, pattern recognition, industrial process monitoring, fault diagnosis, and computer vision. He has led or participated in more than 10 national and provincial/ministerial research projects, holds 20 granted national invention patents, and has published more than 80 academic papers. E-mail: ljp@hunnu.edu.cn. HUANG Lijuan, lecturer. Her research interests include intelligent control, machine learning, and industrial process control. She has led or participated in 5 provincial/ministerial and municipal research projects and holds 6 granted national invention patents. E-mail: huanglijuan@csmzxy.edu.cn.
Corresponding author: LIU Jinping. E-mail: ljp@hunnu.edu.cn
Last update:
2026-01-05