[1]LIU Shiyi,LIU Jinping,HUANG Lijuan,et al.Infrared and visible image fusion based on multi-scale coordinated convolution and adaptive weighting[J].CAAI Transactions on Intelligent Systems,2026,21(1):95-108.[doi:10.11992/tis.202504002]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 21
Issue: 2026, No. 1
Pages: 95-108
Column: Academic Papers - Machine Perception and Pattern Recognition
Publication date: 2026-03-05
Title: Infrared and visible image fusion based on multi-scale coordinated convolution and adaptive weighting
Author(s): LIU Shiyi1; LIU Jinping1; HUANG Lijuan2; JIANG Jiahao1; SONG Dianyi3; YANG Guangyi4
Affiliations:
1. College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China;
2. Hunan Intelligent Rehabilitation Robot and Auxiliary Equipment Engineering Technology Research Center, Changsha 410004, China;
3. Basic Education College, National University of Defense Technology, Changsha 410072, China;
4. Hunan Institute of Metrology and Testing, Changsha 410081, China
Keywords: image fusion; infrared image; visible image; multiscale coordinate convolution; convolutional multilayer perceptron; coordinate attention; adaptive weighting
CLC: TP391
DOI: 10.11992/tis.202504002
Abstract:
To address the limitations of convolutional neural network (CNN)-based image fusion models, namely restricted global information perception, insufficient preservation of high-frequency details, and the manual configuration of loss-function weights, this article proposes a convolution and multilayer perceptron-integrated multiscale coordinate network (CM-MCNet) for high-quality infrared and visible image fusion. In the encoder of CM-MCNet, a convolutional weighted-permute multilayer perceptron module is introduced, which enhances spatial understanding through feature permutation and integrates an adaptive feature reweighting mechanism to effectively capture global information. Meanwhile, a multiscale coordinate convolution (MsCConv) module is designed that leverages the advantages of central difference convolution to enhance the retention and expression of high-frequency details. By incorporating multiscale parallel sub-networks, MsCConv ensures the comprehensive preservation of multi-level features, while an embedded coordinate attention mechanism jointly modulates the channel and spatial dimensions, enhancing complementary information and suppressing redundancy. Furthermore, a data-driven adaptive loss-weighting strategy is proposed that dynamically adjusts the contribution of each supervision signal according to image feature statistics. This reduces the complexity of hyperparameter tuning while ensuring that the loss function more accurately reflects the characteristics of the source images. Experimental results on the RoadScene, TNO, and M3FD public datasets demonstrate that CM-MCNet generates fused images with sharper edge preservation and more natural texture transitions. The method also achieves superior performance on objective metrics, including information entropy, standard deviation, spatial frequency, visual information fidelity, and average gradient, outperforming existing state-of-the-art fusion methods. This work provides a novel perspective on infrared and visible image fusion and lays a solid foundation for further advances in the field.
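The abstract's adaptive loss-weighting idea (weighting supervision signals by image feature statistics rather than fixed hyperparameters) can be illustrated with a minimal sketch. The function names (`avg_gradient`, `adaptive_weights`) and the specific choice of statistic (mean gradient magnitude, normalized via a softmax) are illustrative assumptions, not the paper's actual formulation:

```python
import math

def avg_gradient(img):
    """Mean gradient magnitude of a 2D grayscale image (list of row lists).

    Gradient magnitude is a common proxy for edge/texture richness,
    one of the statistics an adaptive weighting scheme might use.
    """
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]      # horizontal difference
            gy = img[y + 1][x] - img[y][x]      # vertical difference
            total += math.sqrt((gx * gx + gy * gy) / 2.0)
            count += 1
    return total / count if count else 0.0

def adaptive_weights(ir, vis, tau=1.0):
    """Hypothetical data-driven loss weights for two source images.

    A softmax over per-source gradient statistics: the source with
    richer detail contributes more to the corresponding loss term.
    tau controls how sharply the weights favor the stronger source.
    """
    stats = [avg_gradient(ir), avg_gradient(vis)]
    exps = [math.exp(s / tau) for s in stats]
    z = sum(exps)
    return [e / z for e in exps]
```

For example, a flat infrared patch paired with a high-contrast visible patch yields weights that favor the visible source, so its gradient-based loss term dominates without any manual tuning.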