[1] SHAO Yuxiao, LU Tao, WANG Zhenyu, et al. Human detection algorithm in infrared images combining multi-scale large kernel convolution[J]. CAAI Transactions on Intelligent Systems, 2025, 20(4): 787-799. [doi:10.11992/tis.202404027]

Human detection algorithm in infrared images combining multi-scale large kernel convolution

References:
[1] GAO Rongwei. To cope with the “climate emergency”, humanity needs to act quickly and forcefully[J]. World culture, 2021(4): 4-7.
[2] ZHENG Xuezhao, YANG Zhuorui, GUO Jun, et al. The current status and development trend of post-disaster rescue life detectors[J]. Journal of mine automation, 2023, 49(6): 104-111.
[3] SU Weihua, WU Hang, ZHANG Xizheng, et al. Rescue robot research: origin, development and future[J]. Military medical sciences, 2014, 38(12): 981-985.
[4] QU Haicheng, WANG Yuping, XIE Mengting, et al. Infrared and visible image fusion combined with brightness perception and dense convolution[J]. CAAI transactions on intelligent systems, 2022, 17(3): 643-652.
[5] ZHANG Mingjin, ZHOU Nan, LI Yunsong. Smooth interactive compression network for infrared small target detection[J]. Journal of Xidian University, 2024, 51(4): 1-14.
[6] WU Yifei, YANG Rui, LYU Qishen, et al. Infrared and visible image fusion: statistical analysis, deep learning methods and future prospects[J]. Laser & optoelectronics progress, 2024, 61(14): 42-60.
[7] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[8] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 779-788.
[9] REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08)[2024-04-01]. https://arxiv.org/abs/1804.02767.
[10] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. (2020-04-23)[2024-04-01]. https://arxiv.org/abs/2004.10934.
[11] JOCHER G. Ultralytics YOLOv5[EB/OL]. (2022-11-22)[2024-04-01]. https://github.com/ultralytics/yolov5.
[12] JOCHER G, CHAURASIA A, QIU Jing. Ultralytics YOLOv8[EB/OL]. (2023-01-22)[2024-04-01]. https://github.com/ultralytics/ultralytics.
[13] LI Chuyin, LI Lu, JIANG Hongliang, et al. YOLOv6: a single-stage object detection framework for industrial applications[EB/OL]. (2022-09-07)[2024-04-01]. https://arxiv.org/abs/2209.02976.
[14] WANG C Y, BOCHKOVSKIY A, LIAO H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 7464-7475.
[15] HE Kaiming, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//IEEE/CVF International Conference on Computer Vision. Venice: IEEE, 2017: 2980-2988.
[16] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 580-587.
[17] GIRSHICK R. Fast R-CNN[C]//IEEE/CVF International Conference on Computer Vision. Santiago: IEEE, 2015: 1440-1448.
[18] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 39(6): 1137-1149.
[19] HU Jie, SHEN Li, SUN Gang. Squeeze-and-excitation networks[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132-7141.
[20] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//European Conference on Computer Vision. Munich: Springer, 2018: 3-19.
[21] LI Xiang, WANG Wenhai, HU Xiaolin, et al. Selective kernel networks[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 510-519.
[22] QIN Zequn, ZHANG Pengyi, WU Fei, et al. FcaNet: frequency channel attention networks[C]//IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 763-772.
[23] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[C]//European Conference on Computer Vision. Zurich: Springer, 2014: 346-361.
[24] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 936-944.
[25] LIU Shu, QI Lu, QIN Haifang, et al. Path aggregation network for instance segmentation[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 8759-8768.
[26] CHEN Yuming, YUAN Xinbin, WANG Jiabao, et al. YOLO-MS: rethinking multi-scale representation learning for real-time object detection[J]. IEEE transactions on pattern analysis and machine intelligence, 2025, 47(6): 4240-4252.
[27] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]//International Conference on Learning Representations. San Diego: ICLR, 2015.
[28] LIU Ze, LIN Yutong, CAO Yue, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 9992-10002.
[29] LIU Zhuang, MAO Hanzi, WU Chaoyuan, et al. A ConvNet for the 2020s[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 11966-11976.
[30] DING Xiaohan, ZHANG Xiangyu, MA Ningning, et al. RepVGG: making VGG-style ConvNets great again[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 13733-13742.
[31] DING Xiaohan, ZHANG Xiangyu, HAN Jungong, et al. Scaling up your kernels to 31×31: revisiting large kernel design in CNNs[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 11953-11965.
[32] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
[33] TAN Mingxing, PANG Ruoming, LE Q V. EfficientDet: scalable and efficient object detection[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 10781-10790.
[34] GAO Shanghua, CHENG Mingming, ZHAO Kai, et al. Res2Net: a new multi-scale backbone architecture[J]. IEEE transactions on pattern analysis and machine intelligence, 2021, 43(2): 652-662.
[35] SANDLER M, HOWARD A, ZHU Menglong, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4510-4520.
[36] GE Zheng, LIU Songtao, WANG Feng, et al. YOLOX: exceeding YOLO series in 2021[EB/OL]. (2021-08-06)[2024-04-01]. https://arxiv.org/abs/2107.08430.
[37] TELEDYNE FLIR. FLIR ONE Pro thermal imaging camera[EB/OL]. (2018-01-01)[2024-04-01]. https://www.flir.cn/products/flir-one-pro/?vertical=condition%20monitoring&segment=solutions.
[38] LIU Wei, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]//European Conference on Computer Vision. Amsterdam: Springer, 2016: 21-37.
[39] LYU Chengqi, ZHANG Wenwei, HUANG Haian, et al. RTMDet: an empirical study of designing real-time object detectors[EB/OL]. (2022-12-16)[2024-04-01]. https://arxiv.org/abs/2212.07784.
[40] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]//IEEE/CVF International Conference on Computer Vision. Venice: IEEE, 2017: 618-626.
[41] JIA Xinyu, ZHU Chuang, LI Minzhen, et al. LLVIP: a visible-infrared paired dataset for low-light vision[C]//IEEE/CVF International Conference on Computer Vision Workshops. Montreal: IEEE, 2021: 3489-3497.
[42] LUO Wenjie, LI Yujia, URTASUN R, et al. Understanding the effective receptive field in deep convolutional neural networks[C]//Advances in Neural Information Processing Systems. Barcelona: Curran Associates, 2016: 4898-4906.
