
SHAO Yuxiao, LU Tao, WANG Zhenyu, et al. Human detection algorithm in infrared images combining multi-scale large kernel convolution[J]. CAAI Transactions on Intelligent Systems, 2025, 20(4): 787-799. [doi:10.11992/tis.202404027]


Human detection algorithm in infrared images combining multi-scale large kernel convolution

CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 20 Issue: 2025, No. 4 Pages: 787-799 Section: Academic Papers: Machine Learning Published: 2025-08-05

Author(s):: SHAO Yuxiao¹, LU Tao², WANG Zhenyu¹, PENG Yongjie¹, YAO Wei¹; 1. School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China;
2. State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

Keywords:: infrared image; object detection; reconstruction attention; multi-scale feature; large kernel convolution; convolutional neural network; feature extraction; re-parameterization

CLC number:: TP391.4

DOI:: 10.11992/tis.202404027

Document code:: 2025-2-24

Abstract:: Human detection in infrared images of ruins environments is hampered by low image resolution and inconspicuous human features. To address these problems, an infrared human detection network, RML-YOLO (re-parameterization multi-scale large kernel convolution), is designed on the YOLO framework; it combines re-parameterization with multi-scale large kernel convolution. The network uses spatial and channel reconstruction attention modules to concentrate attention on the regions that matter most for the detection task, and it strengthens edge features with the Sobel operator to improve the detection of humans in different poses. The effectiveness of RML-YOLO is verified on a self-built dataset. With only 1.8×10⁶ learnable parameters, the model reaches an AP₅₀ of 91.2% and an AP₅₀₋₇₅ of 87.3%, improvements of 4.4% and 5.3%, respectively, over YOLOv8-n, which has a similar number of parameters. The results show that RML-YOLO substantially improves the accuracy of infrared human detection in ruins environments.
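The abstract says only that RML-YOLO strengthens edge features with the Sobel operator; it does not state where in the network the operator is applied or how its response is fused with the features. As a minimal sketch under assumed choices, the NumPy code below applies fixed 3×3 Sobel kernels to each channel of a (C, H, W) feature map and adds the gradient magnitude back as a residual weighted by `alpha`. The helper names `filter2d` and `sobel_edge_enhance` and the `alpha` parameter are hypothetical, introduced here for illustration, and this is not presented as the paper's module.

```python
import numpy as np

# Standard 3x3 Sobel kernels: SOBEL_X responds to horizontal intensity
# changes (vertical edges), SOBEL_Y to vertical changes (horizontal edges).
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T


def filter2d(channel: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """3x3 cross-correlation with edge-replicated padding, the operation a
    fixed-weight CNN convolution layer performs; output keeps the input size."""
    padded = np.pad(channel, 1, mode="edge")
    h, w = channel.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def sobel_edge_enhance(features: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Hypothetical edge-enhancement step (assumed design, not the paper's):
    add the per-channel Sobel gradient magnitude back onto a (C, H, W)
    feature map as a residual, scaled by `alpha`."""
    enhanced = np.empty(features.shape, dtype=float)
    for c in range(features.shape[0]):
        gx = filter2d(features[c], SOBEL_X)
        gy = filter2d(features[c], SOBEL_Y)
        enhanced[c] = features[c] + alpha * np.sqrt(gx ** 2 + gy ** 2)
    return enhanced


# Toy single-channel "thermal" frame: a warm square on a cool background.
img = np.zeros((1, 8, 8))
img[0, 2:6, 2:6] = 1.0
out = sobel_edge_enhance(img)

print(out[0, 3, 3])  # flat interior: no gradient, value stays 1.0
print(out[0, 2, 2])  # corner of the square: boosted by the edge response
```

In the toy frame, only pixels within one pixel of the square's outline change; the flat interior and the distant background are untouched, which is the kind of contour emphasis the abstract credits for detecting humans in varied poses. Fixed Sobel kernels add no learnable parameters, which suits a lightweight detector, although the abstract does not say whether RML-YOLO keeps the edge weights fixed or learns them.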



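Memo

Received: 2024-04-22.
About the authors: SHAO Yuxiao, master's student; research interests: computer vision and pattern recognition. E-mail: yx_shao@ncepu.edu.cn.
LU Tao, associate research fellow; research interests: intelligent robot control, human-robot interaction, manipulation skill learning, and imitation learning. He has published more than 50 academic papers and holds 20 granted national invention patents. E-mail: tao.lu@ia.ac.cn.
WANG Zhenyu, professor and doctoral supervisor; research interests: pattern recognition and computer vision. He has led 5 research projects, including projects funded by the National Natural Science Foundation of China, and received the Wu Wenjun Artificial Intelligence Science and Technology Award in 2019. E-mail: zywang@ncepu.edu.cn.
Corresponding author: WANG Zhenyu. E-mail: zywang@ncepu.edu.cn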

