[1] ZHAO Wenqing, ZHAO Zhenhuan, GONG Jiaxiao. Remote sensing image object detection based on inverted residual self-attention mechanism[J]. CAAI Transactions on Intelligent Systems, 2025, 20(1): 64-72. [doi:10.11992/tis.202312001]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
20
Issue:
2025, Issue 1
Pages:
64-72
Section:
Academic Papers: Machine Learning
Publication date:
2025-01-05
- Title:
-
Remote sensing image object detection based on inverted residual self-attention mechanism
- Author(s):
-
ZHAO Wenqing1,2, ZHAO Zhenhuan1, GONG Jiaxiao1
-
1. School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, China;
2. Hebei Key Laboratory of Knowledge Computing for Energy & Power, Baoding 071003, China
- Keywords:
-
remote sensing image; object detection; inverted residual; self-attention mechanism; multi-scale; spatial pyramid; feature extraction; feature fusion
- CLC number:
-
TP391
- DOI:
-
10.11992/tis.202312001
- Abstract:
-
To address severe background interference and large variation in object size in remote sensing image object detection, this study proposes an inverted residual self-attention method (IRSAM). First, a backbone network built on the inverted residual self-attention mechanism, which has strong feature extraction ability, is used to fully extract object features and reduce interference from complex background information. Second, a multi-scale spatial pyramid pooling module is constructed to provide receptive fields at multiple scales and improve the ability to capture objects of different sizes. Finally, a lightweight feature fusion module is proposed to fuse the feature maps extracted by the backbone network, combining low-level and high-level features to improve the detection of objects of different sizes. In comparisons with traditional networks and other improved object detection algorithms, the proposed method achieves clearly higher detection accuracy. In addition, ablation experiments were designed on the DIOR and RSOD datasets. The results show that the mean average precision (mAP) of the method is 4.6 and 4.2 percentage points higher than that of the YOLOv8 algorithm on the DIOR and RSOD datasets, respectively, a clear improvement in the accuracy of object detection in remote sensing images.
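The abstract does not give the internals of the multi-scale spatial pyramid pooling module, so the following is only a minimal sketch of the general idea behind spatial pyramid pooling as commonly used in YOLO-style detectors, not the authors' module. It max-pools one feature map with several window sizes at stride 1 and keeps every result, so each position sees several receptive-field sizes. The function names `max_pool_same` and `spp_channels` and the default kernel sizes (5, 9, 13) are assumptions chosen for illustration.

```python
from typing import List, Sequence

Grid = List[List[float]]


def max_pool_same(grid: Grid, k: int) -> Grid:
    """Max-pool a 2D map with a k x k window, stride 1, 'same' padding.

    Cells outside the map are ignored (equivalent to padding with -inf),
    so the output has the same height and width as the input.
    k is assumed to be odd.
    """
    h, w = len(grid), len(grid[0])
    r = k // 2  # window radius
    out: Grid = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                grid[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def spp_channels(grid: Grid, kernel_sizes: Sequence[int] = (5, 9, 13)) -> List[Grid]:
    """Return the input map followed by one pooled map per kernel size.

    Stacking these maps along the channel axis is the concatenation step
    of a typical spatial pyramid pooling block (hypothetical layout here).
    """
    return [grid] + [max_pool_same(grid, k) for k in kernel_sizes]
```

For a 5 x 5 map with a single activation at the centre, `spp_channels(grid, (3, 5))` returns three maps: the input, a map where the activation has spread to the 3 x 3 neighbourhood, and a map where it covers the whole grid. A real detector would apply the same operation per channel on tensors and follow it with a convolution.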
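Memo
Received: 2023-12-01.
Funding: National Natural Science Foundation of China (62371188); Natural Science Foundation of Hebei Province (F2021502013); Fundamental Research Funds for the Central Universities, General Program (2020MS153, 2021PT018).
About the authors: ZHAO Wenqing, professor, Ph.D.; main research interests are artificial intelligence and image processing. Recipient of one second prize and one third prize of the Hebei Province Science and Technology Progress Award. Has published more than 50 academic papers. E-mail: zhaowenqing@ncepu.edu.cn. ZHAO Zhenhuan, master's student; main research interests are deep learning and remote sensing image processing. E-mail: zhenhuan_zhao@ncepu.edu.cn. GONG Jiaxiao, master's student; main research interests are artificial intelligence and image processing. E-mail: jiaxiao_gong@ncepu.edu.cn.
Corresponding author: ZHAO Wenqing. E-mail: zhaowenqing@ncepu.edu.cn
Last Update: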
2025-01-05