[1] ZHAO Wenqing, ZHAO Zhenhuan, GONG Jiaxiao. Remote sensing image object detection based on inverted residual self-attention mechanism[J]. CAAI Transactions on Intelligent Systems, 2025, 20(1): 64-72. [doi:10.11992/tis.202312001]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume:
20
Issue:
2025, No. 1
Page number:
64-72
Column:
Academic Papers: Machine Learning
Publication date:
2025-01-05
- Title:
-
Remote sensing image object detection based on inverted residual self-attention mechanism
- Author(s):
-
ZHAO Wenqing 1,2; ZHAO Zhenhuan 1; GONG Jiaxiao 1
-
1. School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, China;
2. Hebei Key Laboratory of Knowledge Computing for Energy & Power, Baoding 071003, China
-
- Keywords:
-
remote sensing image; object detection; inverted residual; self-attention mechanism; multi-scale; spatial pyramid; feature extraction; feature fusion
- CLC:
-
TP391
- DOI:
-
10.11992/tis.202312001
- Abstract:
-
An inverted residual self-attention method (IRSAM) is proposed for object detection in remote sensing images, addressing two challenges: large variations in object size and strong interference from background information. First, a backbone network built on an inverted residual self-attention mechanism, with strong feature extraction ability, is used to fully extract object features and reduce the interference of complex backgrounds. Second, a multi-scale spatial pyramid pooling module is constructed to provide receptive fields at multiple scales and improve the detection of objects of varying sizes. Finally, a lightweight feature fusion structure integrates the feature maps extracted by the backbone network, effectively combining low-level and high-level features. IRSAM was compared with both traditional networks and improved object detection algorithms, and ablation experiments were conducted on the DIOR and RSOD datasets. The results show that the mean accuracy of IRSAM is 4.6 and 4.2 percentage points higher than that of the YOLOv8 algorithm on the DIOR and RSOD datasets, respectively, indicating that the proposed method significantly improves object detection accuracy in remote sensing images.
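The abstract does not specify how the multi-scale spatial pyramid pooling module is built. As a rough illustration of the general idea only, the sketch below follows the generic spatial pyramid pooling pattern used in YOLO-family detectors: stride-1 max pooling at several window sizes, with the results concatenated along the channel axis. The function names `max_pool_same` and `spatial_pyramid_pool` and the kernel sizes are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Grid = List[List[float]]  # one feature-map channel of shape H x W


def max_pool_same(channel: Grid, kernel: int) -> Grid:
    """Stride-1 max pooling that keeps the H x W shape of the input."""
    h, w = len(channel), len(channel[0])
    r = kernel // 2  # radius of the pooling window
    out: Grid = []
    for i in range(h):
        row = []
        for j in range(w):
            # Clip the window at the borders (same effect as -inf padding).
            window = [
                channel[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def spatial_pyramid_pool(
    channels: List[Grid], kernels: Sequence[int] = (3, 5, 7)
) -> List[Grid]:
    """Concatenate the input with its pooled versions along the channel axis.

    Larger kernels give each output cell a larger receptive field, so the
    stacked result mixes several scales. Kernel sizes here are illustrative.
    """
    pooled = list(channels)  # identity branch keeps the original features
    for k in kernels:
        pooled.extend(max_pool_same(c, k) for c in channels)
    return pooled


# Usage: a single 5 x 5 channel with one strong activation at the centre.
fmap = [[[0.0] * 5 for _ in range(5)]]
fmap[0][2][2] = 9.0
out = spatial_pyramid_pool(fmap, kernels=(3, 5))
print(len(out))  # number of output channels: input plus one per kernel
```

In this toy input the 3 x 3 branch spreads the central activation only to its immediate neighbours, while the 5 x 5 branch reaches every cell, which is the sense in which the branches offer receptive fields at different scales.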