[1]LIANG Liming,FENG Yao,LONG Pengwei,et al.Remote sensing image object detection based on MobileViT and multiscale feature aggregation[J].CAAI Transactions on Intelligent Systems,2024,19(5):1168-1177.[doi:10.11992/tis.202310022]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
19
Issue:
2024, No. 5
Page number:
1168-1177
Column:
Academic Papers: Machine Perception and Pattern Recognition
Publication date:
2024-09-05
- Title: Remote sensing image object detection based on MobileViT and multiscale feature aggregation
- Author(s): LIANG Liming; FENG Yao; LONG Pengwei; LI Renjie
  School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China
- Keywords: deep learning; remote sensing image; object detection; YOLOv7-tiny; MobileViT module; multi-scale feature fusion; contextual information; Wise-IoU
- CLC: TP391
- DOI: 10.11992/tis.202310022
- Abstract:
A new algorithm based on MobileViT and multi-scale feature aggregation, referred to as FWM-YOLOv7t, is proposed to address three problems in remote sensing image object detection: interference from complex backgrounds, difficulty in extracting small objects, and large differences in object scale. First, we design a multi-scale feature aggregation module that establishes context dependencies for remote sensing targets, which improves detection accuracy for multi-scale and small targets. Then, we use the MobileViT module, which combines the advantages of convolutional neural networks and vision transformers, to encode local and global information effectively and suppress non-target noise interference. Finally, we introduce the Wise-IoU loss function, which focuses on ordinary-quality anchor boxes, to enhance the detection performance of the algorithm. Experimental evaluations on the public RSOD and NWPU VHR-10 datasets demonstrate that FWM-YOLOv7t significantly improves the average accuracy of remote sensing image target detection. Furthermore, compared with other object detection algorithms, FWM-YOLOv7t is more effective at detecting complex, small, and multi-scale objects in remote sensing imagery.
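
The abstract names the Wise-IoU loss but does not give its formula or say which variant is used. As a rough illustration only, the sketch below computes the plain IoU of two boxes and the distance-weighted form commonly cited as Wise-IoU v1, in which the IoU loss is scaled by an exponential penalty on the distance between box centers. The function names `iou` and `wiou_v1_loss`, the `(x1, y1, x2, y2)` box format, and the choice of the v1 form are assumptions made for this example; they are not taken from the paper, and the dynamic focusing on ordinary-quality anchor boxes mentioned in the abstract is not modeled here.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def wiou_v1_loss(pred, target):
    """Assumed Wise-IoU v1 form: exp(center_dist^2 / enclosing_diag^2) * (1 - IoU)."""
    l_iou = 1.0 - iou(pred, target)
    # Box centers.
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_t, cy_t = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    # Width and height of the smallest box enclosing both boxes.
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    denom = wg ** 2 + hg ** 2
    # In a training framework this denominator would be treated as a constant
    # (detached from the gradient); that detail has no effect in plain Python.
    if denom <= 0:
        return l_iou
    r = math.exp(((cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2) / denom)
    return r * l_iou
```

For example, `wiou_v1_loss((0, 0, 2, 2), (1, 1, 3, 3))` is larger than the plain IoU loss for the same pair, because the offset between the box centers makes the exponential factor greater than one, while two identical boxes give a loss of zero.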