[1] ZHANG Mingquan, ZHANG Zeen, CAO Jingang, et al. Text detection method combining Segformer with an enhanced feature pyramid[J]. CAAI Transactions on Intelligent Systems, 2024, 19(5): 1111-1125. [doi:10.11992/tis.202301013]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
19
Issue:
2024, No. 5
Page number:
1111-1125
Column:
Academic Papers: Machine Learning
Publication date:
2024-09-05
- Title:
-
Text detection method combining Segformer with an enhanced feature pyramid
- Author(s):
-
ZHANG Mingquan 1,2; ZHANG Zeen 1,2; CAO Jingang 1,2; SHAO Xuqiang 1,2
-
1. School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, China;
2. Engineering Research Center of Intelligent Computing for Complex Energy Systems, Ministry of Education, Baoding 071003, China
-
- Keywords:
-
text detection; enhanced feature pyramid; attention mechanism; Segformer; ghost convolution; multiscale feature fusion; average pooling; max pooling
- CLC:
-
TP391.4
- DOI:
-
10.11992/tis.202301013
- Abstract:
-
To address the issues of small-scale text omission, text-like pixel misdetection, and inaccurate edge localization in text detection algorithms for natural scenes, we propose a text detection model based on Segformer and an enhanced feature pyramid. First, the model employs an MiT-B2-based encoder to generate multiscale feature maps. Subsequently, during the upsampling phase of the decoder, a cascaded fusion attention module is introduced, which acquires global channel information and text features through global average pooling, global max pooling, and ghost convolution. Then, a two-level orthogonal fusion attention module utilizes asymmetric convolution to enhance the information in the feature fusion section horizontally and vertically. Finally, the results are post-processed using differentiable binarization. The experiments were conducted on the ICDAR2015, ShopSign1265, and MTWI datasets. Compared with eight other methods, the proposed method achieved the highest F-values, reaching 87.8%, 59.1%, and 74.8%, respectively. These results demonstrate that the method effectively improves the accuracy of text detection.
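
The differentiable binarization step named in the abstract can be illustrated with a minimal sketch. It assumes the standard formulation B = 1 / (1 + exp(-k(P - T))), where P is a probability map and T a threshold map. The abstract does not give the amplification factor, so k = 50 (the value commonly used with this technique) is an assumption here, and the function name and the 2x2 input maps are hypothetical illustration values, not the authors' implementation or data.

```python
import math


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))).

    prob_map and thresh_map are equally sized 2D lists with values in [0, 1].
    k is the amplification factor (assumed 50; not stated in the abstract).
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


# Hypothetical 2x2 probability and threshold maps (illustrative values only).
prob = [[0.9, 0.2], [0.5, 0.7]]
thresh = [[0.3, 0.3], [0.5, 0.4]]
binary = differentiable_binarization(prob, thresh)
```

Pixels whose probability is well above the local threshold map to values near 1, those well below map to values near 0, and a pixel exactly at its threshold maps to 0.5. Because the function is smooth, the threshold map can in principle be learned jointly with the probability map rather than fixed by hand.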