[1] CAO Jingang, ZHANG Zeen, ZHANG Mingquan. A lightweight end-to-end text recognition method based on SPTS[J]. CAAI Transactions on Intelligent Systems, 2024, 19(6): 1503-1517. [doi:10.11992/tis.202307012]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
19
Issue:
2024, No. 6
Pages:
1503-1517
Column:
Academic Papers: Fundamentals of Artificial Intelligence
Publication date:
2024-12-05
- Title:
-
A lightweight end-to-end text recognition method based on SPTS
- Author(s):
-
CAO Jingang 1,2; ZHANG Zeen 1,2; ZHANG Mingquan 1,2
-
1. School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, China;
2. Engineering Research Center of Intelligent Computing for Complex Energy Systems, Ministry of Education, Baoding 071003, China
-
- Keywords:
-
attention module; autoregressive decoder; lightweight network; single-point position; text spotting; end-to-end; encoder; decoder
- CLC:
-
TP391
- DOI:
-
10.11992/tis.202307012
- Abstract:
-
To address the slow inference speed and large parameter counts of existing text spotting methods, this paper presents a lightweight end-to-end text spotting method based on single-point scene text spotting (SPTS). First, PP-LCNet is introduced as the backbone network for feature extraction. Second, a three-local channel attention module is placed before the decoder; it uses one-dimensional convolutions at three different scales to enhance information interaction between channels. Third, a locally enhanced attention module replaces the feedforward network in the original decoder, using depthwise separable convolution to improve the spatial correlation of text features. Fourth, a token selector module is added after each decoder layer to highlight salient text-feature tokens and reduce the accumulation of irrelevant pixels. Finally, recognition results are predicted by autoregressive decoding. The proposed method was evaluated on three datasets (Total-Text, CTW1500, and ICDAR2015) and compared with six advanced methods (ABCNet, MANGO, ABCNet v2, SPTS, SwinTextSpotter, and TESTR). Compared with SPTS, it increased inference speed by 19.6, 35.7, and 21.1 frames/s on the three datasets, respectively, and reduced the number of parameters by 70.7%, demonstrating its effectiveness.
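The channel attention step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract only states that one-dimensional convolutions at three scales are applied across channels, so the kernel sizes (3, 5, 7), the fixed averaging weights (a trained module would learn them), the fusion of the three scales by summation, and the function names `conv1d_same` and `multi_scale_channel_attention` are all assumptions made for illustration.

```python
import math


def conv1d_same(values, kernel):
    """1-D cross-correlation with zero padding; output length equals input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(values) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(values))
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multi_scale_channel_attention(feature_map, kernels):
    """Rescale channels using attention weights from multi-scale 1-D convolutions.

    feature_map: list of C channels, each a list of H rows of W floats.
    kernels: list of odd-length 1-D kernels, one per scale (assumed, not from the paper).
    Returns (rescaled feature map, per-channel weights).
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    descriptors = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Cross-channel interaction: convolve the descriptor vector at each scale
    # and sum the results (summation is an assumed fusion rule).
    mixed = [0.0] * len(descriptors)
    for kernel in kernels:
        for c, v in enumerate(conv1d_same(descriptors, kernel)):
            mixed[c] += v
    weights = [sigmoid(v) for v in mixed]
    # Excite: multiply every value in a channel by that channel's weight.
    rescaled = [
        [[weights[c] * x for x in row] for row in channel]
        for c, channel in enumerate(feature_map)
    ]
    return rescaled, weights


# Hypothetical example: 8 channels of 2x2 features, averaging kernels of size 3, 5, 7.
example_kernels = [[1.0 / k] * k for k in (3, 5, 7)]
example_features = [[[float(c), float(c)], [float(c), float(c)]] for c in range(8)]
example_out, example_weights = multi_scale_channel_attention(example_features, example_kernels)
```

Because the convolutions run over the vector of channel descriptors rather than over the spatial grid, the cost grows with the number of channels and the kernel lengths only, which is consistent with the lightweight goal stated in the abstract.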