[1]LIU Guanghui,ZHANG Yumin,MENG Yuebo,et al.Natural scene text detection based on double-branch cross-level feature fusion[J].CAAI Transactions on Intelligent Systems,2023,18(5):1079-1089.[doi:10.11992/tis.202303005]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
18
Issue:
2023, No. 5
Page number:
1079-1089
Column:
Academic Papers: Machine Learning
Publication date:
2023-09-05
- Title:
-
Natural scene text detection based on double-branch cross-level feature fusion
- Author(s):
-
LIU Guanghui; ZHANG Yumin; MENG Yuebo; ZHAN Hua
-
School of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China
-
- Keywords:
-
text detection; arbitrarily shaped; cross-level feature distribution enhancement; adaptive fusion; double branch; spatial dimension; channel dimension; differentiable binarization
- CLC:
-
TP391
- DOI:
-
10.11992/tis.202303005
- Abstract:
-
Current scene text detection methods tend to localize text regions inaccurately and to falsely detect adjacent texts when arbitrarily shaped text appears against complex backgrounds. To address this issue, a natural scene text detection method based on double-branch cross-level feature fusion is proposed. First, initial features are extracted using ResNet50 as the backbone network, and a cross-level feature distribution enhancement module is designed to improve the interaction of text information across feature levels and the expressive ability of the features. Second, an adaptive fusion strategy is proposed that adaptively filters nontext and redundant features, reducing the false and missed detection rates; it uses a double-branch structure to strengthen the relationship between features in different dimensions and to optimize the fusion process. Last, the differentiable binarization method is used to produce the text detection results in the prediction phase. Ablation experiments were performed on the ICDAR2015, ICDAR2017, Total-Text, and CTW1500 datasets. The results show that the method accurately locates text regions and overcomes the problems of missed and false text detections.
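
The abstract names differentiable binarization as the prediction step but does not give its formula. As a minimal sketch, assuming the commonly cited form B = 1 / (1 + exp(-k(P - T))), where P is a probability map, T is a learned threshold map, and k is an amplification factor, the step might look like the following. The function name, the value k = 50, and the sample maps are illustrative assumptions and are not taken from this paper.

```python
import math


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))).

    prob_map and thresh_map are equally sized 2D lists with values in [0, 1].
    k is an assumed amplification factor; larger k gives a steeper step.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


# Hypothetical 2x2 maps, chosen only to illustrate the behavior.
prob_map = [[0.9, 0.2], [0.5, 0.7]]
thresh_map = [[0.3, 0.3], [0.5, 0.4]]
binary_map = differentiable_binarization(prob_map, thresh_map)
```

Pixels whose probability is well above the local threshold are pushed toward 1 and those well below toward 0, while the function stays differentiable, which is what would allow the threshold map to be learned jointly with the rest of the network.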