[1] LIU Guanghui, ZHANG Yumin, MENG Yuebo, et al. Natural scene text detection based on double-branch cross-level feature fusion[J]. CAAI Transactions on Intelligent Systems, 2023, 18(5): 1079-1089. [doi:10.11992/tis.202303005]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 18
Issue: No. 5, 2023
Pages: 1079-1089
Column: Academic Papers, Machine Learning
Publication date: 2023-09-05
- Title:
Natural scene text detection based on double-branch cross-level feature fusion
- Author(s):
LIU Guanghui (刘光辉), ZHANG Yumin (张钰敏), MENG Yuebo (孟月波), ZHAN Hua (占华)
School of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, Shaanxi, China
- Keywords:
text detection; arbitrary shape; cross-level feature distribution enhancement; adaptive fusion; double branch; spatial dimension; channel dimension; differentiable binarization
- CLC number:
TP391
- DOI:
10.11992/tis.202303005
- Abstract:
When handling arbitrarily shaped text, existing scene text detection methods often localize text regions inaccurately and miss or falsely detect adjacent text instances because of complex backgrounds. To address this, a natural scene text detection method based on double-branch cross-level feature fusion is proposed. First, initial features are extracted with ResNet50 as the backbone network, and a cross-level feature distribution enhancement module (CFDEM) is designed to strengthen the interaction of text information across feature levels and improve the representation ability of the features. Second, to adaptively filter out non-text or redundant features and thereby reduce the false and missed detection rates, an adaptive fusion strategy (AFS) is proposed; it uses a double-branch structure to strengthen the relationship between features of different dimensions and to optimize the fusion process. Finally, differentiable binarization is used in the prediction stage to generate the text detection results. Ablation experiments were conducted on the ICDAR2015, ICDAR2017, Total-Text, and CTW1500 datasets, and the results show that the method locates text regions accurately and overcomes the impact of missed and false text detections.
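The prediction stage relies on differentiable binarization, which the abstract names but does not define. As a minimal sketch, the formulation commonly used for this technique replaces the hard threshold with a steep sigmoid, B = 1 / (1 + exp(-k(P - T))), where P is the predicted probability map and T is a learned threshold map. The function name, the toy values, and the amplification factor k = 50 are assumptions made for illustration; the abstract does not state the settings this paper uses.

```python
import math


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))).

    prob_map and thresh_map are 2-D lists of equal shape with values in [0, 1].
    k controls how closely the sigmoid approximates a hard step
    (k = 50 is an assumed value, not taken from this abstract).
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


# Toy 2x2 maps (hypothetical values, for illustration only).
prob = [[0.9, 0.2], [0.5, 0.35]]
thresh = [[0.3, 0.3], [0.5, 0.3]]
approx = differentiable_binarization(prob, thresh)

for row in approx:
    print([round(v, 3) for v in row])
```

Pixels whose probability is well above the local threshold map to values near 1, pixels well below map to values near 0, and a pixel exactly at the threshold maps to 0.5. Because the function is smooth, gradients can flow to both maps during training, which a hard threshold would not allow.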