[1]张铭泉,张泽恩,曹锦纲,等.结合Segformer与增强特征金字塔的文本检测方法[J].智能系统学报,2024,19(5):1111-1125.[doi:10.11992/tis.202301013]
 ZHANG Mingquan,ZHANG Zeen,CAO Jingang,et al.Text detection method combining Segformer with an enhanced feature pyramid[J].CAAI Transactions on Intelligent Systems,2024,19(5):1111-1125.[doi:10.11992/tis.202301013]

Text detection method combining Segformer with an enhanced feature pyramid

References:
[1] 朱志颖. 基于深度学习的街景文本检测与识别研究[D]. 南京: 南京邮电大学, 2023.
ZHU Zhiying. Research on street view text detection and recognition based on deep learning[D]. Nanjing: Nanjing University of Posts and Telecommunications, 2023.
[2] 周燕, 韦勤彬, 廖俊玮, 等. 自然场景文本检测与端到端识别: 深度学习方法[J]. 计算机科学与探索, 2023, 17(3): 577-594.
ZHOU Yan, WEI Qinbin, LIAO Junwei, et al. Natural scene text detection and end-to-end recognition: deep learning methods[J]. Journal of frontiers of computer science and technology, 2023, 17(3): 577-594.
[3] 李祥鹏, 闵卫东, 韩清, 等. 基于深度学习的车牌定位和识别方法[J]. 计算机辅助设计与图形学学报, 2019, 31(6): 979-987.
LI Xiangpeng, MIN Weidong, HAN Qing, et al. License plate location and recognition based on deep learning[J]. Journal of computer-aided design & computer graphics, 2019, 31(6): 979-987.
[4] 刘光辉, 张钰敏, 孟月波, 等. 双分支跨级特征融合的自然场景文本检测[J]. 智能系统学报, 2023, 18(5): 1079-1089.
LIU Guanghui, ZHANG Yumin, MENG Yuebo, et al. Natural scene text detection based on double-branch cross-level feature fusion[J]. CAAI transactions on intelligent systems, 2023, 18(5): 1079-1089.
[5] 王润民, 桑农, 丁丁, 等. 自然场景图像中的文本检测综述[J]. 自动化学报, 2018, 44(12): 2113-2141.
WANG Runmin, SANG Nong, DING Ding, et al. Text detection in natural scene image: a survey[J]. Acta automatica sinica, 2018, 44(12): 2113-2141.
[6] LIU Wei, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]//European conference on computer vision. Cham: Springer, 2016: 21-37.
[7] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 39(6): 1137-1149.
[8] JIANG Yingying, ZHU Xiangyu, WANG Xiaobing, et al. R2CNN: rotational region CNN for orientation robust scene text detection[EB/OL]. (2017-06-29)[2023-01-11]. https://arxiv.org/abs/1706.09579.
[9] LIAO Minghui, SHI Baoguang, BAI Xiang, et al. TextBoxes: a fast text detector with a single deep neural network[C]//Proceedings of the AAAI conference on artificial intelligence. San Francisco: AAAI, 2017: 4161-4167.
[10] LIAO Minghui, SHI Baoguang, BAI Xiang. TextBoxes++: a single-shot oriented scene text detector[J]. IEEE transactions on image processing, 2018, 27(8): 3676-3690.
[11] HE Tong, HUANG Weilin, QIAO Yu, et al. Accurate text localization in natural image with cascaded convolutional text network[EB/OL]. (2016-03-31)[2023-01-11]. https://arxiv.org/abs/1603.09423.
[12] LI Yi, WU Zhe, ZHAO Shuang, et al. PSENet: psoriasis severity evaluation network[C]//Proceedings of the AAAI conference on artificial intelligence. Palo Alto: AAAI, 2020: 800-807.
[13] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
[14] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 936-944.
[15] WANG Wenhai, XIE Enze, SONG Xiaoge, et al. Efficient and accurate arbitrary-shaped text detection with pixel aggregation network[C]//2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 8439-8448.
[16] LIAO Minghui, WAN Zhaoyi, YAO Cong, et al. Real-time scene text detection with differentiable binarization[C]//Proceedings of the AAAI conference on artificial intelligence. Palo Alto: AAAI, 2020: 11474-11481.
[17] 邵海琳, 季怡, 刘纯平, 等. 基于增强特征金字塔网络的场景文本检测算法[J]. 计算机科学, 2022, 49(2): 248-255.
SHAO Hailin, JI Yi, LIU Chunping, et al. Scene text detection algorithm based on enhanced feature pyramid network[J]. Computer science, 2022, 49(2): 248-255.
[18] 雷小唐, 胡靖. 文本中心像素重建实现任意形状的文本检测[J]. 计算机工程与应用, 2023, 59(8): 148-156.
LEI Xiaotang, HU Jing. Text center pixel reconstruction to achieve efficient arbitrary shape text detection[J]. Computer engineering and applications, 2023, 59(8): 148-156.
[19] 梁浩然, 叶凌晨, 梁荣华, 等. 注意力监督策略下的自然场景文本检测算法[J]. 计算机辅助设计与图形学学报, 2022, 34(7): 1011-1019.
LIANG Haoran, YE Lingchen, LIANG Ronghua, et al. Text detection algorithm for natural scenes under attention supervision strategy[J]. Journal of computer-aided design & computer graphics, 2022, 34(7): 1011-1019.
[20] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale[EB/OL]. (2020-10-22)[2023-01-11]. https://arxiv.org/abs/2010.11929.
[21] CHU Xiangxiang, TIAN Zhi, ZHANG Bo, et al. Conditional positional encodings for vision transformers[EB/OL]. (2021-02-22)[2023-01-11]. https://arxiv.org/abs/2102.10882.
[22] HAN Kai, XIAO An, WU Enhua, et al. Transformer in transformer[J]. Advances in neural information processing systems, 2021, 34: 15908-15919.
[23] LIU Ze, LIN Yutong, CAO Yue, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 9992-10002.
[24] WANG Wenhai, XIE Enze, LI Xiang, et al. Pyramid vision transformer: a versatile backbone for dense prediction without convolutions[C]//2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 548-558.
[25] XIE Enze, WANG Wenhai, YU Zhiding, et al. SegFormer: simple and efficient design for semantic segmentation with transformers[J]. Advances in neural information processing systems, 2021, 34: 12077-12090.
[26] HAN Kai, WANG Yunhe, TIAN Qi, et al. GhostNet: more features from cheap operations[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 1577-1586.
[27] KARATZAS D, GOMEZ-BIGORDA L, NICOLAOU A, et al. ICDAR 2015 competition on Robust Reading[C]//2015 13th International Conference on Document Analysis and Recognition. Tunis: IEEE, 2015: 1156-1160.
[28] HE Mengchao, LIU Yuliang, YANG Zhibo, et al. ICPR2018 contest on robust reading for multi-type web images[C]//2018 24th International Conference on Pattern Recognition. Beijing: IEEE, 2018: 7-12.
[29] ZHANG Chongsheng, PENG Guowen, TAO Yuefeng, et al. ShopSign: a diverse scene text dataset of Chinese shop signs in street views[EB/OL]. (2019-03-25)[2023-01-11]. https://arxiv.org/abs/1903.10412.
[30] LONG Shangbang, RUAN Jiaqiang, ZHANG Wenjie, et al. TextSnake: a flexible representation for detecting text of arbitrary shapes[C]//European conference on computer vision. Cham: Springer, 2018: 19-35.
[31] WANG Yuxin, XIE Hongtao, ZHA Zhengjun, et al. ContourNet: taking a further step toward accurate arbitrary-shaped scene text detection[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11750-11759.
[32] ZHANG Shixue, ZHU Xiaobin, HOU Jiebo, et al. Deep relational reasoning graph network for arbitrary shape text detection[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 9696-9705.
[33] ZHU Yiqin, CHEN Jianyong, LIANG Lingyu, et al. Fourier contour embedding for arbitrary-shaped text detection[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 3122-3130.
[34] LIAO Minghui, ZOU Zhisheng, WAN Zhaoyi, et al. Real-time scene text detection with differentiable binarization and adaptive scale fusion[J]. IEEE transactions on pattern analysis and machine intelligence, 2023, 45(1): 919-931.
[35] LIU Jinpeng, WU Song, HE Dehong, et al. MS-ROCANet: multi-scale residual orthogonal-channel attention network for scene text detection[C]//2022 IEEE International Conference on Acoustics, Speech and Signal Processing. Singapore: IEEE, 2022: 2200-2204.
[36] MA Ningning, ZHANG Xiangyu, ZHENG Haitao, et al. ShuffleNet V2: practical guidelines for efficient CNN architecture design[C]//European conference on computer vision. Cham: Springer, 2018: 122-138.
[37] SANDLER M, HOWARD A, ZHU Menglong, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4510-4520.
[38] ZHANG Hang, WU Chongruo, ZHANG Zhongyue, et al. ResNeSt: split-attention networks[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New Orleans: IEEE, 2022: 2735-2745.
[39] LIU Zhuang, MAO Hanzi, WU Chaoyuan, et al. A ConvNet for the 2020s[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 11966-11976.
Similar Articles:
[1]黄剑华,唐降龙,刘家锋,等.一种基于Homogeneity的文本检测新方法[J].智能系统学报,2007,2(1):69.
 HUANG Jian-hua,TANG Xiang-long,LIU Jia-feng,et al.A new method for text detection based on Homogeneity[J].CAAI Transactions on Intelligent Systems,2007,2(1):69.
[2]赵文清,孔子旭,赵振兵.隔级融合特征金字塔与CornerNet相结合的小目标检测[J].智能系统学报,2021,16(1):108.[doi:10.11992/tis.202004033]
 ZHAO Wenqing,KONG Zixu,ZHAO Zhenbing.Small target detection based on a combination of feature pyramid and CornerNet[J].CAAI Transactions on Intelligent Systems,2021,16(1):108.[doi:10.11992/tis.202004033]
[3]赵文清,杨盼盼.双向特征融合与注意力机制结合的目标检测[J].智能系统学报,2021,16(6):1098.[doi:10.11992/tis.202012029]
 ZHAO Wenqing,YANG Panpan.Target detection based on bidirectional feature fusion and an attention mechanism[J].CAAI Transactions on Intelligent Systems,2021,16(6):1098.[doi:10.11992/tis.202012029]
[4]刘光辉,张钰敏,孟月波,等.双分支跨级特征融合的自然场景文本检测[J].智能系统学报,2023,18(5):1079.[doi:10.11992/tis.202303005]
 LIU Guanghui,ZHANG Yumin,MENG Yuebo,et al.Natural scene text detection based on double-branch cross-level feature fusion[J].CAAI Transactions on Intelligent Systems,2023,18(5):1079.[doi:10.11992/tis.202303005]
[5]曲海成,李瑞柯,王蒙,等.基于特征重用和膨胀卷积的遥感图像舰船检测[J].智能系统学报,2024,19(5):1298.[doi:10.11992/tis.202304002]
 QU Haicheng,LI Ruike,WANG Meng,et al.Ship detection in remote sensing images via feature reuse and dilated convolution[J].CAAI Transactions on Intelligent Systems,2024,19(5):1298.[doi:10.11992/tis.202304002]

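Memo

Received: 2023-01-11.
Funding: Fundamental Research Funds for the Central Universities (2021MS092); Hebei Provincial Science and Technology Program (22310302D).
About the authors: ZHANG Mingquan, associate professor. Research interests: computer organization, machine learning, and pattern recognition. Has published more than 20 academic papers. E-mail: mqzhang@ncepu.edu.cn; ZHANG Zeen, master's student. Research interests: deep learning and text detection. E-mail: zze15832206526@163.com; CAO Jingang, lecturer. Research interests: image processing and pattern recognition. Has published more than 10 academic papers. E-mail: caojg168@126.com.
Corresponding author: CAO Jingang. E-mail: caojg168@126.com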

Last Update: 2024-09-05