CAO Jingang, ZHANG Zeen, ZHANG Mingquan. A lightweight end-to-end text recognition method based on SPTS[J]. CAAI Transactions on Intelligent Systems, 2024, 19(6): 1503-1517. [doi:10.11992/tis.202307012]

A lightweight end-to-end text recognition method based on SPTS

References:
[1] LIU Chongyu, CHEN Xiaoxue, LUO Canjie, et al. Deep learning methods for scene text detection and recognition[J]. Journal of image and graphics, 2021, 26(6): 1330-1367.
[2] FENG Wei, HE Wenhao, YIN Fei, et al. TextDragon: an end-to-end framework for arbitrary shaped text spotting[C]//2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 9075-9084.
[3] QIAO Liang, TANG Sanli, CHENG Zhanzhan, et al. Text perceptron: towards end-to-end arbitrary-shaped text spotting[C]//Proceedings of the AAAI conference on artificial intelligence. New York: AAAI, 2020: 11899-11907.
[4] LIU Yuliang, CHEN Hao, SHEN Chunhua, et al. ABCNet: real-time scene text spotting with adaptive Bezier-curve network[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 9806-9815.
[5] LIU Yuliang, SHEN Chunhua, JIN Lianwen, et al. ABCNet v2: adaptive Bezier-curve network for real-time end-to-end text spotting[J]. IEEE transactions on pattern analysis and machine intelligence, 2022, 44(11): 8048-8064.
[6] HUANG Mingxin, LIU Yuliang, PENG Zhenghao, et al. SwinTextSpotter: scene text spotting via better synergy between text detection and text recognition[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 4583-4593.
[7] WU Jingjing, LYU Pengyuan, LU Guangming, et al. Decoupling recognition from detection: single shot self-reliant scene text spotter[C]//Proceedings of the 30th ACM International Conference on Multimedia. Lisboa: ACM, 2022: 1319-1328.
[8] PENG Dezhi, WANG Xinyu, LIU Yuliang, et al. SPTS: single-point text spotting[C]//Proceedings of the 30th ACM International Conference on Multimedia. Lisboa: ACM, 2022: 4272-4281.
[9] ZHANG Xiang, SU Yongwen, TRIPATHI S, et al. Text spotting transformers[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 9509-9518.
[10] KITTENPLON Y, LAVI I, FOGEL S, et al. Towards weakly-supervised text spotting using a multi-task transformer[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 4594-4603.
[11] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[EB/OL]. (2017-06-12)[2021-01-01]. http://arxiv.org/abs/1706.03762.
[12] LIU Ze, LIN Yutong, CAO Yue, et al. Swin Transformer: hierarchical vision Transformer using shifted windows[C]//2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 9992-10002.
[13] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]//European Conference on Computer Vision. Cham: Springer, 2020: 213-229.
[14] CUI Cheng, GAO Tingquan, WEI Shengyu, et al. PP-LCNet: a lightweight CPU convolutional neural network[EB/OL]. (2021-09-17)[2021-01-01]. http://arxiv.org/abs/2109.15099.
[15] LIU Xuebo, LIANG Ding, YAN Shi, et al. FOTS: fast oriented text spotting with a unified network[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 5676-5685.
[16] ZU Xinyan, YU Haiyang, LI Bin, et al. Towards accurate video text spotting with text-wise semantic reasoning[C]//Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence. Macau: International Joint Conferences on Artificial Intelligence Organization, 2023: 1858-1866.
[17] LYU Pengyuan, LIAO Minghui, YAO Cong, et al. Mask TextSpotter: an end-to-end trainable neural network for spotting text with arbitrary shapes[C]//European Conference on Computer Vision. Cham: Springer, 2018: 71-88.
[18] HE Kaiming, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2980-2988.
[19] GARCIA-BORDILS S, KARATZAS D, RUSIÑOL M. STEP - towards structured scene-text spotting[C]//2024 IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2024: 872-881.
[20] LIAO Minghui, PANG Guan, HUANG Jing, et al. Mask TextSpotter v3: segmentation proposal network for robust scene text spotting[C]//European Conference on Computer Vision. Cham: Springer, 2020: 706-722.
[21] WANG Wenhai, XIE Enze, LI Xiang, et al. PAN++: towards efficient and accurate end-to-end spotting of arbitrarily-shaped text[J]. IEEE transactions on pattern analysis and machine intelligence, 2022, 44(9): 5349-5367.
[22] RONEN R, TSIPER S, ANSCHEL O, et al. GLASS: global to local attention for scene-text spotting[C]//European Conference on Computer Vision. Cham: Springer, 2022: 249-266.
[23] LIU Wei, CHEN Chaofeng, WONG K Y. Char-Net: a character-aware neural network for distorted scene text recognition[C]//Proceedings of the AAAI conference on artificial intelligence. New Orleans: AAAI, 2018.
[24] WANG Pengfei, ZHANG Chengquan, QI Fei, et al. PGNet: real-time arbitrarily-shaped text spotting with point gathering network[J]. Proceedings of the AAAI conference on artificial intelligence, 2021, 35(4): 2782-2790.
[25] QIAO Liang, CHEN Ying, CHENG Zhanzhan, et al. MANGO: a mask attention guided one-stage scene text spotter[J]. Proceedings of the AAAI conference on artificial intelligence, 2021, 35(3): 2467-2476.
[26] YE Maoyuan, ZHANG Jing, ZHAO Shanshan, et al. DeepSolo: let transformer decoder with explicit points solo for text spotting[EB/OL]. (2022-11-19)[2022-12-01]. http://arxiv.org/abs/2211.10772.
[27] ZHU Xizhou, SU Weijie, LU Lewei, et al. Deformable DETR: deformable transformers for end-to-end object detection[EB/OL]. (2020-10-08)[2021-01-01]. http://arxiv.org/abs/2010.04159.
[28] CHEN Ting, SAXENA S, LI Lala, et al. Pix2seq: a language modeling framework for object detection[EB/OL]. (2021-09-22)[2021-12-01]. http://arxiv.org/abs/2109.10852.
[29] GUPTA A, VEDALDI A, ZISSERMAN A. Synthetic data for text localisation in natural images[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 2315-2324.
[30] KARATZAS D, GOMEZ-BIGORDA L, NICOLAOU A, et al. ICDAR 2015 competition on Robust Reading[C]//2015 13th International Conference on Document Analysis and Recognition. Tunis: IEEE, 2015: 1156-1160.
[31] LIU Yuliang, JIN Lianwen, ZHANG Shuaitao, et al. Detecting curve text in the wild: new dataset and new solution[EB/OL]. (2017-12-06)[2021-01-01]. http://arxiv.org/abs/1712.02170.
[32] CHNG C K, CHAN C S. Total-text: a comprehensive dataset for scene text detection and recognition[C]//2017 14th IAPR International Conference on Document Analysis and Recognition. Kyoto: IEEE, 2017: 935-942.
[33] NAYEF N, YIN Fei, BIZID I, et al. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification - RRC-MLT[C]//2017 14th IAPR International Conference on Document Analysis and Recognition. Kyoto: IEEE, 2017: 1454-1459.
[34] MA Ningning, ZHANG Xiangyu, ZHENG Haitao, et al. ShuffleNet V2: practical guidelines for efficient CNN architecture design[M]//Lecture Notes in Computer Science. Cham: Springer International Publishing, 2018: 122-138.
[35] HOWARD A, SANDLER M, CHEN Bo, et al. Searching for MobileNetV3[C]//2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 1314-1324.
[36] HAN Kai, WANG Yunhe, TIAN Qi, et al. GhostNet: more features from cheap operations[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 1577-1586.
[37] TAN Mingxing, LE Q V. MixConv: mixed depthwise convolutional kernels[EB/OL]. (2019-07-22)[2021-01-01]. http://arxiv.org/abs/1907.09595.
[38] WAN A, DAI Xiaoliang, ZHANG Peizhao, et al. FBNetV2: differentiable neural architecture search for spatial and channel dimensions[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 12962-12971.

Memo

Received: 2023-07-13.
Foundation item: the Fundamental Research Funds for the Central Universities (2021MS092).
Author biographies: CAO Jingang, lecturer, Ph.D. His main research interests are image processing and pattern recognition, and he has published more than 10 academic papers. E-mail: caojg168@126.com. ZHANG Zeen, master's student. His main research interests are deep learning and text detection. E-mail: zze15832206526@126.com. ZHANG Mingquan, associate professor, Ph.D. His main research interests are computer organization, machine learning, and pattern recognition, and he has published more than 20 academic papers. E-mail: mqzhang@ncepu.edu.cn.
Corresponding author: ZHANG Mingquan. E-mail: mqzhang@ncepu.edu.cn.

Last Update: 2024-11-05