This is an unofficial implementation of the paper "SCATTER: Selective Context Attentional Scene Text Recognizer", published at CVPR 2020.
- This work was tested with PyTorch 1.6.0, CUDA 10.2, Python 3.6.10, and Ubuntu 18.04.
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
- Requirements: lmdb, pillow, nltk, natsort
pip3 install lmdb pillow nltk natsort
- Training datasets: MJSynth (MJ) [1], SynthText (ST) [2], and SynthAdd (SA) [3].
- Validation datasets: the union of the training sets of IC13 [4], IC15 [5], IIIT [6], and SVT [7].
- Evaluation datasets: the benchmark evaluation datasets, consisting of IIIT [6], SVT [7], IC03 [8], IC13 [4], IC15 [5], SVTP [9], and CUTE [10].
Two pretrained models are provided (they will be updated when better models are trained):
- non-sensitive: includes the ten digits (0-9) and the 26 letters (a-z).
- sensitive: includes all readable characters.
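The two character sets can be sketched as follows. This is a minimal illustration, assuming the sensitive set is the 94 printable ASCII characters that deep-text-recognition-benchmark uses (`string.printable[:-6]`); the constant names and the `normalize_label` helper are hypothetical and are not taken from this repository.

```python
import string

# Case-insensitive set: ten digits plus 26 lowercase letters (36 symbols).
NON_SENSITIVE_CHARS = string.digits + string.ascii_lowercase

# Assumed case-sensitive set: printable ASCII without the six trailing
# whitespace characters (94 symbols).
SENSITIVE_CHARS = string.printable[:-6]


def normalize_label(label, sensitive):
    """Hypothetical helper: keep only characters from the chosen set.

    In case-insensitive mode the label is lowercased first.
    """
    charset = SENSITIVE_CHARS if sensitive else NON_SENSITIVE_CHARS
    if not sensitive:
        label = label.lower()
    return "".join(ch for ch in label if ch in charset)


print(len(NON_SENSITIVE_CHARS), len(SENSITIVE_CHARS))
print(normalize_label("Hello, World!", sensitive=False))
print(normalize_label("Hello, World!", sensitive=True))
```

Under these assumptions, the non-sensitive model cannot distinguish "Hello" from "hello" and never emits punctuation, which is worth keeping in mind when comparing the two models.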
Pretrained models can be downloaded here
- With the non-sensitive model
python demo.py --saved_model scatter-case-non-sensitive.pth --image_folder <path_to_image_folder>
- With the sensitive model
python demo.py --saved_model scatter-case-sensitive.pth --sensitive --image_folder <path_to_image_folder>
Download the lmdb datasets for training and evaluation provided by deep-text-recognition-benchmark from here
Download the additional dataset SynthText_Add (SA) for training from here (includes raw images and lmdb format).
Training
python3 train.py --train_data data_lmdb_release/training --valid_data data_lmdb_release/validation --select_data MJ-ST-SA --batch_ratio 0.4-0.4-0.2 --sensitive
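The `--batch_ratio` value gives the share of each batch drawn from the datasets listed in `--select_data`, in the same order. The sketch below shows how such a ratio string could translate into per-dataset sample counts. It assumes the rounding rule used in deep-text-recognition-benchmark (round, with a minimum of one sample) and a total batch size of 192; both are assumptions here, and `split_batch` is a hypothetical helper rather than a function from this repository.

```python
def split_batch(batch_size, batch_ratio):
    """Return per-dataset sample counts for a ratio string such as '0.4-0.4-0.2'."""
    ratios = [float(r) for r in batch_ratio.split("-")]
    # Each dataset contributes at least one sample per batch.
    return [max(round(batch_size * r), 1) for r in ratios]


names = "MJ-ST-SA".split("-")
sizes = split_batch(192, "0.4-0.4-0.2")  # 192 is an assumed batch size
print(dict(zip(names, sizes)))  # {'MJ': 77, 'ST': 77, 'SA': 38}
```

Because each share is rounded independently, the counts do not always sum exactly to the requested batch size for other ratio and size combinations.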
Testing
python3 test.py --eval_data data_lmdb_release/evaluation --saved_model scatter-case-sensitive.pth --sensitive --data_filtering_off
- Using the evaluation set here
- Comparison with the results in the original paper and the baseline model:
Model | IIIT5K | SVT | IC03 | IC13 | Regular Text | IC15 | SVTP | CUTE | Irregular Text |
---|---|---|---|---|---|---|---|---|---|
Paper (non-sensitive) | 93.7 | 92.7 | 96.3 | 93.9 | 94.0 | 82.2 | 86.9 | 87.5 | 83.7 |
Baseline | 87.9 | 87.5 | 94.9 | 92.3 | 89.8 | 71.8 | 79.2 | 74.0 | 73.6 |
Ours (sensitive) | 93.5 | 90.9 | 95.0 | 93.6 | 93.4 | 78.6 | 83.4 | 83.3 | 80.0 |
Ours (non-sensitive) | 93.8 | 90.9 | 95.3 | 93.8 | 93.7 | 79.7 | 85.0 | 86.1 | 81.5 |
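The "Regular Text" and "Irregular Text" columns are aggregates over the individual benchmarks. One plausible reading, sketched below, is an average weighted by the number of test images in each benchmark. The image counts are assumptions based on commonly reported subset sizes and are not stated in this README; `weighted_accuracy` is a hypothetical helper.

```python
# Assumed number of test images per benchmark (not stated in this README).
REGULAR_SIZES = {"IIIT5K": 3000, "SVT": 647, "IC03": 860, "IC13": 1015}
IRREGULAR_SIZES = {"IC15": 2077, "SVTP": 645, "CUTE": 288}


def weighted_accuracy(accuracies, sizes):
    """Average the per-benchmark accuracies, weighted by benchmark size."""
    total = sum(sizes[name] for name in accuracies)
    return sum(accuracies[name] * sizes[name] for name in accuracies) / total


# Values from the "Paper (non-sensitive)" row of the table.
paper_regular = {"IIIT5K": 93.7, "SVT": 92.7, "IC03": 96.3, "IC13": 93.9}
paper_irregular = {"IC15": 82.2, "SVTP": 86.9, "CUTE": 87.5}

print(round(weighted_accuracy(paper_regular, REGULAR_SIZES), 1))      # 94.0
print(round(weighted_accuracy(paper_irregular, IRREGULAR_SIZES), 1))  # 83.7
```

Since the table entries are already rounded to one decimal, recomputing the aggregate for other rows this way can differ from the listed value by about 0.1.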
This code is built upon deep-text-recognition-benchmark.
[1] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman. Synthetic data and artificial neural networks for natural scene text recognition. In Workshop on Deep Learning, NIPS, 2014.
[2] A. Gupta, A. Vedaldi, and A. Zisserman. Synthetic data for text localisation in natural images. In CVPR, 2016.
[3] H. Li, P. Wang, C. Shen, and G. Zhang. Show, attend and read: A simple and strong baseline for irregular text recognition. In AAAI, 2019.
[4] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. A. Almazan, and L. P. De Las Heras. ICDAR 2013 robust reading competition. In ICDAR, pages 1484–1493, 2013.
[5] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, et al. ICDAR 2015 competition on robust reading. In ICDAR, pages 1156–1160, 2015.
[6] A. Mishra, K. Alahari, and C. Jawahar. Scene text recognition using higher order language priors. In BMVC, 2012.
[7] K. Wang, B. Babenko, and S. Belongie. End-to-end scene text recognition. In ICCV, pages 1457–1464, 2011.
[8] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, and R. Young. ICDAR 2003 robust reading competitions. In ICDAR, pages 682–687, 2003.
[9] T. Q. Phan, P. Shivakumara, S. Tian, and C. L. Tan. Recognizing text with perspective distortion in natural scenes. In ICCV, pages 569–576, 2013.
[10] A. Risnumawan, P. Shivakumara, C. S. Chan, and C. L. Tan. A robust arbitrary text detection system for natural scene images. In ESWA, volume 41, pages 8027–8048, 2014.
[11] B. Shi, X. Bai, and C. Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. In TPAMI, volume 39, pages 2298–2304, 2017.
Please consider citing this work in your publications if it helps your research.
@inproceedings{litman2020scatter,
title={SCATTER: selective context attentional scene text recognizer},
author={Litman, Ron and Anschel, Oron and Tsiper, Shahar and Litman, Roee and Mazor, Shai and Manmatha, R},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={11962--11972},
year={2020}
}