- We propose a novel dictionary-guided scene text recognition approach that can be used to improve many state-of-the-art models.
- We also introduce a new benchmark dataset, VinText, for Vietnamese scene text recognition.
Comparison between the traditional approach and our proposed approach.
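For context, the "traditional approach" in this comparison is usually understood as post-hoc correction: the recognizer's output is replaced by the closest dictionary word under edit distance. The minimal sketch below assumes that interpretation. It is only the baseline being compared against, not the dictionary-guided method proposed here, and the function names are hypothetical.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Plain (unweighted) edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def correct_with_dictionary(prediction: str, dictionary: List[str]) -> str:
    """Baseline: replace the prediction with its closest dictionary word.

    Ties are broken by dictionary order (min returns the first minimum).
    """
    return min(dictionary, key=lambda word: levenshtein(prediction, word))
```

For example, `correct_with_dictionary("hòm", ["hôm", "nay"])` picks `"hôm"` because only one character (the diacritic) differs. This hard, single-choice correction is what a dictionary-guided approach aims to improve upon.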
Details of the dataset construction, model architecture, and experimental results can be found in the following paper:
```bibtex
@inproceedings{m_Nguyen-etal-CVPR21,
  author = {Nguyen Nguyen and Thu Nguyen and Vinh Tran and Triet Tran and Thanh Ngo and Thien Nguyen and Minh Hoai},
  title = {Dictionary-guided Scene Text Recognition},
  year = {2021},
  booktitle = {Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition (CVPR)},
}
```
Please CITE our paper whenever our dataset or model implementation is used to help produce published results or incorporated into other software.
We introduce ✨ VinText, a new dataset for Vietnamese scene text recognition.
By downloading this dataset, the USER agrees:
- to use this dataset for research or educational purposes only;
- not to distribute this dataset or any part of it, in any original or modified form;
- and to cite our paper whenever this dataset is employed to help produce published results.
Name | #imgs | #text instances |
---|---|---|
VinText | 2000 | About 56000 |
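To see how many images and text instances each split of the converted dataset contains, a small sketch like the one below can be used. It assumes the converted files use the standard COCO top-level keys `images` and `annotations` and sit at `datasets/vintext/train.json` and `datasets/vintext/test.json`; the helper names are hypothetical. Per-split counts will not necessarily add up to the totals in the table if a split was held out of the download.

```python
import json
from pathlib import Path
from typing import Dict


def count_coco(coco: dict) -> Dict[str, int]:
    """Count images and annotations (text instances) in a COCO-style dict."""
    return {
        "images": len(coco.get("images", [])),
        "annotations": len(coco.get("annotations", [])),
    }


def summarize_coco(json_path: str) -> Dict[str, int]:
    """Load a COCO-format JSON file (UTF-8, for Vietnamese text) and count it."""
    with open(json_path, encoding="utf-8") as f:
        return count_coco(json.load(f))


if __name__ == "__main__":
    root = Path("datasets/vintext")
    for split in ("train", "test"):
        path = root / f"{split}.json"
        if path.exists():
            print(split, summarize_coco(str(path)))
```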
Details about the ✨ VinText dataset can be found in our paper. Download the converted dataset to try it with our model.
Dataset variant | Input format | Download link |
---|---|---|
Original | `x1,y1,x2,y2,x3,y3,x4,y4,TRANSCRIPT` | Download here |
Converted dataset | COCO format | Download here |
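Each line of the original annotation format holds the four corner points of a text region followed by its transcript. A minimal parsing sketch is shown below; it assumes integer coordinates and one text instance per line, and the class and function names are hypothetical. Splitting at most eight times keeps any commas that appear inside the transcript.

```python
from typing import List, NamedTuple, Tuple


class TextInstance(NamedTuple):
    points: List[Tuple[int, int]]  # four (x, y) corners, in file order
    transcript: str


def parse_line(line: str) -> TextInstance:
    """Parse one 'x1,y1,x2,y2,x3,y3,x4,y4,TRANSCRIPT' line."""
    # maxsplit=8: eight coordinates, then everything else is the transcript.
    fields = line.rstrip("\r\n").split(",", 8)
    if len(fields) != 9:
        raise ValueError(f"expected 8 coordinates and a transcript: {line!r}")
    coords = [int(v) for v in fields[:8]]  # assumes integer coordinates
    points = list(zip(coords[0::2], coords[1::2]))
    return TextInstance(points=points, transcript=fields[8])


def read_annotations(path: str) -> List[TextInstance]:
    """Read a whole annotation file; utf-8-sig also tolerates a leading BOM."""
    with open(path, encoding="utf-8-sig") as f:
        return [parse_line(line) for line in f if line.strip()]
```

For example, `parse_line("10,20,110,20,110,60,10,60,Xin chào")` yields the corners `(10, 20)`, `(110, 20)`, `(110, 60)`, `(10, 60)` and the transcript `"Xin chào"`.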
Extract the data and copy the folders into `datasets/` so that the layout is:
```
datasets
├───vintext
│   ├───test.json
│   ├───train.json
│   ├───train_images
│   └───test_images
└───evaluation
    └───gt_vintext.zip
```
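Before training, it can help to confirm that everything is in place. The sketch below checks for the files and folders in the layout above, run from the repository root. It assumes that `evaluation/` sits directly under `datasets/`, next to `vintext/`; the helper name is hypothetical.

```python
from pathlib import Path
from typing import List

# Expected paths relative to the repository root (assumed layout).
EXPECTED = [
    "datasets/vintext/train.json",
    "datasets/vintext/test.json",
    "datasets/vintext/train_images",
    "datasets/vintext/test_images",
    "datasets/evaluation/gt_vintext.zip",
]


def missing_paths(root: str = ".") -> List[str]:
    """Return the expected dataset paths that do not exist under root."""
    base = Path(root)
    return [p for p in EXPECTED if not (base / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    print("Dataset layout OK" if not missing else f"Missing: {missing}")
```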
- python=3.7
- torch==1.4.0
- detectron2==0.2
```bash
conda create -n dict-guided -y python=3.7
conda activate dict-guided
conda install -y pytorch torchvision cudatoolkit=10.0 -c pytorch
python -m pip install ninja yacs cython matplotlib tqdm opencv-python shapely scipy tensorboardX pyclipper Polygon3 weighted-levenshtein editdistance
# Install Detectron2
python -m pip install detectron2==0.2 -f \
  https://dl.fbaipublicfiles.com/detectron2/wheels/cu100/torch1.4/index.html
git clone https://github.com/nguyennm1024/dict-guided.git
cd dict-guided
python setup.py build develop
```
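After installation, a quick way to spot a missing dependency is to ask Python whether each package can be located in the active environment. In the sketch below, the mapping from pip names to import names (for example `opencv-python` to `cv2`, `Polygon3` to `Polygon`) is an assumption, and the helper name is hypothetical. `importlib.util.find_spec` only locates a module without importing it, so it will not catch CUDA or ABI mismatches between `torch` and `detectron2`.

```python
import importlib.util
from typing import Iterable, List

# Assumed import names for the packages installed above.
MODULES = [
    "torch", "torchvision", "detectron2", "yacs", "Cython", "matplotlib",
    "tqdm", "cv2", "shapely", "scipy", "tensorboardX", "pyclipper",
    "Polygon", "weighted_levenshtein", "editdistance",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(MODULES)
    print("All modules found" if not missing else f"Missing: {missing}")
```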
Prepare the input and output folders:

```bash
mkdir sample_input
mkdir sample_output
```
Copy your images to `sample_input/`. Output images will be written to `sample_output/`:
```bash
python demo/demo.py --config-file configs/BAText/VinText/attn_R_50.yaml --input sample_input/ --output sample_output/ --opts MODEL.WEIGHTS path-to-trained_model-checkpoint
```
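When the demo is driven from a script (for example, to compare several checkpoints), the same invocation can be built as an argument list. This is a minimal sketch that reuses only the flags shown above; the function name is hypothetical. Using `sys.executable` runs the demo with the interpreter of the active environment, and an argument list avoids shell-quoting problems with paths that contain spaces.

```python
import sys
from typing import List

CONFIG = "configs/BAText/VinText/attn_R_50.yaml"


def demo_command(weights: str, input_dir: str = "sample_input/",
                 output_dir: str = "sample_output/",
                 config: str = CONFIG) -> List[str]:
    """Build the demo/demo.py invocation as a subprocess argument list."""
    return [
        sys.executable, "demo/demo.py",
        "--config-file", config,
        "--input", input_dir,
        "--output", output_dir,
        "--opts", "MODEL.WEIGHTS", weights,
    ]
```

Usage, from the repository root: `subprocess.run(demo_command("./output/batext/vintext/trained_model.pth"), check=True)`.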
Qualitative results on VinText.
For training, we employed the pre-trained model tt_attn_R_50 from the ABCNet repository for initialization.
```bash
python tools/train_net.py --config-file configs/BAText/VinText/attn_R_50.yaml MODEL.WEIGHTS path_to_tt_attn_R_50_checkpoint
```

Example:

```bash
python tools/train_net.py --config-file configs/BAText/VinText/attn_R_50.yaml MODEL.WEIGHTS ./tt_attn_R_50.pth
```
The trained model is saved in the folder `output/batext/vintext/` and is then used for evaluation:
```bash
python tools/train_net.py --eval-only --config-file configs/BAText/VinText/attn_R_50.yaml MODEL.WEIGHTS path_to_trained_model_checkpoint
```

Example:

```bash
python tools/train_net.py --eval-only --config-file configs/BAText/VinText/attn_R_50.yaml MODEL.WEIGHTS ./output/batext/vintext/trained_model.pth
```
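The checkpoint file name in the output folder may differ from `trained_model.pth`. The sketch below picks one to pass as `MODEL.WEIGHTS`. It assumes Detectron2's usual behavior of recording the most recent checkpoint name in a text file called `last_checkpoint`, and otherwise falls back to the newest `.pth` file. Both the assumption and the helper name are hypothetical for this repository.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(output_dir: str = "output/batext/vintext") -> Optional[Path]:
    """Return the checkpoint to evaluate, or None if none is found."""
    out = Path(output_dir)
    if not out.is_dir():
        return None
    # Assumed: the checkpointer writes the latest file name to "last_checkpoint".
    marker = out / "last_checkpoint"
    if marker.is_file():
        candidate = out / marker.read_text().strip()
        if candidate.is_file():
            return candidate
    # Fallback: the most recently modified .pth file in the folder.
    checkpoints = sorted(out.glob("*.pth"), key=lambda p: p.stat().st_mtime)
    return checkpoints[-1] if checkpoints else None
```

The returned path can then be passed to `tools/train_net.py --eval-only` via `MODEL.WEIGHTS`.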
This repository is built on ABCNet.