Recommended: TextBoxes++ is an extension of TextBoxes that supports oriented scene text detection. TextBoxes++ also includes the recognition part.
This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-processing except for standard non-maximum suppression. For more details, please refer to our paper.
Please cite TextBoxes in your publications if it helps your research:
@inproceedings{LiaoSBWL17,
author = {Minghui Liao and
Baoguang Shi and
Xiang Bai and
Xinggang Wang and
Wenyu Liu},
title = {TextBoxes: {A} Fast Text Detector with a Single Deep Neural Network},
booktitle = {AAAI},
year = {2017}
}
- Get the code. We will call the directory that you cloned Caffe into $CAFFE_ROOT.
git clone https://github.com/MhLiao/TextBoxes.git
cd TextBoxes
make -j8
make py
- Models trained on ICDAR 2013: Dropbox link BaiduYun link
- Fully convolutional reduced (atrous) VGGNet: Dropbox link BaiduYun link
- Compiled mex file for evaluation (for multi-scale test evaluation: evaluation_nms.m): Dropbox link BaiduYun link
- Run "python examples/demo.py".
- Set "use_multi_scale" in the "examples/demo.py" script to control whether multi-scale testing is used.
- The results are saved in "examples/results/".
- Train for about 50k iterations on the synthetic data referred to in the paper.
- Train for about 2k iterations on the corresponding training data, such as ICDAR 2013 and SVT.
- For more information, such as learning rate setting, please refer to the paper.
- Using the given test code, you can achieve an F-measure of about 80% on ICDAR 2013 with a single scale.
- Using the given multi-scale test code, you can achieve an F-measure of about 85% on ICDAR 2013 with non-maximum suppression.
- For more performance information, please refer to the paper and to Task 1 and Task 4 of Challenge 2 on the ICDAR 2015 website: http://rrc.cvc.uab.es/?ch=2&com=evaluation
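The multi-scale result relies on non-maximum suppression to merge overlapping detections from different scales. The following is a minimal pure-Python sketch of greedy NMS for illustration only; it is not the repository's evaluation code (which uses evaluation_nms.m). The function names, the (xmin, ymin, xmax, ymax, score) tuple layout, and the 0.5 IoU threshold are assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (xmin, ymin, xmax, ymax, score) tuples.

    Visits detections from highest to lowest score and keeps one only if
    it does not overlap an already-kept detection by more than the
    threshold. The 0.5 default is an assumed value, not taken from the paper.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        if all(iou(det[:4], k[:4]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


# Hypothetical detections from two test scales for the same image.
scale_a = [(158, 128, 411, 181, 0.95), (443, 128, 501, 169, 0.90)]
scale_b = [(160, 130, 409, 180, 0.80)]  # near-duplicate of the first box

merged = nms(scale_a + scale_b)
print(merged)
```

Here the lower-scoring near-duplicate from the second scale is suppressed, so two detections remain.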
The reference XML file is as follows:
<?xml version="1.0" encoding="utf-8"?>
<annotation>
<object>
<name>text</name>
<bndbox>
<xmin>158</xmin>
<ymin>128</ymin>
<xmax>411</xmax>
<ymax>181</ymax>
</bndbox>
</object>
<object>
<name>text</name>
<bndbox>
<xmin>443</xmin>
<ymin>128</ymin>
<xmax>501</xmax>
<ymax>169</ymax>
</bndbox>
</object>
<folder></folder>
<filename>100.jpg</filename>
<size>
<width>640</width>
<height>480</height>
<depth>3</depth>
</size>
</annotation>
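An annotation file with this layout could be written and read back using Python's standard library, as in the sketch below. The helper names build_annotation and read_boxes are hypothetical and are not part of this repository; the element order simply mirrors the sample above, and the serialized output is equivalent XML rather than a byte-for-byte copy.

```python
import xml.etree.ElementTree as ET


def build_annotation(filename, width, height, boxes, depth=3):
    """Build an <annotation> tree with one <object> per text box.

    boxes is a list of (xmin, ymin, xmax, ymax) integer tuples.
    """
    root = ET.Element("annotation")
    for box in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = "text"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), box):
            ET.SubElement(bndbox, tag).text = str(value)
    ET.SubElement(root, "folder").text = ""
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    return root


def read_boxes(root):
    """Return the (xmin, ymin, xmax, ymax) tuples stored in an annotation tree."""
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        boxes.append(tuple(int(bndbox.find(tag).text)
                           for tag in ("xmin", "ymin", "xmax", "ymax")))
    return boxes


# Recreate the sample annotation above and parse it back.
root = build_annotation("100.jpg", 640, 480,
                        [(158, 128, 411, 181), (443, 128, 501, 169)])
xml_bytes = ET.tostring(root, encoding="utf-8")
parsed = read_boxes(ET.fromstring(xml_bytes))
print(parsed)
```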
Please let me know if you encounter any issues.