Refine document and scripts of CTC model. #798

Merged: 8 commits, Apr 13, 2018
181 changes: 178 additions & 3 deletions fluid/ocr_recognition/README.md
@@ -1,4 +1,179 @@
# OCR Model

[toc]
**Review comment (Collaborator):** This model, built with paddle fluid, is still under active development and is not the final version. We welcome feedback.
Running the program examples in this directory requires the latest develop version of PaddlePaddle. If your installed PaddlePaddle version is older than this, please update it following the instructions in the installation documentation.

# Optical Character Recognition

This document describes how to recognize the text in images with the CRNN-CTC and CRNN-Attention models under PaddlePaddle fluid.
**Review comment (Collaborator):** fluid -> Fluid


## 1. CRNN-CTC

The task in this chapter is to recognize images that contain a single line of Chinese characters. The image is first turned into a `features map` by convolutions; an `im2sequence op` then converts the `features map` into a `sequence`; a `bidirectional GRU RNN` produces the probability distribution of the Chinese character at each step. The loss function used for training is the CTC loss, and the final evaluation metric is the `instance error rate`. (A rough sketch of this pipeline in Fluid is given below, after the review note.)
**Review comment (Collaborator):**

- Please write `features map` in Chinese (特征图) and `sequence` in Chinese (序列).
- Regarding "a bidirectional GRU RNN produces the probability distribution of the Chinese character at each step": the model does not actually produce a probability distribution here; say instead that sequence features are learned through the bidirectional GRU.
- The first place CTC appears needs its Chinese name.
- `instance error rate` needs to be explained clearly; it can be written as the sample-level (样本级别) error rate.
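
For orientation, the sketch below shows roughly what such a CRNN-CTC pipeline looks like with the Fluid layers of that period. It is a hedged illustration, not the exact network in `crnn_ctc_model.py`: the filter counts, the GRU size of 128, and the 48-pixel input height (hence the `[12, 1]` `im2sequence` window after two 2x2 poolings) are assumptions made for the example.

```python
import paddle.fluid as fluid

def crnn_ctc_sketch(images, label, num_classes):
    # Two conv + batch-norm + 2x2 max-pool stages: a 48-pixel-high input
    # leaves here with a feature-map height of 48 / 2 / 2 = 12.
    conv = images
    for num_filters in [16, 32]:
        conv = fluid.layers.conv2d(
            input=conv, num_filters=num_filters, filter_size=3, padding=1)
        conv = fluid.layers.batch_norm(input=conv, act='relu')
        conv = fluid.layers.pool2d(
            input=conv, pool_size=2, pool_stride=2, pool_type='max')

    # Collapse each feature-map column into one step of a sequence.
    seq = fluid.layers.im2sequence(
        input=conv, filter_size=[12, 1], stride=[1, 1])

    # Bidirectional GRU over the column sequence (dynamic_gru expects an
    # input width of 3 * hidden_size).
    fc = fluid.layers.fc(input=seq, size=3 * 128)
    gru_fw = fluid.layers.dynamic_gru(input=fc, size=128)
    gru_bw = fluid.layers.dynamic_gru(input=fc, size=128, is_reverse=True)
    feature = fluid.layers.concat(input=[gru_fw, gru_bw], axis=1)

    # Per-step scores over the vocabulary plus one extra class for the
    # CTC blank, then the CTC loss.
    logits = fluid.layers.fc(input=feature, size=num_classes + 1)
    cost = fluid.layers.warpctc(
        input=logits, label=label, blank=num_classes, norm_by_times=True)
    return fluid.layers.mean(x=cost)
```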


The files in this directory serve the following purposes:

- **ctc_reader.py:** downloads, reads, and preprocesses the data. It provides the functions `train()` and `test()`, which return data iterators over the training set and the test set respectively (see the usage sketch after this list).
- **crnn_ctc_model.py:** defines the training network, the inference network, and the evaluation network.
- **ctc_train.py:** trains the model; run `python ctc_train.py --help` for usage.
- **inference.py:** loads a trained model file and runs prediction on new data; run `python inference.py --help` for usage.
**Review comment (Collaborator):** inference.py -> infer.py

- **eval.py:** evaluates the model on a specified dataset; run `python eval.py --help` for usage.
- **utility.py:** implements common utilities such as argument parsing/configuration and tensor construction.
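
As a quick, hedged illustration of how the readers from `ctc_reader.py` might be consumed (the batch size and the shape of each sample are assumptions based on the reader code in this pull request, not an excerpt from `ctc_train.py`):

```python
import ctc_reader

# train() and test() return callables that yield batches; batching is
# handled inside the readers themselves.
train_batches = ctc_reader.train(batch_size=32)
test_batches = ctc_reader.test(batch_size=1)

for batch_id, batch in enumerate(train_batches()):
    # each batch is expected to be a list of (image, label) samples, where
    # `image` is a numpy array shaped (1, 48, W) and `label` is a list of
    # character indices
    print(len(batch), batch[0][0].shape, batch[0][1])
    break
```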


### 1.1 Data

Downloading and lightly preprocessing the data are both implemented in `ctc_reader.py`.

#### 1.1.1 Data format

The training and test data we use are illustrated in Figure 1: each image contains a single line of Chinese characters of variable length, and each image has already been pre-cropped by a detection algorithm.

<p align="center">
<img src="images/demo.jpg" width="620" hspace='10'/> <br/>
<strong>Figure 1</strong>
</p>

In the training set, the label of each image is a sequence of numbers, where each number is the index of a character in the dictionary. The label corresponding to Figure 1 is shown below:
**Review comment (Collaborator):** Instead of "the label of each image is a sequence of numbers, where each number is the index of a character in the dictionary", consider the simpler wording: the label of each image is the indices of its Chinese characters in the dictionary.

```
3835,8371,7191,2369,6876,4162,1938,168,1517,4590,3793
```
In the label above, `3835` is the index of the character '两', and `4590` is the index of the Chinese comma.
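
To turn such a label back into readable text, the indices can be looked up in the dictionary used when the dataset was built. The snippet below is only an illustration: the dictionary file name and its one-character-per-line format are assumptions, since the dictionary itself is not part of this pull request.

```python
import io

# Hypothetical dictionary file: one character per line, where the
# (0-based) line number is the index used in the labels.
with io.open('char_dict.txt', encoding='utf-8') as f:
    id_to_char = [line.rstrip('\n') for line in f]

label = [3835, 8371, 7191, 2369, 6876, 4162, 1938, 168, 1517, 4590, 3793]
print(''.join(id_to_char[i] for i in label))
```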


#### 1.1.2 Data preparation

**A. Training set**

Put all images used for training into one folder, referred to here as `train_images`. Then use a list file to record the information of every image, including the image size, the image file name, and the corresponding label; we refer to this list file as `train_list`. Its format is as follows:

```
185 48 00508_0215.jpg 7740,5332,2369,3201,4162
48 48 00197_1893.jpg 6569
338 48 00007_0219.jpg 4590,4788,3015,1994,3402,999,4553
150 48 00107_4517.jpg 5936,3382,1437,3382
...
157 48 00387_0622.jpg 2397,1707,5919,1278
```

<center>File: train_list</center>

Each line in the file above describes one image and has four space-separated columns: the first two are the width and height of the image, the third is the image file name, and the fourth is the sequence label of that image.
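
For example, one way such a line could be parsed (a small illustration, not code from this repository):

```python
def parse_list_line(line):
    # "<width> <height> <image name> <comma-separated label indices>"
    width, height, name, label_str = line.strip().split(' ', 3)
    return int(width), int(height), name, [int(i) for i in label_str.split(',')]

print(parse_list_line("185 48 00508_0215.jpg 7740,5332,2369,3201,4162"))
# (185, 48, '00508_0215.jpg', [7740, 5332, 2369, 3201, 4162])
```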
In the end, the file structure should look like this:

```
|-train_data
|- train_list
|- train_images
|- 00508_0215.jpg
|- 00197_1893.jpg
|- 00007_0219.jpg
| ...
```

At training time, point the options `--train_images` and `--train_list` at the prepared `train_images` folder and `train_list` file.


>**Note:** If `--train_images` and `--train_list` are both unset or set to None, `ctc_reader.py` automatically downloads and uses the [sample data](http://cloud.dlnel.org/filepub/?uuid=df937251-3c0b-480d-9a7b-0080dfeee65c) and caches it under `$HOME/.cache/paddle/dataset/ctc_data/data/`.


**B. Test set and evaluation set**

The test set and the evaluation set are prepared in the same way as the training set.
During training, the path of the test set is set with the `--test_images` and `--test_list` options of ctc_train.py.
During evaluation, the path of the evaluation set is set with the `--input_images_dir` and `--input_images_list` options of eval.py.

**C. Data for prediction**

Prediction supports three forms of input:

First: set `--input_images_dir` and `--input_images_list`, just like for the training set, except that the last column of the list file may hold an arbitrary placeholder character or string, for example:

```
185 48 00508_0215.jpg s
48 48 00197_1893.jpg s
338 48 00007_0219.jpg s
...
```
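
If such a placeholder list needs to be generated for an existing folder of images, a sketch like the following could do it (the folder name, the output path, and the placeholder character are assumptions for the example):

```python
import os
from PIL import Image

image_dir = "data/infer_images"          # assumed input folder
with open("data/infer_list", "w") as f:  # assumed output path
    for name in sorted(os.listdir(image_dir)):
        width, height = Image.open(os.path.join(image_dir, name)).size
        f.write("%d %d %s s\n" % (width, height, name))
```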

Second: set only `--input_images_list`; the list file then contains just the full path of each image, for example:

```
data/test_images/00000.jpg
data/test_images/00001.jpg
data/test_images/00003.jpg
```

Third: read the path of one image at a time from stdin and run inference on it.

### 1.2 Training

Train on a single GPU with the default data:

```
env CUDA_VISIBLE_DEVICES=0 python ctc_train.py
```

Train on multiple GPUs with the default data:

```
env CUDA_VISIBLE_DEVICES=0,1,2,3 python ctc_train.py --parallel=True
```

Run `python ctc_train.py --help` for more usage options and detailed argument descriptions.

Figure 2 shows the convergence curve obtained with the default arguments and the default dataset; the horizontal axis is the number of training passes and the vertical axis is the sequence_error on the test set.

<p align="center">
<img src="images/train.jpg" width="620" hspace='10'/> <br/>
<strong>Figure 2</strong>
</p>
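
Here sequence_error is the instance-level (sample-level) error rate: a test sample counts as an error whenever the decoded sequence differs from its label in any way. A hedged illustration of the metric (not the evaluator used by the scripts themselves):

```python
def sequence_error_rate(predictions, labels):
    # a sample is wrong unless its decoded sequence equals its label exactly
    errors = sum(1 for pred, ref in zip(predictions, labels)
                 if list(pred) != list(ref))
    return float(errors) / len(labels)

print(sequence_error_rate([[3, 1, 2], [7, 7]], [[3, 1, 2], [7, 8]]))  # 0.5
```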
**Reply (Contributor, author):** Fixed. Thx.

**Review comment (Collaborator):**

1. Along with the figure, explain what it means and state what the seq error is. Is there a training seq error as well? If so, plot both curves.
2. Should the train and test cost curves also be given?




### 1.3 Evaluation

Evaluate the model on a specified dataset by calling the evaluation script as follows:

```
env CUDA_VISIBLE_DEVICES=0 python eval.py \
--model_path="./models/model_0" \
--input_images_dir="./eval_data/images/" \
--input_images_list="./eval_data/eval_list"
```

Run `python eval.py --help` for detailed argument descriptions.


### 1.4 Prediction

Read the path of an image from standard input and run prediction on it:

```
env CUDA_VISIBLE_DEVICES=0 python inference.py \
--model_path="models/model_00044_15000"
```

The output of the command above looks like the following:

```
----------- Configuration Arguments -----------
use_gpu: True
input_images_dir: None
input_images_list: None
model_path: /home/work/models/fluid/ocr_recognition/models/model_00052_15000
------------------------------------------------
Init model from: /home/work/models/fluid/ocr_recognition/models/model_00052_15000.
Please input the path of image: /home/work/models/fluid/ocr_recognition/data/test_images/00001_0060.jpg
result: [3298 2371 4233 6514 2378 3298 2363]
Please input the path of image: /home/work/models/fluid/ocr_recognition/data/test_images/00001_0429.jpg
result: [2067 2067 8187 8477 5027 7191 2431 1462]
```

**Review comment (Collaborator):** A path like /home/work/models/fluid/ocr_recognition/data/test_images/00001_0429.jpg is not user-friendly in documentation. Could the image from Figure 1 be used here instead, and could the result be shown as the Chinese characters obtained by looking the indices up in the dictionary?

Read image paths in batch from a file and run prediction on them:

```
env CUDA_VISIBLE_DEVICES=0 python inference.py \
--model_path="models/model_00044_15000" \
--input_images_list="data/test.list"
```
2 changes: 1 addition & 1 deletion fluid/ocr_recognition/crnn_ctc_model.py
@@ -143,7 +143,7 @@ def ctc_train_net(images, label, args, num_classes):
gradient_clip = None
if args.parallel:
places = fluid.layers.get_places()
pd = fluid.layers.ParallelDo(places)
pd = fluid.layers.ParallelDo(places, use_nccl=True)
with pd.do():
images_ = pd.read_input(images)
label_ = pd.read_input(label)
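
For context, the `ParallelDo` block touched by this hunk is typically used along the following lines (a hedged sketch of the multi-GPU pattern of that Fluid version, not the actual body of `ctc_train_net`; `build_network` is a hypothetical stand-in for the per-device model and loss definition):

```python
import paddle.fluid as fluid

def parallel_cost(images, label, build_network):
    # Replicate the per-device graph over all available places (GPUs);
    # use_nccl=True aggregates gradients with NCCL.
    places = fluid.layers.get_places()
    pd = fluid.layers.ParallelDo(places, use_nccl=True)
    with pd.do():
        images_ = pd.read_input(images)
        label_ = pd.read_input(label)
        cost = build_network(images_, label_)  # hypothetical helper
        pd.write_output(cost)
    cost = pd()  # gather the per-device costs
    return fluid.layers.mean(x=cost)
```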
74 changes: 60 additions & 14 deletions fluid/ocr_recognition/ctc_reader.py
@@ -30,10 +30,10 @@ def train_reader(self, img_root_dir, img_label_list, batchsize):
Reader interface for training.

:param img_root_dir: The root path of the image for training.
:type file_list: str
:type img_root_dir: str

:param img_label_list: The path of the <image_name, label> file for training.
:type file_list: str
:type img_label_list: str

'''

@@ -91,10 +91,10 @@ def test_reader(self, img_root_dir, img_label_list):
Reader interface for inference.

:param img_root_dir: The root path of the images for training.
:type file_list: str
:type img_root_dir: str

:param img_label_list: The path of the <image_name, label> file for testing.
:type file_list: list
:type img_label_list: str
'''

def reader():
@@ -111,6 +111,42 @@ def reader():

return reader

def infer_reader(self, img_root_dir=None, img_label_list=None):
'''A reader interface for inference.

:param img_root_dir: The root path of the images for training.
:type img_root_dir: str

:param img_label_list: The path of the <image_name, label> file for
inference. It should be the path of <image_path> file if img_root_dir
was None. If img_label_list was set to None, it will read image path
from stdin.
:type img_label_list: str
'''

def reader():
if img_label_list is not None:
for line in open(img_label_list):
if img_root_dir is not None:
# h, w, img_name, labels
img_name = line.split(' ')[2]
img_path = os.path.join(img_root_dir, img_name)
else:
img_path = line.strip("\t\n\r")
img = Image.open(img_path).convert('L')
img = np.array(img) - 127.5
img = img[np.newaxis, ...]
yield img, [[0]]  # labels are not needed at inference time; yield a placeholder
else:
while True:
img_path = raw_input("Please input the path of image: ")
img = Image.open(img_path).convert('L')
img = np.array(img) - 127.5
img = img[np.newaxis, ...]
yield img, [[0]]

return reader


def num_classes():
'''Get classes number of this dataset.
Expand All @@ -124,21 +160,31 @@ def data_shape():
return DATA_SHAPE


def train(batch_size):
def train(batch_size, train_images_dir=None, train_list_file=None):
generator = DataGenerator()
data_dir = download_data()
return generator.train_reader(
path.join(data_dir, TRAIN_DATA_DIR_NAME),
path.join(data_dir, TRAIN_LIST_FILE_NAME), batch_size)
if train_images_dir is None:
data_dir = download_data()
train_images_dir = path.join(data_dir, TRAIN_DATA_DIR_NAME)
if train_list_file is None:
train_list_file = path.join(data_dir, TRAIN_LIST_FILE_NAME)
return generator.train_reader(train_images_dir, train_list_file, batch_size)


def test(batch_size=1, test_images_dir=None, test_list_file=None):
generator = DataGenerator()
if test_images_dir is None:
data_dir = download_data()
test_images_dir = path.join(data_dir, TEST_DATA_DIR_NAME)
if test_list_file is None:
test_list_file = path.join(data_dir, TEST_LIST_FILE_NAME)
return paddle.batch(
generator.test_reader(test_images_dir, test_list_file), batch_size)


def test(batch_size=1):
def inference(infer_images_dir=None, infer_list_file=None):
generator = DataGenerator()
data_dir = download_data()
return paddle.batch(
generator.test_reader(
path.join(data_dir, TRAIN_DATA_DIR_NAME),
path.join(data_dir, TRAIN_LIST_FILE_NAME)), batch_size)
generator.infer_reader(infer_images_dir, infer_list_file), 1)


def download_data():