Add OCR CTC model #596
1. Split the data reader and the training script. 2. Wrap some functions.
1. Add a README.md like https://github.com/PaddlePaddle/models/tree/develop/fluid/image_classification
2. The test part needs to be added in the next PR.
fluid/ocr_ctc/train.py
add_arg('device', int, -1, "Device id. '-1' means running on CPU "
        "while '0' means GPU-0.")
# yapf: disable
def _to_lodtensor(data, place):
The python/paddle/v2/fluid/executor.py can process sequence data, so this can be removed.
The code that processes sequences has been commented out:
https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/executor.py#L108
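For context, a helper like _to_lodtensor packs a batch of variable-length sequences into one flat tensor plus offsets (the LoD). The sketch below reproduces only that bookkeeping in NumPy, assuming the pattern common in Fluid examples of this era; the name to_lod_arrays and the int32 default dtype are assumptions, and the actual fluid.LoDTensor calls are omitted.

```python
import numpy as np


def to_lod_arrays(sequences, dtype="int32"):
    """Flatten variable-length sequences and compute level-0 LoD offsets.

    Hypothetical helper: mirrors only the bookkeeping a function like
    _to_lodtensor is assumed to do before passing the flat array and the
    offsets to a LoDTensor (via set(...) and set_lod([lod])).
    """
    lod = [0]
    for seq in sequences:
        # Each offset marks where the next sequence starts in the flat array.
        lod.append(lod[-1] + len(seq))
    flat = np.concatenate([np.asarray(seq, dtype=dtype) for seq in sequences])
    # One element per row, as is typical for label sequences.
    return flat.reshape([-1, 1]), lod
```

For example, to_lod_arrays([[3, 1, 4], [1, 5]]) returns a (5, 1) array together with the offsets [0, 3, 5].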
fluid/ocr_ctc/train.py
    res.set_lod([lod])
    return res


def _get_feeder_data(data, place):
It seems the _ prefix should only be added to functions that are not exposed.
I regard _get_feeder_data as an internal function of the training module, so I added the _ prefix according to the Google Python style guide.
fluid/ocr_ctc/train.py
    label_tensor = _to_lodtensor(map(lambda x: x[1], data), place)
    return {"pixel": pixel_tensor, "label": label_tensor}


def _ocr_conv(input, num, with_bn, param_attrs):
_ocr_conv -> conv_group?
It is an internal function, so I added the _ prefix.
fluid/ocr_ctc/train.py
    return conv4


def _ocr_ctc_net(images, num_classes, param_attrs):
_ocr_ctc_net -> ctc_net?
It is an internal function, so I added the _ prefix.
fluid/ocr_ctc/train.py
        label=label,
        size=num_classes + 1,
        blank=num_classes,
        norm_by_times=True)
norm_by_times=True -> norm_by_times=False?
norm_by_times controls whether the gradients are divided by the sequence length. With the mean op, the gradients are divided by batch_size. If we want to avoid the effect of the mean op, it is more reasonable to remove the mean_grad op than to set norm_by_times=False.
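The two normalizations discussed here are independent and compose multiplicatively. A simplified sketch that tracks only the scale factor each one applies to a sequence's gradient (it does not compute CTC gradients; ctc_grad_scales is a hypothetical name): norm_by_times contributes 1/T_i for a sequence of length T_i, and a mean over the batch contributes 1/N.

```python
def ctc_grad_scales(seq_lens, norm_by_times, use_mean):
    """Scale factor applied to each sequence's CTC gradient (simplified model).

    norm_by_times divides a sequence's gradient by its length T_i;
    a mean op over the batch divides every gradient by the batch size N.
    """
    batch_size = len(seq_lens)
    scales = []
    for length in seq_lens:
        scale = 1.0
        if norm_by_times:
            scale /= length
        if use_mean:
            scale /= batch_size
        scales.append(scale)
    return scales
```

With sequence lengths [4, 2], enabling both gives [0.125, 0.25], while turning off norm_by_times leaves only the batch factor, [0.5, 0.5]. Setting norm_by_times=False therefore does not remove the 1/N introduced by the mean op.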
Actually, in the current code, the target minimized by the optimizer is cost, not avg_cost:
    # define cost and optimizer
    cost = fluid.layers.warpctc(
        input=fc_out,
        label=label,
        size=num_classes + 1,
        blank=num_classes,
        norm_by_times=True)
    avg_cost = fluid.layers.mean(x=cost)
    optimizer = fluid.optimizer.Momentum(
        learning_rate=args.learning_rate, momentum=args.momentum)
    opts = optimizer.minimize(cost)
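Why it matters which of the two is minimized: assuming that minimizing the per-sequence cost behaves like minimizing its sum over the batch (each element receiving an output gradient of 1), its gradient is N times the gradient of avg_cost for a batch of N. A toy check, using a squared-error loss as a stand-in for the per-sequence CTC cost (the loss and the data are made up for illustration):

```python
import numpy as np


def per_sample_grads(w, x, y):
    """d/dw of 0.5 * (w * x_i - y_i) ** 2 for each sample i."""
    return (w * x - y) * x


w = 0.5
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 0.0, 2.0, 1.0])

g = per_sample_grads(w, x, y)
grad_of_sum = g.sum()    # gradient seen when the summed per-sample cost is minimized
grad_of_mean = g.mean()  # gradient seen when avg_cost = mean(cost) is minimized
```

Here grad_of_sum is 4.0 and grad_of_mean is 1.0, a factor of len(x) = 4 apart.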
The training log will be tidied up to be clearer in a follow-up.
fluid/ocr_ctc/train.py
def main():
    args = parser.parse_args()
    print_arguments(args)
    train(l2=args.l2,
Wouldn't it be simpler for train() to take args directly as its parameter?
Thx. Done.
fluid/ocr_ctc/train.py
        norm_by_times=True)
    avg_cost = fluid.layers.mean(x=cost)
    optimizer = fluid.optimizer.Momentum(
        learning_rate=learning_rate / batch_size, momentum=momentum)
learning_rate / batch_size -> learning_rate
The hyperparameters do not need to take batch_size into account.
Thx. Done.
fluid/ocr_ctc/train.py
    num_classes = data_reader.num_classes()
    # define network
    param_attrs = fluid.ParamAttr(
        regularizer=fluid.regularizer.L2Decay(l2 * batch_size),
l2 * batch_size -> l2
Thx. Done.
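The learning-rate and l2 fixes are consistent with each other. A worked sketch using plain SGD, assuming L2 decay adds l2 * w to the gradient and assuming the averaged cost is what gets minimized after the fix (the thread uses Momentum, whose update is also linear in the gradient, so the argument should carry over; the gradient values are made up): stepping on the summed cost with learning_rate / N and decay l2 * N gives the same update as stepping on the mean cost with learning_rate and l2.

```python
import numpy as np


def sgd_step_with_l2(w, grad, lr, l2):
    """One plain SGD step where L2 decay adds l2 * w to the gradient."""
    return w - lr * (grad + l2 * w)


per_sample_grads = np.array([0.4, -0.2, 1.0, 0.6])  # made-up gradients
n = len(per_sample_grads)
w = 2.0
lr, l2 = 0.1, 1e-3

# Before the fix: summed cost, learning_rate / batch_size, L2Decay(l2 * batch_size).
w_before = sgd_step_with_l2(w, per_sample_grads.sum(), lr / n, l2 * n)
# After the fix, assuming the averaged cost is minimized: no batch_size scaling.
w_after = sgd_step_with_l2(w, per_sample_grads.mean(), lr, l2)
```

Both updates land on the same weight (about 1.9548), because the 1/N in the learning rate cancels the N in both the summed gradient and the scaled decay coefficient.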
fluid/ocr_ctc/train.py
def _ocr_ctc_net(images, num_classes, param_attrs):
    conv_features = _ocr_conv(images, 8, True, param_attrs)
    sliced_feature = fluid.layers.im2sequence(
        input=conv_features, stride=[1, 1], filter_size=[1, 3])
The output layout of sliced_feature is NCHW. Changing
filter_size=[1, 3]
-> filter_size=[1, sliced_feature.shape[2]]
is more general: if the height of the input image changes, this line will not need to be modified.
Thx. Fixed.
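The point of the suggestion is that the sliding window should follow the feature-map height instead of a hard-coded constant. A NumPy sketch of the intended effect, assuming a window that spans the full height H (index 2 of an NCHW shape) and moves one column at a time: each of the W columns becomes one time step with C * H features. The function name is hypothetical, and the exact feature ordering produced by fluid.layers.im2sequence is not guaranteed to match.

```python
import numpy as np


def columns_to_sequence(feature_map):
    """Slice an NCHW feature map into W column vectors of length C * H.

    Equivalent to a full-height, width-1 window with stride 1, so the
    result does not depend on a fixed input image height.
    """
    n, c, h, w = feature_map.shape
    # Put width right after batch, then flatten channels and height per column.
    by_column = feature_map.transpose(0, 3, 1, 2)  # shape: N, W, C, H
    return by_column.reshape(n, w, c * h)
```

A (2, 3, 4, 5) input yields a (2, 5, 12) output, and a taller (1, 8, 6, 10) input yields (1, 10, 48) without any change to the code.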
1. Rename the 'ocr_ctc' directory to 'ocr'. 2. Add an initial README.md. 3. Fix the learning rate and l2. 4. Refine the training log format. 5. Reduce the arguments of the train function. 6. Set filter_size of im2sequence dynamically. 7. Add an fc op before the GRU op.
- fluid/ocr -> fluid/ocr_recognition
- To verify the forward network, please add an inference.py.
fluid/ocr/ctc_train.py
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
Other models do not include a copyright header, so should we remove it?
fluid/ocr/ctc_train.py
    conv4 = _conv_block(conv3, 128, (num / 4), with_bn)
    return conv4


def _ocr_ctc_net(images, num_classes, param_attrs, rnn_hidden_size=200):
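- Regarding naming, I think it could be kept consistent with the other model configs; as far as I can see, none of them use a _ prefix.
- When imported by other files, configs like ocr_conv should also be usable, right?
- Names like _ocr_conv and _ocr_ctc_net are not good:
  - this group of convs is not specific to OCR;
  - ocr_ctc_net does not actually contain CTC.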
fluid/ocr/ctc_train.py
        size=num_classes + 1,
        blank=num_classes,
        norm_by_times=True)
    avg_cost = fluid.layers.mean(x=cost)
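The model definition has already been separated out above, but def train() here still contains part of the network, so the separation is not clean!
The network could be defined in another file, e.g. crnn_ctc_model.py.
That way, a later attention model can be added as another file and reuse train.py.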
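A minimal sketch of that separation, assuming a simple name-to-builder mapping. All names here (crnn_ctc_net, attention_net, MODELS, the model field on args) are hypothetical, and the stand-in builders return strings instead of Fluid variables so the example runs without Paddle. train() depends only on the builder interface, so adding a model means adding a module and one mapping entry.

```python
import argparse  # args is assumed to be an argparse.Namespace


def crnn_ctc_net(images, label, num_classes):
    """Stand-in for a network builder that would live in crnn_ctc_model.py."""
    return "ctc_cost({}, {}, {})".format(images, label, num_classes)


def attention_net(images, label, num_classes):
    """Stand-in for a later attention model kept in its own file."""
    return "attention_cost({}, {}, {})".format(images, label, num_classes)


# train.py only knows this mapping; each model file contributes one entry.
MODELS = {"crnn_ctc": crnn_ctc_net, "attention": attention_net}


def train(args):
    """Build the chosen network; the optimizer and training loop stay shared."""
    build_network = MODELS[args.model]
    cost = build_network("pixel", "label", args.num_classes)
    # ... optimizer, data reader and the training loop would follow here ...
    return cost
```

For example, train(argparse.Namespace(model="attention", num_classes=10)) returns "attention_cost(pixel, label, 10)". Passing args straight into train() also matches the earlier suggestion to avoid a long keyword-argument list.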
1. Move all network definitions to 'crnn_ctc_model.py'. 2. Add initializers for some layers. 3. Rename 'fluid/ocr' to 'fluid/ocr_recognition'. 4. Remove the copyright header. 5. Rename some functions.
2. Add an inference script. 3. Add a script for loading models. 4. Add some functions to ctc_reader.
Fix #591.
A test result on random dummy data is shown below: