During continuous training, the program automatically saves a checkpoint immediately after it restores the latest one, resulting in two checkpoint models only one step apart, e.g., model.ckpt-1000 and model.ckpt-1001. I did find differences between these two models when testing, i.e., they produce different outputs. May I know why checkpoints are saved in this manner? Is there any special concern?
The issue with saving two checkpoints (e.g. 1000 and 1001) is a duplicate of #495 (comment).
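For reference, the pattern in question can be reproduced with a plain TF1-style training loop. This is just a minimal sketch of the resume-then-save behavior, not tensor2tensor's actual code; `ckpt_dir`, `train_op`, and `global_step` are placeholder names:

```python
# Minimal sketch of the resume-then-save pattern (hypothetical, TF1 API);
# ckpt_dir, train_op, and global_step are placeholder names, not t2t internals.
import os
import tensorflow as tf

def resume_and_save(ckpt_dir, train_op, global_step):
    saver = tf.train.Saver()
    with tf.Session() as sess:
        latest = tf.train.latest_checkpoint(ckpt_dir)  # e.g. model.ckpt-1000
        saver.restore(sess, latest)
        sess.run(train_op)  # one optimization step: global_step becomes 1001
        # Saving immediately after the first step writes model.ckpt-1001,
        # leaving two checkpoints only one step apart.
        saver.save(sess, os.path.join(ckpt_dir, "model.ckpt"),
                   global_step=global_step)
```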
The issue with two "neighboring" checkpoints leading to different results needs more details: is the BLEU difference (assuming you are doing MT) significant? Some difference should be expected even after a single step (otherwise the model would never learn anything), but in most cases it should not be big.
@martinpopel Yes, I trained an MT model. The BLEU difference is tiny for my test data (around 15k sentences), but the translations vary in terms of fluency for some sentences.
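To quantify how much two neighboring checkpoints actually differ, one option is to decode the test set with each checkpoint and score both sets of hypotheses against the references, e.g., with sacreBLEU. A rough sketch, where the file names are assumptions:

```python
# Rough sketch: compare BLEU of translations from two neighboring checkpoints.
# hyp_1000.txt / hyp_1001.txt / refs.txt are hypothetical file names.
import sacrebleu

def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

refs = read_lines("refs.txt")
for hyp_file in ("hyp_1000.txt", "hyp_1001.txt"):
    hyps = read_lines(hyp_file)
    bleu = sacrebleu.corpus_bleu(hyps, [refs])
    print(hyp_file, bleu.score)
```

A small corpus-level BLEU gap can still hide noticeable sentence-level fluency differences, which matches the observation above.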