This repository has been archived by the owner on Jul 7, 2023. It is now read-only.

saving checkpoint after restoring #534

Closed
lkluo opened this issue Jan 22, 2018 · 2 comments


lkluo commented Jan 22, 2018

During continued training, the program automatically saves a checkpoint immediately after restoring the latest one, resulting in two checkpoint models only one step apart, e.g., model.ckpt-1000 and model.ckpt-1001. Testing showed that these two models actually produce different outputs. Why are checkpoints saved in this manner? Is there a special reason for it?
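For reference, such near-duplicate pairs can be spotted mechanically. The sketch below is an illustration, not tensor2tensor code; the file-name pattern follows the `model.ckpt-<step>` convention mentioned above, and the one-step threshold and function name are assumptions:

```python
import re

def near_duplicate_steps(filenames, max_gap=1):
    """Return (older, newer) checkpoint step pairs whose gap is <= max_gap.

    `filenames` are checkpoint file names such as
    'model.ckpt-1000.index' or 'model.ckpt-1001.data-00000-of-00001'.
    """
    steps = sorted({int(m.group(1))
                    for name in filenames
                    for m in [re.match(r"model\.ckpt-(\d+)", name)]
                    if m})
    return [(a, b) for a, b in zip(steps, steps[1:]) if b - a <= max_gap]

# Example: the situation described above.
files = ["model.ckpt-1000.index", "model.ckpt-1001.index",
         "model.ckpt-2000.index"]
print(near_duplicate_steps(files))  # [(1000, 1001)]
```

Listing the pairs rather than deleting files directly leaves the decision of which checkpoint to keep to the user.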

martinpopel (Contributor) commented Jan 22, 2018

The issue of saving two checkpoints (e.g. 1000 and 1001) is a duplicate of #495 (comment).
The issue of two "neighboring" checkpoints leading to different results needs more details: is the BLEU difference (assuming you are doing MT) significant? Some difference should be expected even after a single step (otherwise the model would never learn anything), but in most cases it should not be large.
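Whether a small corpus-level difference is significant can be estimated with paired bootstrap resampling over per-sentence scores. A minimal sketch, not part of this thread; the function name is hypothetical and the per-sentence scores stand in for sentence-level BLEU:

```python
import random

def paired_bootstrap(scores_a, scores_b, n_samples=1000, seed=0):
    """Fraction of bootstrap resamples in which system A outscores system B.

    `scores_a` / `scores_b` are per-sentence quality scores (e.g.
    sentence-level BLEU) for the same test sentences; a fraction near
    1.0 suggests the advantage of A is consistent, near 0.5 that it
    is noise.
    """
    rng = random.Random(seed)
    n, wins = len(scores_a), 0
    for _ in range(n_samples):
        # Resample sentence indices with replacement and compare totals.
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / n_samples
```

With a 15k-sentence test set this is cheap to run, and it answers "significant or not" more directly than eyeballing the corpus BLEU gap.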

lkluo (Author) commented Jan 22, 2018

@martinpopel Yes, I trained an MT model. The BLEU difference is tiny on my test data (around 15k sentences), but for some sentences the translations vary in fluency.

@lkluo lkluo closed this as completed Jan 23, 2018