bug of resume model: #1130

Closed
luckycaicai opened this issue Jun 23, 2021 · 4 comments
Comments

@luckycaicai

Original code:

meta.update(self.meta)

Proposed fix:

self.meta.update(meta)

The same applies to this line as well:

meta.update(self.meta)

@zhouzaida zhouzaida assigned dreamerlin and unassigned dreamerlin Jun 24, 2021
@zhouzaida
Collaborator

meta, rather than self.meta, is the information to be saved in the checkpoint.

@luckycaicai
Author

I know that, but the problem is that self.meta is initialized in base_runner.resume, like this:

self.meta = checkpoint['meta']

and is never updated after each epoch/iter.

For example, when I resume a model from epoch=4, iter=200 and save it after 1 more epoch, then after the line

meta.update(self.meta)

meta has been overwritten by self.meta, so the saved information is always epoch=4, iter=200.
It will always be the meta that was loaded at resume time.

So the following line may be the simplest fix:
self.meta.update(meta)
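To illustrate the ordering problem, here is a minimal sketch using plain dicts. It is not the library's actual code: the function names, the extra `seed` key, and the assumption of 50 iterations per epoch are all made up for the example. It only relies on the fact that `dict.update` lets the argument's values overwrite existing keys.

```python
def merge_buggy(current_meta, resumed_meta):
    """Mimics meta.update(self.meta): stale resumed values win."""
    meta = dict(current_meta)
    meta.update(resumed_meta)  # resumed epoch/iter overwrite the fresh ones
    return meta


def merge_fixed(current_meta, resumed_meta):
    """Mimics self.meta.update(meta): fresh values win, extra keys are kept."""
    merged = dict(resumed_meta)  # copy, so the caller's dict is not mutated
    merged.update(current_meta)  # fresh epoch/iter overwrite the stale ones
    return merged


# Hypothetical state: resumed at epoch=4, iter=200, then trained 1 more epoch
# (assumed 50 iterations per epoch, purely for illustration).
resumed_meta = {"epoch": 4, "iter": 200, "seed": 0}
current_meta = {"epoch": 5, "iter": 250}

buggy = merge_buggy(current_meta, resumed_meta)
fixed = merge_fixed(current_meta, resumed_meta)

print(buggy)  # {'epoch': 4, 'iter': 200, 'seed': 0}
print(fixed)  # {'epoch': 5, 'iter': 250, 'seed': 0}
```

With the buggy order, every checkpoint saved after a resume carries the epoch and iter from the resume point; with the reversed order, the current progress is saved and the other resumed keys are preserved.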

@ChiefGodMan

I think you are right. I had actually trained my model for 7 epochs, but after resuming from the checkpoint it went back to epoch 4.

@zhouzaida
Collaborator

zhouzaida commented Jun 24, 2021

@luckycaicai @ailias, thanks for your feedback. The issue will be resolved by PR #1108.
