
Fix for load_from_checkpoint #2776

Merged: 21 commits merged into Lightning-AI:master on Oct 5, 2020

Conversation

@jbschiratti (Contributor) commented on Jul 31, 2020

What does this PR do?

Fixes #2550 and fixes #2769.

This PR should allow users to load model weights from checkpoints even though self.save_hyperparameters() may not have been called before training. A test was added to reproduce the problem described below.

Context

In #2550, the user defines their own LightningModule called MNISTModel. In the __init__ method of MNISTModel, the call to self.save_hyperparameters() is missing. Still, the model is trained and a model checkpoint is created. Later, the user tries to restore the model weights from this checkpoint using:

model = MNISTModel.load_from_checkpoint('mnist.ckpt', net='cnn', h_dim=200)

However, this call to load_from_checkpoint raises an error. As described in that issue, the error occurs because, in load_from_checkpoint, the model to be restored receives multiple values for its constructor arguments. The same problem is also described in #2769.
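
To make the failure mode concrete, here is a minimal sketch (not the test added in this PR; the module body and training details are illustrative only) of a LightningModule that omits self.save_hyperparameters() and is then restored with explicit keyword arguments:

import torch
from torch import nn
import pytorch_lightning as pl

class MNISTModel(pl.LightningModule):
    def __init__(self, net='cnn', h_dim=200):
        super().__init__()
        # self.save_hyperparameters() is intentionally NOT called here,
        # so the checkpoint will not contain the hyperparameters.
        self.net = net
        self.layer = nn.Linear(28 * 28, h_dim)

    def forward(self, x):
        return self.layer(x.view(x.size(0), -1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

# After training and saving 'mnist.ckpt', restoring the weights with explicit
# keyword arguments used to fail with "got multiple values for argument ...":
model = MNISTModel.load_from_checkpoint('mnist.ckpt', net='cnn', h_dim=200)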

PR review

Anyone in the community is free to review the PR!

cc @Borda @rohitgr7
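
For readers skimming the thread, the general idea behind a fix like this (a sketch under assumptions, not the actual code changed by the PR) is to merge whatever hyperparameters the checkpoint carries with the kwargs passed to load_from_checkpoint, letting explicit kwargs take precedence so the constructor never receives the same argument twice:

# Hypothetical helper illustrating the idea; not the PR's implementation.
# Assumes hyperparameters are stored in the checkpoint under "hyper_parameters".
def init_module_from_checkpoint(module_cls, checkpoint, **kwargs):
    # The stored hparams are empty when save_hyperparameters() was never called.
    ckpt_hparams = checkpoint.get("hyper_parameters", {}) or {}
    init_args = {**ckpt_hparams, **kwargs}  # explicit kwargs win on conflicts
    return module_cls(**init_args)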

@mergify mergify bot requested a review from a team July 31, 2020 09:19
@Borda Borda added allowed_pre_1.0 bug Something isn't working priority: 0 High priority task labels Jul 31, 2020
@ananyahjha93 ananyahjha93 self-requested a review July 31, 2020 19:00
@jbschiratti jbschiratti changed the title Fix for load_from_checkpoint [WIP] Fix for load_from_checkpoint Aug 3, 2020
@jbschiratti (Contributor, Author)

@Borda : test_no_val_module (with url_ckpt=True) is passing locally but keeps failing on the CI. Do you have any idea why? Shall I add a pip freeze here?

@Borda (Member) commented on Aug 3, 2020

> @Borda: test_no_val_module (with url_ckpt=True) is passing locally but keeps failing on the CI. Do you have any idea why? Shall I add a pip freeze here?

Would you mind using one of our devel images instead? pytorchlightning/pytorch_lightning:cuda-extras-py3.7-torch1.6.0

@edenlightning edenlightning linked an issue Aug 3, 2020 that may be closed by this pull request
@jbschiratti jbschiratti changed the title [WIP] Fix for load_from_checkpoint Fix for load_from_checkpoint Aug 4, 2020
@jbschiratti (Contributor, Author)

@Borda Thanks! I found the problem and fixed it. The PR should be ready to be reviewed.

@mergify mergify bot requested a review from a team August 4, 2020 15:40
@Borda (Member) commented on Aug 4, 2020

> @Borda Thanks! I found the problem and fixed it. The PR should be ready to be reviewed.

Cool, mind sharing what the problem was?

@Borda Borda added this to the 0.9.0 milestone Aug 6, 2020
@pep8speaks commented on Aug 6, 2020

Hello @jbschiratti! Thanks for updating this PR.

Line 178:121: E501 line too long (124 > 120 characters)

Line 541:50: W605 invalid escape sequence '('
Line 541:52: W605 invalid escape sequence ')'
Line 543:1: W391 blank line at end of file

Comment last updated at 2020-10-05 15:25:01 UTC
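
As an aside, W605 warnings like the ones above typically point at backslash escapes such as \( inside a regular (non-raw) string. The flagged line is not shown here, but the usual fix (a generic sketch, not the PR's actual line) is to switch to a raw string:

import re

pattern_bad = "epoch=\(\d+\)"   # '\(' and '\)' are invalid escape sequences (W605)
pattern_ok = r"epoch=\(\d+\)"   # raw string: backslashes are literal, no warning
assert re.search(pattern_ok, "epoch=(12)")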

@Borda (Member) commented on Aug 6, 2020

@jbschiratti it seems I made a mistake while rebasing, mind helping to get it back? Probably just use your local version...

@mergify bot commented on Aug 12, 2020

This pull request is now in conflict... :(

@awaelchli (Contributor)

Hi @jbschiratti, I just looked at this PR to try to fix the failing tests, but I find it hard to understand all of your changes. I understand the issue in #2550 and why it happens, and I can verify that your PR fixes the problem, but unfortunately the current state seems to fail other tests.

@mergify bot commented on Aug 13, 2020

This pull request is now in conflict... :(

@williamFalcon (Contributor) commented on Aug 15, 2020

@jbschiratti @awaelchli ... the added test also passes on master. Not sure what this is fixing now.

Either the test is wrong, or this is not a bug on master

@williamFalcon (Contributor) left a review comment

not sure it's a real bug

@mergify mergify bot requested a review from a team August 15, 2020 13:05
@mergify bot commented on Aug 20, 2020

This pull request is now in conflict... :(

@edenlightning edenlightning removed this from the 0.9.0 milestone Aug 20, 2020
@Borda Borda added the ready PRs ready to be merged label Oct 5, 2020
@mergify mergify bot requested a review from a team October 5, 2020 07:44
@mergify bot commented on Oct 5, 2020

This pull request is now in conflict... :(

@williamFalcon williamFalcon merged commit cea5f1f into Lightning-AI:master Oct 5, 2020
@Borda Borda modified the milestones: 1.0, 0.10.0 Oct 7, 2020
Labels: bug (Something isn't working), checkpointing (Related to checkpointing), priority: 0 (High priority task), ready (PRs ready to be merged)
Projects: None yet
8 participants