Enforce that the optimizer closure is executed when `optimizer_step` is overridden
#9360
Conversation
Codecov Report
@@           Coverage Diff           @@
##           master    #9360    +/-  ##
========================================
  Coverage      92%      92%
========================================
  Files         179      179
  Lines       14910    14978    +68
========================================
+ Hits        13760    13826    +66
- Misses       1150     1152     +2
LGTM !
@carmocca it looks like this PR introduced a test failure in https://github.com/PyTorchLightning/pytorch-lightning/runs/3543485388
I can't think of a reason why the changes in this PR would make that test fail. Since the 1.10 tests run against the PyTorch nightly builds, it might be caused by a change in PyTorch, and this was simply the first PR to see it.
result = self._result
self._result = None  # free memory
if result is None:
    raise MisconfigurationException(
@carmocca FB hit a problem here: they got the error message, but not calling the closure was actually intentional. Are we ok with downgrading this to a warning message?
We can downgrade this to a warning if necessary; however, if there isn't a reason to allow this, I still believe it should be an error.
We can make this a warning and a deprecation and then convert this back to an error two minor releases later.
@ananthsub what do you think of this solution? Do you have an argument for why skipping the closure should be allowed?
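For illustration, a minimal sketch of what the proposed warning-then-error path could look like. The `ClosureResultConsumer` class, the `_RAISE_ON_SKIPPED_CLOSURE` flag, and the message text are hypothetical stand-ins and not the actual Lightning implementation; only `MisconfigurationException` is the real exception class from the 1.x line:

```python
import warnings

from pytorch_lightning.utilities.exceptions import MisconfigurationException

# Hypothetical switch for the proposed two-release deprecation window:
# warn first, then flip this back to a hard error in a later minor release.
_RAISE_ON_SKIPPED_CLOSURE = False


class ClosureResultConsumer:
    """Illustrative stand-in for the wrapper that caches the closure result."""

    def __init__(self):
        self._result = None

    def consume_result(self):
        if self._result is None:
            message = (
                "The closure was not executed inside `optimizer_step`, so"
                " `training_step`, `zero_grad`, and `backward` were skipped."
            )
            if _RAISE_ON_SKIPPED_CLOSURE:
                raise MisconfigurationException(message)
            warnings.warn(message, DeprecationWarning)
        result, self._result = self._result, None  # free memory and reset
        return result
```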
What does this PR do?
Part of #9349
This has the advantage that the `ClosureResult` is no longer `Optional`.

Does your PR introduce any breaking changes? If yes, please list them.
`optimizer_step` now requires that the closure is executed. Not executing it would skip the entire `training_step`, `zero_grad`, and `backward`; we consider this a bug, so it is better to fail explicitly. Skipping the closure could also break progress tracking and fault tolerance. A sketch of a compliant override appears at the end of this description.

Before submitting
PR review
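For reference, a minimal sketch of an `optimizer_step` override that satisfies the new requirement. The hook signature shown is the one from the Lightning 1.4/1.5 line and may differ in other versions, and `MyModel` is a hypothetical module; the key point is that `optimizer_closure` must run, here by passing it to `optimizer.step`:

```python
from pytorch_lightning import LightningModule


class MyModel(LightningModule):
    def optimizer_step(
        self,
        epoch,
        batch_idx,
        optimizer,
        optimizer_idx,
        optimizer_closure,
        on_tpu=False,
        using_native_amp=False,
        using_lbfgs=False,
    ):
        # Passing the closure to `optimizer.step` runs `training_step`,
        # `zero_grad`, and `backward` before the parameters are updated.
        # Dropping the closure is exactly what this PR turns into an error.
        optimizer.step(closure=optimizer_closure)
```

Calling `optimizer_closure()` directly before a custom stepping scheme would also satisfy the check, since the closure is still executed.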