fix missing call to untoggle_optimizer when accumulating gradients #8284

awaelchli · 2021-07-05T09:44:29Z

What does this PR do?

The training loop toggles the optimizers when there are multiple, and untoggles them after the optimizer step is completed.
However, the untoggle call is missing during the gradient accumulation phase, where it is just as necessary.

Every toggle_optimizer() call needs a matching untoggle_optimizer() call.
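To illustrate why the calls must be paired, here is a minimal sketch in plain Python. It is a simplified model of the toggle/untoggle semantics, not Lightning's actual code: `Param`, `ToggleModel`, and `run_batch` are hypothetical names, and each "optimizer" is reduced to the list of parameters it owns. It assumes that toggling records every parameter's `requires_grad` flag, freezes everything, and re-enables only the active optimizer's parameters from the recorded flags.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Param:
    """Hypothetical stand-in for a tensor; only tracks requires_grad."""
    name: str
    requires_grad: bool = True


class ToggleModel:
    """Simplified model of toggle/untoggle semantics (not Lightning's code)."""

    def __init__(self, optimizers):
        # Each "optimizer" is just the list of parameters it owns.
        self.optimizers = optimizers
        self._saved = {}

    def toggle(self, opt_idx):
        # Remember every parameter's current flag, then freeze everything.
        saved = {}
        for params in self.optimizers:
            for p in params:
                if id(p) not in saved:
                    saved[id(p)] = p.requires_grad
                    p.requires_grad = False
        # Re-enable only the active optimizer's parameters,
        # using the flags that were just recorded.
        for p in self.optimizers[opt_idx]:
            p.requires_grad = saved[id(p)]
        self._saved = saved

    def untoggle(self, opt_idx):
        # Restore the flags of parameters owned by the other optimizers.
        for idx, params in enumerate(self.optimizers):
            if idx != opt_idx:
                for p in params:
                    if id(p) in self._saved:
                        p.requires_grad = self._saved[id(p)]
        self._saved = {}


def run_batch(model, untoggle_when_accumulating):
    """Visit each optimizer once on an accumulation step (no optimizer step).

    Returns, per optimizer, whether its own parameters were trainable
    while it was the active one.
    """
    trainable = []
    for opt_idx, params in enumerate(model.optimizers):
        model.toggle(opt_idx)
        trainable.append(all(p.requires_grad for p in params))
        if untoggle_when_accumulating:
            model.untoggle(opt_idx)
    return trainable


def make_model():
    return ToggleModel([[Param("gen.w")], [Param("disc.w")]])


balanced = run_batch(make_model(), untoggle_when_accumulating=True)
unbalanced = run_batch(make_model(), untoggle_when_accumulating=False)
print(balanced)    # [True, True]
print(unbalanced)  # [True, False]
```

In this model, skipping the untoggle leaves the second optimizer's parameters frozen, so the next toggle records the frozen state and restores it, and the second optimizer never sees trainable parameters.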

Test fails on master

Before submitting

Was this discussed/approved via a GitHub issue? (not for typos and docs)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

I made sure I had fun coding 🙃

codecov · 2021-07-05T09:46:12Z

Codecov Report

Merging #8284 (43c7fb0) into master (ea5cfd2) will decrease coverage by 5%.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #8284    +/-   ##
=======================================
- Coverage      93%     88%    -5%     
=======================================
  Files         212     212            
  Lines       13716   13729    +13     
=======================================
- Hits        12747   12075   -672     
- Misses        969    1654   +685

pytorch_lightning/loops/batch/training_batch_loop.py

carmocca

~~Is the milestone correct? Is this also an issue in the bug-fix branch or just after the loop refactor?~~

edit: nvm saw the original reported issue is for 1.3.7

bmahlbrand · 2021-07-10T18:27:15Z

I know this is closed, but I just pulled master, and tried testing w/3 optimizers instead of 2 (this problem is fixed when I only have 2) and the issue persists.

Check this out: #8365

bmahlbrand · 2021-07-10T18:36:17Z

pytorch_lightning/loops/batch/training_batch_loop.py

@@ -185,20 +185,17 @@ def _run_optimization(
        else:
            if self.trainer.lightning_module.automatic_optimization:
                self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
-                if len(self.trainer.optimizers) > 1:


This assumes len() == 2. Consider 3 optimizers: it should be a cyclic toggle instead of a boolean.

add fix

013cb7b

awaelchli added the bug Something isn't working label Jul 5, 2021

awaelchli added this to the v1.3.x milestone Jul 5, 2021

awaelchli added 4 commits July 5, 2021 12:02

toggle test

e6aff65

re-structure

ab8336b

update changelog

2eb1912

update comment

521ff99

awaelchli commented Jul 5, 2021

pytorch_lightning/loops/batch/training_batch_loop.py

awaelchli commented Jul 5, 2021

pytorch_lightning/loops/batch/training_batch_loop.py

awaelchli marked this pull request as ready for review July 5, 2021 10:39

awaelchli requested review from Borda, carmocca, justusschock, kaushikb11, SeanNaren, tchaton and williamFalcon as code owners July 5, 2021 10:39

carmocca approved these changes Jul 5, 2021

pytorch_lightning/loops/batch/training_batch_loop.py

remove debugging assertion

43c7fb0

tchaton approved these changes Jul 5, 2021

ethanwharris approved these changes Jul 5, 2021

ethanwharris enabled auto-merge (squash) July 5, 2021 11:47

ethanwharris merged commit ced2c94 into master Jul 5, 2021

ethanwharris deleted the bugfix/untoggle branch July 5, 2021 11:59

bmahlbrand reviewed Jul 10, 2021
