Create rule S6986: "optimizer.zero_grad()" should be used in conjunction with "optimizer.step()" and "loss.backward()"
Showing 2 changed files with 73 additions and 27 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,44 +1,90 @@ | ||
FIXME: add a description | ||
|
||
// If you want to factorize the description uncomment the following line and create the file. | ||
//include::../description.adoc[] | ||
|
||
This rule raises an issue when PyTorch `optimizer.step()` and `loss.backward()` are called without `optimizer.zero_grad()`. | ||
== Why is this an issue? | ||
|
||
FIXME: remove the unused optional headers (that are commented out) | ||
In PyTorch, the training loop of a neural network comprises several steps: | ||
* Forward pass, to pass the data through the model and output predictions | ||
* Loss computation, to compute the loss based on the predictions and the actual data | ||
* Backward pass, to compute the gradients of the loss with the `loss.backward()` method | ||
* Weights update, to update the model weights with the `optimizer.step()` method | ||
* Gradient reset, to prevent the gradients from accumulating, with the `optimizer.zero_grad()` method | ||
|
||
When training a model, it is important to reset the gradients on each iteration of the training loop. Failing to do so skews the | ||
results: `loss.backward()` adds to the stored gradients, so the model's parameters are updated with gradients accumulated over the previous iterations. | ||
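The accumulation described above can be observed directly. The following is a minimal sketch, not part of the rule: the scalar parameter `w`, the toy loss `2 * w`, and the helper name `grads_over_iterations` are assumptions made for illustration. It records `w.grad` after each `backward()` call, with and without a reset.

```python
import torch


def grads_over_iterations(reset: bool, steps: int = 3) -> list:
    """Return w.grad after each backward() call for the toy loss 2 * w."""
    w = torch.tensor(1.0, requires_grad=True)  # hypothetical single parameter
    optimizer = torch.optim.SGD([w], lr=0.1)
    seen = []
    for _ in range(steps):
        if reset:
            optimizer.zero_grad()  # clear the gradient left by the previous iteration
        loss = 2 * w               # d(loss)/dw is always 2
        loss.backward()            # adds the new gradient to w.grad
        seen.append(w.grad.item())
    return seen


accumulated = grads_over_iterations(reset=False)
fresh = grads_over_iterations(reset=True)
print(accumulated)  # [2.0, 4.0, 6.0]
print(fresh)        # [2.0, 2.0, 2.0]
```

`optimizer.step()` is deliberately left out so that `w` stays fixed and only the gradient bookkeeping is visible. In a real loop, the growing values in the first list are what the optimizer would use to update the weights.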
|
||
//=== What is the potential impact? | ||
|
||
== How to fix it | ||
//== How to fix it in FRAMEWORK NAME | ||
|
||
To fix the issue, call the `optimizer.zero_grad()` method once per iteration of the training loop, before `loss.backward()`. | ||
|
||
=== Code examples | ||
|
||
==== Noncompliant code example | ||
|
||
[source,text,diff-id=1,diff-type=noncompliant] | ||
[source,python,diff-id=1,diff-type=noncompliant] | ||
---- | ||
FIXME | ||
import torch | ||
from my_data import data, labels | ||
# `model` is assumed to be a torch.nn.Module defined elsewhere | ||
loss_fn = torch.nn.CrossEntropyLoss() | ||
optimizer = torch.optim.SGD(model.parameters(), lr=0.01) | ||
for epoch in range(100): | ||
    for i in range(len(data)): | ||
        output = model(data[i]) | ||
        loss = loss_fn(output, labels[i]) | ||
        loss.backward() | ||
        optimizer.step()  # Noncompliant: optimizer.zero_grad() was not called in the training loop | ||
---- | ||
|
||
==== Compliant solution | ||
|
||
[source,text,diff-id=1,diff-type=compliant] | ||
[source,python,diff-id=1,diff-type=compliant] | ||
---- | ||
FIXME | ||
import torch | ||
from my_data import data, labels | ||
# `model` is assumed to be a torch.nn.Module defined elsewhere | ||
loss_fn = torch.nn.CrossEntropyLoss() | ||
optimizer = torch.optim.SGD(model.parameters(), lr=0.01) | ||
for epoch in range(100): | ||
    for i in range(len(data)): | ||
        optimizer.zero_grad() | ||
        output = model(data[i]) | ||
        loss = loss_fn(output, labels[i]) | ||
        loss.backward() | ||
        optimizer.step()  # Compliant | ||
---- | ||
|
||
//=== How does this work? | ||
== Resources | ||
=== Documentation | ||
|
||
* PyTorch Documentation - https://pytorch.org/tutorials/beginner/introyt/trainingyt.html#the-training-loop[The Training Loop] | ||
* PyTorch Documentation - https://pytorch.org/tutorials/recipes/recipes/zeroing_out_gradients.html#zeroing-out-gradients-in-pytorch[Zeroing out gradients in PyTorch] | ||
* PyTorch Documentation - https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html#torch-optim-optimizer-zero-grad[torch.optim.Optimizer.zero_grad - reference] | ||
* PyTorch Documentation - https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.step.html#torch-optim-optimizer-step[torch.optim.Optimizer.step - reference] | ||
* PyTorch Documentation - https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html#torch-tensor-backward[torch.Tensor.backward - reference] | ||
|
||
|
||
ifdef::env-github,rspecator-view[] | ||
|
||
(visible only on this page) | ||
|
||
== Implementation specification | ||
|
||
Raise the issue only when, inside a loop, both `optimizer.step()` and `loss.backward()` are called and `optimizer.zero_grad()` is not. | ||
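One way to picture this specification is a small check built on Python's standard `ast` module. This is only a hedged sketch and not the analyzer's actual implementation: the function name `loops_missing_zero_grad` is hypothetical, and the check matches method names alone, without verifying that the receivers really are an optimizer or a loss tensor.

```python
import ast


def loops_missing_zero_grad(source: str) -> list:
    """Return line numbers of .step() calls inside loops that also call
    .backward() but never call .zero_grad()."""
    findings = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.For, ast.While)):
            continue
        # Collect (method name, line number) for every method call in the loop,
        # including calls in nested loops.
        calls = [
            (n.func.attr, n.lineno)
            for n in ast.walk(node)
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Attribute)
        ]
        names = {name for name, _ in calls}
        if {"step", "backward"} <= names and "zero_grad" not in names:
            # Primary location: the .step() call, as in the specification.
            findings.update(line for name, line in calls if name == "step")
    return sorted(findings)


noncompliant = """
for epoch in range(100):
    for i in range(len(data)):
        output = model(data[i])
        loss = loss_fn(output, labels[i])
        loss.backward()
        optimizer.step()
"""

print(loops_missing_zero_grad(noncompliant))  # [7]
```

The reported line is that of the `.step()` call, matching the primary issue location below. A real implementation would also need to resolve the types of the receivers.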
|
||
=== Message | ||
|
||
Primary: Call the {optimizer name}.zero_grad() method | ||
|
||
|
||
=== Issue location | ||
|
||
Primary: The {optimizer name}.step() method | ||
|
||
//=== Pitfalls | ||
=== Quickfix | ||
|
||
//=== Going the extra mile | ||
No | ||
|
||
endif::env-github,rspecator-view[] | ||
|
||
//== Resources | ||
//=== Documentation | ||
//=== Articles & blog posts | ||
//=== Conference presentations | ||
//=== Standards | ||
//=== External coding guidelines | ||
//=== Benchmarks |