
Commit 9e9e5e2
Create rule S6986: "optimizer.zero_grad()" should be used in conjunction with "optimizer.step()" and "loss.backward()"
joke1196 committed Jun 6, 2024
1 parent 7549e7c commit 9e9e5e2
Showing 2 changed files with 15 additions and 12 deletions.
10 changes: 5 additions & 5 deletions rules/S6986/python/metadata.json
@@ -1,12 +1,14 @@
{
"title": "FIXME",
"title": "\"optimizer.zero_grad()\" should be used in conjunction with \"optimizer.step()\" and \"loss.backward()\"",
"type": "CODE_SMELL",
"status": "ready",
"remediation": {
"func": "Constant\/Issue",
"constantCost": "5min"
"constantCost": "1min"
},
"tags": [
"pytorch",
"machine-learning"
],
"defaultSeverity": "Major",
"ruleSpecification": "RSPEC-6986",
@@ -16,9 +18,7 @@
"quickfix": "unknown",
"code": {
"impacts": {
"MAINTAINABILITY": "HIGH",
"RELIABILITY": "MEDIUM",
"SECURITY": "LOW"
"RELIABILITY": "HIGH"
},
"attribute": "CONVENTIONAL"
}
17 changes: 10 additions & 7 deletions rules/S6986/python/rule.adoc
@@ -1,13 +1,16 @@
FIXME: add a description

// If you want to factorize the description uncomment the following line and create the file.
//include::../description.adoc[]

This rule raises an issue when PyTorch `optimizer.step()` and `loss.backward()` are used without `optimizer.zero_grad()`.
== Why is this an issue?

FIXME: remove the unused optional headers (that are commented out)
In PyTorch, a training loop is composed of several steps:

* Forward pass, to pass the data through the model and output predictions
* Loss computation, to compute the loss based on the predictions and the actual data
* Backward pass, to compute the gradient of the loss with the `loss.backward()` method
* Weights update, to update the model weights with the `optimizer.step()` method
* Gradient zeroing, to prevent the gradients from accumulating with the `optimizer.zero_grad()` method

When training a model, it is important to reset the gradients at each iteration of the training loop. Failing to do so will skew the
results, as the model's parameters will be updated with gradients accumulated from previous iterations.
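
For illustration, a minimal training loop following these steps could look like the sketch below. The model, data, and hyperparameters are placeholders chosen for the example, not part of the rule.

[source,python]
----
import torch
from torch import nn, optim

# Placeholder model, data and hyperparameters, for illustration only.
model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
inputs = torch.randn(32, 10)
targets = torch.randn(32, 1)

for epoch in range(5):
    optimizer.zero_grad()                    # reset gradients accumulated during the previous iteration
    predictions = model(inputs)              # forward pass
    loss = criterion(predictions, targets)   # loss computation
    loss.backward()                          # backward pass: compute the gradient of the loss
    optimizer.step()                         # weights update
----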

//=== What is the potential impact?

== How to fix it
//== How to fix it in FRAMEWORK NAME
