Deci-AI · BloodAxe · Aug 7, 2023 · Aug 7, 2023 · Aug 7, 2023 · Aug 7, 2023
@@ -10,12 +10,136 @@ In SuperGradients, we aim to collect such models and make them very convenient a
 
 ## Implemented models
 
-| Model                                      | Yaml                                                                                                                                                                     | Model class                                                                                                                              | Loss Class                                                                                                                                  | NMS Callback                                                                                                                                                                                       |
-|--------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| 
-| [SSD](https://arxiv.org/abs/1512.02325)    | [ssd_lite_mobilenetv2_arch_params](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/recipes/arch_params/ssd_lite_mobilenetv2_arch_params.yaml) | [SSDLiteMobileNetV2](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/training/models/detection_models/ssd.py) | [SSDLoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.ssd_loss.SSDLoss)                                 | [SSDPostPredictCallback](https://docs.deci.ai/super-gradients/docstring/training/utils.html#training.utils.ssd_utils.SSDPostPredictCallback)                                                           |
-| [YOLOX](https://arxiv.org/abs/2107.08430)  | [yolox_s_arch_params](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/recipes/arch_params/yolox_s_arch_params.yaml)                           | [YoloX_S](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/training/models/detection_models/yolox.py)          | [YoloXFastDetectionLoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.yolox_loss.YoloXFastDetectionLoss) | [YoloXPostPredictionCallback](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.yolo_base.YoloXPostPredictionCallback)                              |
-| [PPYolo](https://arxiv.org/abs/2007.12099) | [ppyoloe_arch_params](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/recipes/arch_params/ppyoloe_arch_params.yaml)                           | [PPYoloE](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.pp_yolo_e.pp_yolo_e.PPYoloE)  | [PPYoloELoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.ppyolo_loss.PPYoloELoss)                      | [PPYoloEPostPredictionCallback](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.pp_yolo_e.post_prediction_callback.PPYoloEPostPredictionCallback) |
+| Model                                        | Yaml                                                                                                                                                                     | Model class                                                                                                                              | Loss Class                                                                                                                                  | NMS Callback                                                                                                                                                                                       |
+|----------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| 
+| [SSD](https://arxiv.org/abs/1512.02325)      | [ssd_lite_mobilenetv2_arch_params](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/recipes/arch_params/ssd_lite_mobilenetv2_arch_params.yaml) | [SSDLiteMobileNetV2](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/training/models/detection_models/ssd.py) | [SSDLoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.ssd_loss.SSDLoss)                                 | [SSDPostPredictCallback](https://docs.deci.ai/super-gradients/docstring/training/utils.html#training.utils.ssd_utils.SSDPostPredictCallback)                                                           |
+| [YOLOX](https://arxiv.org/abs/2107.08430)    | [yolox_s_arch_params](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/recipes/arch_params/yolox_s_arch_params.yaml)                           | [YoloX_S](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/training/models/detection_models/yolox.py)          | [YoloXFastDetectionLoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.yolox_loss.YoloXFastDetectionLoss) | [YoloXPostPredictionCallback](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.yolo_base.YoloXPostPredictionCallback)                              |
+| [PPYolo](https://arxiv.org/abs/2007.12099)   | [ppyoloe_arch_params](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/recipes/arch_params/ppyoloe_arch_params.yaml)                           | [PPYoloE](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.pp_yolo_e.pp_yolo_e.PPYoloE)  | [PPYoloELoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.ppyolo_loss.PPYoloELoss)                      | [PPYoloEPostPredictionCallback](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.pp_yolo_e.post_prediction_callback.PPYoloEPostPredictionCallback) |
+| YoloNAS                                      | [yolo_nas_s_arch_params](https://github.com/Deci-AI/super-gradients/blob/e1db4d99492a25f8e65b5d3e17a6ff2672c5467b/src/super_gradients/recipes/arch_params/yolo_nas_s_arch_params.yaml) | [Yolo NAS S](https://github.com/Deci-AI/super-gradients/blob/e1db4d99492a25f8e65b5d3e17a6ff2672c5467b/src/super_gradients/training/models/detection_models/yolo_nas/yolo_nas_variants.py#L16) | [PPYoloELoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.ppyolo_loss.PPYoloELoss)                      | [PPYoloEPostPredictionCallback](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.pp_yolo_e.post_prediction_callback.PPYoloEPostPredictionCallback) |
 
+## Understanding model's predictions
+
+This section covers what is the output of each model class in train, eval and tracing modes.
+Corresponding loss functions and post-prediction callbacks from the table above are written to match the output format of the models.
+That being said, if you're using YoloX model, you should use YoloX loss and post-prediction callback for YoloX model. 
+Mixing them with other models will result in an error.
+
+It is important to understand the output of the model class in order to use it correctly in the training process and especially
+if you are going to use the model's prediction in a custom callback or loss.
+
+
+### YoloX
+#### Training mode
+
+In training mode, YoloX returns a list of 3 tensors that contains the intermediates required for the loss calculation.
+They correspond to output feature maps of the prediction heads:
+- Output feature map at index 0: `[B, 1, H/8, W/8, C + 5]`
+- Output feature map at index 1: `[B, 1, H/16, W/16, C + 5]`
+- Output feature map at index 2: `[B, 1, H/32, W/32, C + 5]`
+
+Value `C` corresponds to the number of classes in the dataset. 
+And remaining `5`elements are box coordinates and objectness score.
+Layout of elements in the last dimension is as follows: `[x, y, w, h, obj_score, class_scores...]`
+Box regression in these outputs are NOT in pixel coordinates.
+X and Y coordinates are normalized coordinates.
+Width and height values are the power factor for the base of `e`
+
+`raw_predictions_0, raw_predictions_1, raw_predictions_2 = yolo_x_model(images)`
+
+In this mode, predictions decoding is not performed.
+
+#### Eval mode
+
+In eval mode, YoloX returns a tuple of decoded predictions and raw intermediates.
+
+`predictions, (raw_predictions_0, raw_predictions_1, raw_predictions_2) = yolo_x_model(images)`
+
+`predictions` is a single tensor of shape `[B, num_predictions, C + 5]` where `num_predictions` is the total number of predictions across all 3 output feature maps.
+
+The layout of the last dimension is the same as in training mode: `[x, y, w, h, obj_score, class_scores...]`.
+Values of `x`, `y`, `w`, `h` are in absolute pixel coordinates and confidence scores are in range `[0, 1]`.
+
+#### Tracing mode
+
+Same as in Eval mode.
+
+
+### PPYolo-E
+#### Training mode
+
+In training mode, PPYoloE returns a tuple of 6 tensors that contains the intermediates required for the loss calculation.
+You can access individual components of the model's output using the following snippet:
+
+`cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor = yolo_nas_model(images)`
+
+They are as follows:
+  * `cls_score_list` - `[B, num_anchors, num_classes]`
+  * `reg_distri_list` - `[B, num_anchors, num_regression_dims]`
+  * `anchors` - `[num_anchors, 4]`
+  * `anchor_points` - `[num_anchors, 2]`
+  * `num_anchors_list` - `[num_anchors]`
+  * `stride_tensor` - `[num_anchors]`
+
+In this mode, predictions decoding is not performed.
+
+#### Eval mode
+
+In eval mode, Yolo-NAS returns a tuple of 2 tensors that contains the decoded predictions and the intermediates as in train mode:
+
+`(pred_bboxes, pred_scores), (cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor) = yolo_nas_model(images)`
+
+New outputs `pred_bboxes` and `pred_scores` are decoded predictions of the model. They are as follows:
+
+  * `pred_bboxes` - `[B, num_anchors, 4]` - decoded bounding boxes in the format `[x1, y1, x2, y2]` in absolute (pixel) coordinates
+  * `pred_scores` - `[B, num_anchors, num_classes]` - class scores `(0..1)` for each bounding box
+
+Please note that box predictions are not clipped and may extend beyond the image boundaries.
+Additionally, the NMS is not performed yet at this stage. This is where the post-prediction callback comes into play.
+
+#### Tracing mode
+
+In tracing mode, Yolo-NAS returns only decoded predictions:
+
+`pred_bboxes, pred_scores = yolo_nas_model(images)`
+
+### Yolo NAS
+#### Training mode
+
+In training mode, Yolo-NAS returns a tuple of 6 tensors that contains the intermediates required for the loss calculation.
+You can access individual components of the model's output using the following snippet:
+
+`cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor = yolo_nas_model(images)`
+
+They are as follows:
+  * `cls_score_list` - `[B, num_anchors, num_classes]`
+  * `reg_distri_list` - `[B, num_anchors, num_regression_dims]`
+  * `anchors` - `[num_anchors, 4]`
+  * `anchor_points` - `[num_anchors, 2]`
+  * `num_anchors_list` - `[num_anchors]`
+  * `stride_tensor` - `[num_anchors]`
+
+In this mode, predictions decoding is not performed.
+
+
+#### Eval mode
+
+In eval mode, Yolo-NAS returns a tuple of 2 tensors that contains the decoded predictions and the intermediates as in train mode:
+
+`(pred_bboxes, pred_scores), (cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor) = yolo_nas_model(images)`
+
+New outputs `pred_bboxes` and `pred_scores` are decoded predictions of the model. They are as follows:
+
+  * `pred_bboxes` - `[B, num_anchors, 4]` - decoded bounding boxes in the format `[x1, y1, x2, y2]` in absolute (pixel) coordinates
+  * `pred_scores` - `[B, num_anchors, num_classes]` - class scores `(0..1)` for each bounding box
+
+Please note that box predictions are not clipped and may extend beyond the image boundaries.
+Additionally, the NMS is not performed yet at this stage. This is where the post-prediction callback comes into play.
+
+#### Tracing mode
+
+In tracing mode, Yolo-NAS returns only decoded predictions:
+
+`pred_bboxes, pred_scores = yolo_nas_model(images)`
 
 ## Training