Update OD docs with clarified output formats #1348

Merged (6 commits) on Aug 7, 2023
134 changes: 129 additions & 5 deletions documentation/source/ObjectDetection.md
@@ -10,12 +10,136 @@ In SuperGradients, we aim to collect such models and make them very convenient a

## Implemented models

| Model | Yaml | Model class | Loss Class | NMS Callback |
|--------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [SSD](https://arxiv.org/abs/1512.02325) | [ssd_lite_mobilenetv2_arch_params](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/recipes/arch_params/ssd_lite_mobilenetv2_arch_params.yaml) | [SSDLiteMobileNetV2](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/training/models/detection_models/ssd.py) | [SSDLoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.ssd_loss.SSDLoss) | [SSDPostPredictCallback](https://docs.deci.ai/super-gradients/docstring/training/utils.html#training.utils.ssd_utils.SSDPostPredictCallback) |
| [YOLOX](https://arxiv.org/abs/2107.08430) | [yolox_s_arch_params](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/recipes/arch_params/yolox_s_arch_params.yaml) | [YoloX_S](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/training/models/detection_models/yolox.py) | [YoloXFastDetectionLoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.yolox_loss.YoloXFastDetectionLoss) | [YoloXPostPredictionCallback](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.yolo_base.YoloXPostPredictionCallback) |
| [PPYolo](https://arxiv.org/abs/2007.12099) | [ppyoloe_arch_params](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/recipes/arch_params/ppyoloe_arch_params.yaml) | [PPYoloE](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.pp_yolo_e.pp_yolo_e.PPYoloE) | [PPYoloELoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.ppyolo_loss.PPYoloELoss) | [PPYoloEPostPredictionCallback](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.pp_yolo_e.post_prediction_callback.PPYoloEPostPredictionCallback) |
| YoloNAS | [yolo_nas_s_arch_params](https://github.com/Deci-AI/super-gradients/blob/e1db4d99492a25f8e65b5d3e17a6ff2672c5467b/src/super_gradients/recipes/arch_params/yolo_nas_s_arch_params.yaml) | [Yolo NAS S](https://github.com/Deci-AI/super-gradients/blob/e1db4d99492a25f8e65b5d3e17a6ff2672c5467b/src/super_gradients/training/models/detection_models/yolo_nas/yolo_nas_variants.py#L16) | [PPYoloELoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.ppyolo_loss.PPYoloELoss) | [PPYoloEPostPredictionCallback](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.pp_yolo_e.post_prediction_callback.PPYoloEPostPredictionCallback) |

## Understanding model's predictions

This section describes the output of each model class in training, eval, and tracing modes.
The loss functions and post-prediction callbacks listed in the table above are written to match the output format of their corresponding models.
For example, if you are using a YoloX model, you must pair it with the YoloX loss and the YoloX post-prediction callback.
Mixing a loss or callback with a different model will result in an error.

Understanding the model's output format is important for using it correctly during training,
especially if you plan to consume the model's predictions in a custom callback or loss.


### YoloX
#### Training mode

In training mode, YoloX returns a list of 3 tensors that contain the intermediates required for the loss calculation.
They correspond to output feature maps of the prediction heads:
- Output feature map at index 0: `[B, 1, H/8, W/8, C + 5]`
- Output feature map at index 1: `[B, 1, H/16, W/16, C + 5]`
- Output feature map at index 2: `[B, 1, H/32, W/32, C + 5]`

The value `C` is the number of classes in the dataset.
The remaining `5` elements are the box coordinates and the objectness score.
The layout of elements in the last dimension is as follows: `[x, y, w, h, obj_score, class_scores...]`.
Box regression values in these outputs are NOT in pixel coordinates.
The `x` and `y` values are normalized coordinates.
The width and height values are exponents applied to the base `e` (that is, the size is derived from `exp(w)` and `exp(h)`).

`raw_predictions_0, raw_predictions_1, raw_predictions_2 = yolo_x_model(images)`

In this mode, the predictions are not decoded.
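
As a minimal sketch of the last-dimension layout described above, the snippet below splits one raw prediction vector into its parts, using plain Python lists instead of tensors. The helper name `split_raw_prediction` and the sample values are hypothetical; only the `[x, y, w, h, obj_score, class_scores...]` layout and the exponent interpretation of `w` and `h` come from this section. Any further scaling needed to reach pixel coordinates is not covered here.

```python
import math


def split_raw_prediction(vector, num_classes):
    """Split one raw YoloX prediction vector of length C + 5 into named parts.

    Hypothetical helper for illustration; not part of SuperGradients.
    """
    if len(vector) != num_classes + 5:
        raise ValueError(f"expected {num_classes + 5} values, got {len(vector)}")
    x, y, w, h, obj_score = vector[:5]
    return {
        "xy": (x, y),                             # normalized, not pixels
        "wh_exponent": (w, h),                    # raw exponents as predicted
        "wh_factor": (math.exp(w), math.exp(h)),  # e raised to the exponents
        "obj_score": obj_score,
        "class_scores": list(vector[5:]),
    }


# Made-up vector for a dataset with C = 2 classes.
parts = split_raw_prediction([0.5, 0.25, 0.0, 1.0, 0.9, 0.1, 0.8], num_classes=2)
print(parts["wh_factor"])
```

With real model outputs the same slicing applies along the last tensor dimension, for example `raw[..., :4]` for the box values and `raw[..., 5:]` for the class scores.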

#### Eval mode

In eval mode, YoloX returns a tuple of decoded predictions and raw intermediates.

`predictions, (raw_predictions_0, raw_predictions_1, raw_predictions_2) = yolo_x_model(images)`

`predictions` is a single tensor of shape `[B, num_predictions, C + 5]` where `num_predictions` is the total number of predictions across all 3 output feature maps.

The layout of the last dimension is the same as in training mode: `[x, y, w, h, obj_score, class_scores...]`.
Values of `x`, `y`, `w`, `h` are in absolute pixel coordinates, and confidence scores are in the range `[0, 1]`.
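
To make `num_predictions` concrete, here is a small worked calculation. It assumes one prediction per feature-map cell (consistent with the `[B, 1, H/s, W/s, C + 5]` training-mode shapes) and an input size divisible by the largest stride; the function name is hypothetical.

```python
def yolox_num_predictions(height, width, strides=(8, 16, 32)):
    """Count predictions across the three feature maps for one image.

    Assumes one prediction per feature-map cell and an input size that is
    divisible by every stride.
    """
    return sum((height // s) * (width // s) for s in strides)


# For a 640x640 input: 80*80 + 40*40 + 20*20 predictions.
print(yolox_num_predictions(640, 640))  # 8400
```

Under these assumptions, a batch of `B` images at 640x640 would yield a `predictions` tensor of shape `[B, 8400, C + 5]`.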

#### Tracing mode

Same as in Eval mode.


### PPYolo-E
#### Training mode

In training mode, PPYoloE returns a tuple of 6 tensors that contain the intermediates required for the loss calculation.
You can access individual components of the model's output using the following snippet:

`cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor = ppyoloe_model(images)`

They are as follows:
* `cls_score_list` - `[B, num_anchors, num_classes]`
* `reg_distri_list` - `[B, num_anchors, num_regression_dims]`
* `anchors` - `[num_anchors, 4]`
* `anchor_points` - `[num_anchors, 2]`
* `num_anchors_list` - `[num_anchors]`
* `stride_tensor` - `[num_anchors]`

In this mode, the predictions are not decoded.

#### Eval mode

In eval mode, PPYoloE returns a tuple of 2 elements, the decoded predictions and the same intermediates as in training mode:

`(pred_bboxes, pred_scores), (cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor) = ppyoloe_model(images)`

New outputs `pred_bboxes` and `pred_scores` are decoded predictions of the model. They are as follows:

* `pred_bboxes` - `[B, num_anchors, 4]` - decoded bounding boxes in the format `[x1, y1, x2, y2]` in absolute (pixel) coordinates
* `pred_scores` - `[B, num_anchors, num_classes]` - class scores `(0..1)` for each bounding box

Please note that box predictions are not clipped and may extend beyond the image boundaries.
Additionally, the NMS is not performed yet at this stage. This is where the post-prediction callback comes into play.
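
Because the decoded boxes may extend beyond the image, custom code that consumes `pred_bboxes` directly may want to clamp them first. The following is a minimal sketch on plain Python lists for a single image; `clip_boxes_xyxy` is a hypothetical helper, not a SuperGradients function, and whether the library's own callback clips boxes is not stated in this document.

```python
def clip_boxes_xyxy(boxes, image_height, image_width):
    """Clamp [x1, y1, x2, y2] boxes (pixel coordinates) to the image rectangle."""
    max_x = float(image_width)
    max_y = float(image_height)
    clipped = []
    for x1, y1, x2, y2 in boxes:
        clipped.append([
            min(max(x1, 0.0), max_x),
            min(max(y1, 0.0), max_y),
            min(max(x2, 0.0), max_x),
            min(max(y2, 0.0), max_y),
        ])
    return clipped


# Made-up boxes: the first one sticks out on the left and at the bottom.
boxes = [[-12.5, 30.0, 100.0, 700.0], [10.0, 20.0, 50.0, 60.0]]
clipped = clip_boxes_xyxy(boxes, image_height=640, image_width=640)
print(clipped)
```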

#### Tracing mode

In tracing mode, PPYoloE returns only decoded predictions:

`pred_bboxes, pred_scores = ppyoloe_model(images)`

### Yolo NAS
#### Training mode

In training mode, Yolo-NAS returns a tuple of 6 tensors that contain the intermediates required for the loss calculation.
You can access individual components of the model's output using the following snippet:

`cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor = yolo_nas_model(images)`

They are as follows:
* `cls_score_list` - `[B, num_anchors, num_classes]`
* `reg_distri_list` - `[B, num_anchors, num_regression_dims]`
* `anchors` - `[num_anchors, 4]`
* `anchor_points` - `[num_anchors, 2]`
* `num_anchors_list` - `[num_anchors]`
* `stride_tensor` - `[num_anchors]`

In this mode, the predictions are not decoded.


#### Eval mode

In eval mode, Yolo-NAS returns a tuple of 2 elements, the decoded predictions and the same intermediates as in training mode:

`(pred_bboxes, pred_scores), (cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor) = yolo_nas_model(images)`

New outputs `pred_bboxes` and `pred_scores` are decoded predictions of the model. They are as follows:

* `pred_bboxes` - `[B, num_anchors, 4]` - decoded bounding boxes in the format `[x1, y1, x2, y2]` in absolute (pixel) coordinates
* `pred_scores` - `[B, num_anchors, num_classes]` - class scores `(0..1)` for each bounding box

Please note that box predictions are not clipped and may extend beyond the image boundaries.
Additionally, the NMS is not performed yet at this stage. This is where the post-prediction callback comes into play.
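
To illustrate the kind of work left to the post-prediction callback, here is a generic greedy NMS sketch in pure Python for a single image. It is not the implementation used by `PPYoloEPostPredictionCallback`; the function names, the thresholds, and the use of one score per box (for example the maximum over `num_classes`) are assumptions made for illustration.

```python
def iou_xyxy(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_threshold=0.25, iou_threshold=0.5):
    """Return indices of kept boxes, ordered from highest to lowest score."""
    # Drop low-confidence boxes, then visit the rest in descending score order.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou_xyxy(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Made-up boxes: box 1 overlaps box 0 heavily, box 3 has a low score.
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60], [0, 0, 5, 5]]
scores = [0.9, 0.8, 0.7, 0.1]
kept = greedy_nms(boxes, scores)
print(kept)
```

In practice the callback listed in the table above should be preferred; a sketch like this is mainly useful for understanding what a custom callback would need to do with `pred_bboxes` and `pred_scores`.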

#### Tracing mode

In tracing mode, Yolo-NAS returns only decoded predictions:

`pred_bboxes, pred_scores = yolo_nas_model(images)`
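
Since the output structure differs between modes, code that must work in both eval and tracing modes may want to normalize it. The sketch below relies only on the tuple structures described in this section; `extract_decoded_predictions` is a hypothetical helper, and the strings in the usage lines are placeholders standing in for tensors.

```python
def extract_decoded_predictions(output):
    """Return (pred_bboxes, pred_scores) from an eval- or tracing-mode output.

    Hypothetical helper for illustration; not part of SuperGradients.
    """
    if len(output) == 6:
        raise ValueError("training-mode output has no decoded predictions")
    if len(output) != 2:
        raise ValueError(f"unexpected output with {len(output)} elements")
    first, second = output
    if isinstance(first, (tuple, list)):
        # Eval mode: ((pred_bboxes, pred_scores), raw_intermediates)
        pred_bboxes, pred_scores = first
        return pred_bboxes, pred_scores
    # Tracing mode: (pred_bboxes, pred_scores)
    return first, second


# Placeholder strings stand in for tensors.
raw = ("cls", "reg", "anchors", "points", "num_anchors", "strides")
eval_output = (("bboxes", "scores"), raw)
tracing_output = ("bboxes", "scores")
print(extract_decoded_predictions(eval_output))
print(extract_decoded_predictions(tracing_output))
```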

## Training
