Optimize RetinaNet inference time #2799

Closed

fmassa opened this issue Oct 13, 2020 · 1 comment

fmassa (Member) commented Oct 13, 2020
🚀 Feature

The postprocessing step in RetinaNet is slow, and overall inference for RetinaNet is currently almost twice as slow as for Faster R-CNN.
In particular, the following snippet

for class_index in range(num_classes):
    # remove low scoring boxes
    inds = torch.gt(scores_per_image[:, class_index], self.score_thresh)
    boxes_per_class, scores_per_class, labels_per_class = \
        boxes_per_image[inds], scores_per_image[inds, class_index], labels_per_image[inds, class_index]
    other_outputs_per_class = [(k, v[inds]) for k, v in other_outputs_per_image]

    # remove empty boxes
    keep = box_ops.remove_small_boxes(boxes_per_class, min_size=1e-2)
    boxes_per_class, scores_per_class, labels_per_class = \
        boxes_per_class[keep], scores_per_class[keep], labels_per_class[keep]
    other_outputs_per_class = [(k, v[keep]) for k, v in other_outputs_per_class]

    # non-maximum suppression, independently done per class
    keep = box_ops.nms(boxes_per_class, scores_per_class, self.nms_thresh)

    # keep only topk scoring predictions
    keep = keep[:self.detections_per_img]
    boxes_per_class, scores_per_class, labels_per_class = \
        boxes_per_class[keep], scores_per_class[keep], labels_per_class[keep]
    other_outputs_per_class = [(k, v[keep]) for k, v in other_outputs_per_class]

    image_boxes.append(boxes_per_class)
    image_scores.append(scores_per_class)
    image_labels.append(labels_per_class)
    for k, v in other_outputs_per_class:
        if k not in image_other_outputs:
            image_other_outputs[k] = []
        image_other_outputs[k].append(v)
runs a Python for loop over the number of classes. This loop can be parallelized by batching the operations over all classes, which should greatly improve inference speed; a sketch of the idea follows.
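A minimal sketch of what a batched version could look like, built on torchvision.ops.boxes.batched_nms. It assumes boxes_per_image of shape [N, 4] and scores_per_image of shape [N, num_classes], as in the loop above; the function name and the global (rather than per-class) top-k cap are assumptions, not the final implementation:

    # Sketch only: not the final torchvision implementation.
    import torch
    from torchvision.ops import boxes as box_ops

    def postprocess_batched(boxes_per_image, scores_per_image,
                            score_thresh, nms_thresh, detections_per_img):
        num_boxes, num_classes = scores_per_image.shape

        # Flatten every (box, class) pair into one candidate, so all the
        # filtering below runs once instead of once per class.
        scores = scores_per_image.flatten()
        labels = torch.arange(num_classes, device=boxes_per_image.device)
        labels = labels.repeat(num_boxes)
        boxes = boxes_per_image.repeat_interleave(num_classes, dim=0)

        # remove low scoring boxes in a single vectorized pass
        keep = scores > score_thresh
        boxes, scores, labels = boxes[keep], scores[keep], labels[keep]

        # remove empty boxes
        keep = box_ops.remove_small_boxes(boxes, min_size=1e-2)
        boxes, scores, labels = boxes[keep], scores[keep], labels[keep]

        # batched_nms performs NMS independently per class without a Python
        # loop, by offsetting boxes so different classes never overlap.
        keep = box_ops.batched_nms(boxes, scores, labels, nms_thresh)
        keep = keep[:detections_per_img]
        return boxes[keep], scores[keep], labels[keep]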

For reference, Detectron2 has already sped up RetinaNet inference several times, most recently in facebookresearch/detectron2@8999946. It also batches inference over classes, and only runs a for loop over the number of feature maps, which is much smaller than the number of COCO classes.
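As an illustration of that per-level strategy, here is a hedged sketch (names such as postprocess_per_level and topk_candidates are hypothetical, not Detectron2's or torchvision's API): loop only over the handful of FPN levels, keep the top-k candidates per level, then run one class-aware NMS over the concatenation:

    # Hypothetical sketch: the Python loop runs over the ~5 FPN levels
    # rather than the 80+ COCO classes.
    import torch
    from torchvision.ops import batched_nms

    def postprocess_per_level(boxes_per_level, scores_per_level,
                              labels_per_level, topk_candidates=1000,
                              nms_thresh=0.5, detections_per_img=300):
        all_boxes, all_scores, all_labels = [], [], []
        for boxes, scores, labels in zip(boxes_per_level, scores_per_level,
                                         labels_per_level):
            # Keep only the best candidates per level to bound the NMS input.
            num_topk = min(topk_candidates, scores.shape[0])
            scores, idxs = scores.topk(num_topk)
            all_boxes.append(boxes[idxs])
            all_scores.append(scores)
            all_labels.append(labels[idxs])

        boxes = torch.cat(all_boxes)
        scores = torch.cat(all_scores)
        labels = torch.cat(all_labels)

        # One class-aware NMS over all levels at once.
        keep = batched_nms(boxes, scores, labels, nms_thresh)
        keep = keep[:detections_per_img]
        return boxes[keep], scores[keep], labels[keep]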


fmassa (Member, Author) commented Oct 21, 2020

This has been implemented by @datumbox in #2828

fmassa closed this as completed Oct 21, 2020
fmassa assigned datumbox and unassigned fmassa Oct 21, 2020