Optimize RetinaNet inference time #2799

Closed

fmassa opened this issue Oct 13, 2020 · 1 comment

fmassa (Member) commented Oct 13, 2020
🚀 Feature

The postprocessing step in RetinaNet is slow, and overall inference for RetinaNet is currently almost twice as slow as for Faster R-CNN.
In particular, the following snippet

for class_index in range(num_classes):
    # remove low scoring boxes
    inds = torch.gt(scores_per_image[:, class_index], self.score_thresh)
    boxes_per_class, scores_per_class, labels_per_class = \
        boxes_per_image[inds], scores_per_image[inds, class_index], labels_per_image[inds, class_index]
    other_outputs_per_class = [(k, v[inds]) for k, v in other_outputs_per_image]

    # remove empty boxes
    keep = box_ops.remove_small_boxes(boxes_per_class, min_size=1e-2)
    boxes_per_class, scores_per_class, labels_per_class = \
        boxes_per_class[keep], scores_per_class[keep], labels_per_class[keep]
    other_outputs_per_class = [(k, v[keep]) for k, v in other_outputs_per_class]

    # non-maximum suppression, independently done per class
    keep = box_ops.nms(boxes_per_class, scores_per_class, self.nms_thresh)

    # keep only topk scoring predictions
    keep = keep[:self.detections_per_img]
    boxes_per_class, scores_per_class, labels_per_class = \
        boxes_per_class[keep], scores_per_class[keep], labels_per_class[keep]
    other_outputs_per_class = [(k, v[keep]) for k, v in other_outputs_per_class]

    image_boxes.append(boxes_per_class)
    image_scores.append(scores_per_class)
    image_labels.append(labels_per_class)
    for k, v in other_outputs_per_class:
        if k not in image_other_outputs:
            image_other_outputs[k] = []
        image_other_outputs[k].append(v)
runs a Python for loop over the number of classes. This loop can be parallelized by batching the operations over all classes, which should greatly improve inference speed; a sketch of the idea follows.
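A minimal sketch of what a batched version could look like, built on torchvision.ops.boxes.batched_nms. It assumes boxes_per_image of shape [N, 4] and scores_per_image of shape [N, num_classes], as in the loop above; the function name and the global (rather than per-class) top-k cap are assumptions, not the final implementation:

    # Sketch only: not the final torchvision implementation.
    import torch
    from torchvision.ops import boxes as box_ops

    def postprocess_batched(boxes_per_image, scores_per_image,
                            score_thresh, nms_thresh, detections_per_img):
        num_boxes, num_classes = scores_per_image.shape

        # Flatten every (box, class) pair into one candidate, so all the
        # filtering below runs once instead of once per class.
        scores = scores_per_image.flatten()
        labels = torch.arange(num_classes, device=boxes_per_image.device)
        labels = labels.repeat(num_boxes)
        boxes = boxes_per_image.repeat_interleave(num_classes, dim=0)

        # remove low scoring boxes in a single vectorized pass
        keep = scores > score_thresh
        boxes, scores, labels = boxes[keep], scores[keep], labels[keep]

        # remove empty boxes
        keep = box_ops.remove_small_boxes(boxes, min_size=1e-2)
        boxes, scores, labels = boxes[keep], scores[keep], labels[keep]

        # batched_nms performs NMS independently per class without a Python
        # loop, by offsetting boxes so different classes never overlap.
        keep = box_ops.batched_nms(boxes, scores, labels, nms_thresh)
        keep = keep[:detections_per_img]
        return boxes[keep], scores[keep], labels[keep]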

For reference, Detectron2 has already sped up RetinaNet inference several times, most recently in facebookresearch/detectron2@8999946. It also batches inference over classes, and only runs a for loop over the number of feature maps, which is much smaller than the number of COCO classes.
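As an illustration of that per-level strategy, here is a hedged sketch (names such as postprocess_per_level and topk_candidates are hypothetical, not Detectron2's or torchvision's API): loop only over the handful of FPN levels, keep the top-k candidates per level, then run one class-aware NMS over the concatenation:

    # Hypothetical sketch: the Python loop runs over the ~5 FPN levels
    # rather than the 80+ COCO classes.
    import torch
    from torchvision.ops import batched_nms

    def postprocess_per_level(boxes_per_level, scores_per_level,
                              labels_per_level, topk_candidates=1000,
                              nms_thresh=0.5, detections_per_img=300):
        all_boxes, all_scores, all_labels = [], [], []
        for boxes, scores, labels in zip(boxes_per_level, scores_per_level,
                                         labels_per_level):
            # Keep only the best candidates per level to bound the NMS input.
            num_topk = min(topk_candidates, scores.shape[0])
            scores, idxs = scores.topk(num_topk)
            all_boxes.append(boxes[idxs])
            all_scores.append(scores)
            all_labels.append(labels[idxs])

        boxes = torch.cat(all_boxes)
        scores = torch.cat(all_scores)
        labels = torch.cat(all_labels)

        # One class-aware NMS over all levels at once.
        keep = batched_nms(boxes, scores, labels, nms_thresh)
        keep = keep[:detections_per_img]
        return boxes[keep], scores[keep], labels[keep]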


fmassa (Member, Author) commented Oct 21, 2020

This has been implemented by @datumbox in #2828

fmassa closed this as completed Oct 21, 2020
fmassa assigned datumbox and unassigned fmassa Oct 21, 2020