Refactor grid default boxes with torch meshgrid #3799
Conversation
y_f_k, x_f_k = f_k
shifts_x = (torch.arange(0, f_k[1], dtype=dtype) + 0.5) / x_f_k
shifts_y = (torch.arange(0, f_k[0], dtype=dtype) + 0.5) / y_f_k
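For context, a minimal self-contained sketch of the meshgrid-based shift generation being discussed (the helper name `grid_default_box_shifts` is hypothetical; `f_k` is assumed to be the (height, width) of one feature map, as in the snippet above):

```python
import torch

def grid_default_box_shifts(f_k, dtype=torch.float32):
    # Hypothetical helper illustrating the meshgrid approach.
    # f_k is the (height, width) of a feature map; the shifts are the
    # normalized center coordinates of each cell.
    y_f_k, x_f_k = float(f_k[0]), float(f_k[1])
    shifts_x = (torch.arange(0, f_k[1], dtype=dtype) + 0.5) / x_f_k
    shifts_y = (torch.arange(0, f_k[0], dtype=dtype) + 0.5) / y_f_k
    # "ij" indexing: shift_y varies along rows, shift_x along columns
    shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
    return shift_x.reshape(-1), shift_y.reshape(-1)

cx, cy = grid_default_box_shifts((2, 3))
# For a 2x3 feature map: cx = [1/6, 1/2, 5/6, 1/6, 1/2, 5/6],
#                        cy = [1/4, 1/4, 1/4, 3/4, 3/4, 3/4]
```

This replaces a Python for-loop over all (i, j) cells with two vectorized `arange` calls and one `meshgrid`, which is the core of the refactor.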
Actually the default_boxes are generated on the CPU device and then moved to the CUDA device. I've tried the following method to generate the default_boxes directly on the CUDA device, but it takes longer than the for-loop method.
shifts_x = (torch.arange(0, f_k[1], device=torch.device('cuda'), dtype=dtype) + 0.5) / x_f_k
shifts_y = (torch.arange(0, f_k[0], device=torch.device('cuda'), dtype=dtype) + 0.5) / y_f_k
I've hit cases in the past where micro-benchmarks on exactly this part of the code were faster on the CPU, but showed significant slowdowns when training on multiple GPUs. Even if this is slower in micro-benchmarks on a single GPU, it might still be faster on multiple GPUs.
@fmassa So you recommend passing the target device to this method and creating the boxes directly on it, right?
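The suggestion could look something like the following sketch (the function name `make_shifts` and its signature are assumptions for illustration, not the PR's final API): the caller passes the target device and the tensors are allocated there from the start.

```python
import torch

def make_shifts(f_k, dtype=torch.float32, device=None):
    # Hypothetical signature: accept a target device so the shift tensors
    # are created where they will be used, instead of CPU-then-copy.
    y_f_k, x_f_k = float(f_k[0]), float(f_k[1])
    shifts_x = (torch.arange(0, f_k[1], dtype=dtype, device=device) + 0.5) / x_f_k
    shifts_y = (torch.arange(0, f_k[0], dtype=dtype, device=device) + 0.5) / y_f_k
    return shifts_x, shifts_y

# device=None falls back to the default device (CPU)
sx, sy = make_shifts((2, 3), device=torch.device("cpu"))
```

Whether this is a net win depends on the hardware setup, which is exactly what the multi-GPU benchmarking discussed here would determine.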
We can leave it as is for now, but I would create a follow-up issue to benchmark this and the other configuration on multiple GPUs to verify.
FYI, I've tested the inference time over the full COCO eval dataset for these two default-box generation methods on different devices; the times are very similar.
Validated with (using one card):
CUDA_VISIBLE_DEVICES=0 python train.py --dataset coco --model ssd300_vgg16 \
--batch-size 16 --pretrained --test-only
@zhiqwang Thanks a lot for the PR!
I confirmed that using your changes, the statistics of the model remain the same. I left a couple of comments for your consideration but overall it looks good to me. Let me know what you think.
LGTM! Thanks for the improvement @zhiqwang!
Thanks a ton @zhiqwang!
@zhiqwang I'm afraid the PR might have broken a code path that is not currently exercised on master but is needed for follow-up work. Could you please have a look? Edit: To avoid the back-and-forth, I tried the following fix directly on the other PR. Please have a look and let me know what you think: ea1e2c4
Summary:
* Refactor grid default boxes with torch.meshgrid
* Fix torch jit tracing
* Only do the list multiplication once
* Make grid_default_box private as suggested
* Replace list multiplication with torch.repeat
* Move the clipping into _grid_default_boxes to accelerate

Reviewed By: datumbox
Differential Revision: D28473335
fbshipit-source-id: 0e6705ecb9a80fc78213ac1e6da1b02eb3381652
Co-authored-by: Francisco Massa <[email protected]>
Co-authored-by: Vasilis Vryniotis <[email protected]>
As mentioned in #3403 (comment), I refactored the for-loop default boxes generation with torch.meshgrid as follows.

vision/torchvision/models/detection/anchor_utils.py, lines 209 to 223 in 5339e63

I tested the time consumption of the torch.meshgrid and for-loop methods locally using torch.utils.benchmark.Timer (testing only the default_boxes generation part).

On CUDA device:

On CPU device:

EDIT: The parameters I am using are below:
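A sketch of how the two generation methods could be compared with torch.utils.benchmark.Timer (the feature-map size and iteration count here are illustrative placeholders, not the values used in the PR's benchmark):

```python
import torch
from torch.utils.benchmark import Timer

f_k, dtype = (38, 38), torch.float32  # assumed SSD-like feature map size

def meshgrid_boxes():
    # Vectorized generation of normalized cell centers.
    shifts_x = (torch.arange(0, f_k[1], dtype=dtype) + 0.5) / f_k[1]
    shifts_y = (torch.arange(0, f_k[0], dtype=dtype) + 0.5) / f_k[0]
    shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
    return shift_x.reshape(-1), shift_y.reshape(-1)

def forloop_boxes():
    # Baseline: build the same centers cell by cell in Python.
    cxcy = [((j + 0.5) / f_k[1], (i + 0.5) / f_k[0])
            for i in range(f_k[0]) for j in range(f_k[1])]
    return torch.tensor(cxcy, dtype=dtype)

t_mesh = Timer(stmt="meshgrid_boxes()", globals={"meshgrid_boxes": meshgrid_boxes})
t_loop = Timer(stmt="forloop_boxes()", globals={"forloop_boxes": forloop_boxes})
print("meshgrid:", t_mesh.timeit(100).mean)
print("for-loop:", t_loop.timeit(100).mean)
```

Timer.timeit returns a Measurement whose .mean is the average per-iteration time in seconds; for a CUDA comparison the tensors would additionally need device="cuda" and Timer's built-in synchronization handling.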
cc @fmassa @datumbox