
Incorrect anchor computations #1765

Closed
hgaiser opened this issue Jan 17, 2020 · 4 comments
hgaiser (Contributor) commented Jan 17, 2020

While working on #1697 I was checking the anchor generation and noticed an error in the computation. The centers of the anchors should be at half the size of the stride of the anchors, but the centers are currently at the top left corner.

An example makes this clearer. I have generated anchors for an imaginary image of size (512, 512). The top left anchor with ratio 1 and scale 1 is as follows:

       [-16., -16.,  16.,  16.],

Note that this anchor extends outside of the image by 16 pixels and that its center is at (0, 0).
The last anchor at the bottom right with ratio 1 and scale 1 is as follows:

       [488., 488., 520., 520.],

Note that this anchor extends on the bottom right by 8 pixels, because the image shape is (512, 512). This means that the anchors are not correctly centered on the image. In general, if we shift all anchors by half the stride (the stride is 8 pixels in this case), then we get the following anchors:

       [-12., -12.,  20.,  20.],
       [492., 492., 524., 524.],

In this case both anchors extend 12 pixels beyond the borders of the image.
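The worked example above can be reproduced with a short sketch. This is a hypothetical pure-Python helper (`grid_anchors` is not torchvision's actual function; the real implementation is tensorized), written only to show the effect of the half-stride offset:

```python
# Hypothetical sketch of the grid-anchor computation described above, in plain
# Python for clarity. Anchors are [x1, y1, x2, y2] boxes of size `anchor_size`,
# placed on a grid with spacing `stride`, optionally shifted by `offset` pixels.
def grid_anchors(image_size, stride, anchor_size, offset=0.0):
    half = anchor_size / 2.0
    anchors = []
    for i in range(image_size // stride):      # rows (y)
        for j in range(image_size // stride):  # cols (x)
            cx = j * stride + offset
            cy = i * stride + offset
            anchors.append([cx - half, cy - half, cx + half, cy + half])
    return anchors

current = grid_anchors(512, 8, 32, offset=0.0)  # centers at the cell corners
shifted = grid_anchors(512, 8, 32, offset=4.0)  # shifted by half the stride

print(current[0], current[-1])  # [-16.0, -16.0, 16.0, 16.0] [488.0, 488.0, 520.0, 520.0]
print(shifted[0], shifted[-1])  # [-12.0, -12.0, 20.0, 20.0] [492.0, 492.0, 524.0, 524.0]
```

With the offset, the first and last anchors overhang the image symmetrically (12 pixels on each side), instead of 16 on the top left and 8 on the bottom right.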

Note that this problem becomes more severe when the stride gets larger (such as for P7 in retinanet):

Currently:

       [ 128.,  128.,  640.,  640.],

Correct (offset of 64 pixels):

       [ 192.,  192.,  704.,  704.],

I could make a PR to fix this computation, but I wanted to check first if that is desired. The reason is that it would invalidate all existing networks by offsetting the detections by roughly half the stride of the pyramid they are created for. Ideally it would mean retraining those networks, but I suppose that is perhaps too big of an effort.

EDIT: Actually, it's not exactly half the stride. The offset can be computed like this:

    offset_x = (image_shape[1] - (features_shape[1] - 1) * stride) / 2.0
    offset_y = (image_shape[0] - (features_shape[0] - 1) * stride) / 2.0

Where features_shape and stride are the values for the current pyramid level. This just happens to be half the stride in case the shape of the features is a nice factor of the image shape. If you use half the stride for an input shape of (300, 300) for instance, it will be incorrect.
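As a sanity check, the formula from the edit above can be evaluated directly (`anchor_offset` is a hypothetical helper name, used here only for illustration):

```python
# Offset formula from the edit above: the shift needed to center the anchor
# grid on the image, given the feature-map shape and stride of a pyramid level.
def anchor_offset(image_shape, features_shape, stride):
    offset_x = (image_shape[1] - (features_shape[1] - 1) * stride) / 2.0
    offset_y = (image_shape[0] - (features_shape[0] - 1) * stride) / 2.0
    return offset_x, offset_y

# (512, 512) at stride 8 gives a 64x64 feature map: offset is half the stride.
print(anchor_offset((512, 512), (64, 64), 8))  # (4.0, 4.0)

# (300, 300) at stride 8 gives a 38x38 feature map (assuming ceil division):
# (300 - 37 * 8) / 2 = 2.0, which is not half the stride.
print(anchor_offset((300, 300), (38, 38), 8))  # (2.0, 2.0)
```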

fmassa (Member) commented Jan 17, 2020

Hi,

Thanks for opening the issue, this is a great question!

The anchors, while not exactly equal to the original ones from Detectron, yield statistically identical results to the ones from Detectron.
Indeed, the anchors in Detectron are more complicated than they should be, and for no real good reason.
To simplify the current implementation in torchvision and make it as clean as possible, I decided to make a few changes to the model.

Plus, I believe torchvision uses the same type of anchors as detectron2, and from the comment in their code:

This is different from the anchor generator defined in the original Faster R-CNN
code or Detectron. They yield the same AP, however the old version defines cell
anchors in a less natural way with a shift relative to the feature grid and
quantization that results in slightly different sizes for different aspect ratios.
See also facebookresearch/Detectron#227

(which, interestingly, is an issue you created in Detectron :-) )

Given those, and the BC-breaking nature of such change, I think it might be better to keep the current implementation as is.

Let me know what you think

hgaiser (Contributor, Author) commented Jan 17, 2020

The anchors, while not exactly equal to the original ones from Detectron, yield statistically identical results to the ones from Detectron.

Agreed, I assume the network will learn to compensate for any incorrect creation of anchors (plus, deep learning regression values are a bit fuzzy anyway ;) ).

Indeed, the anchors in Detectron are more complicated than they should be, and for no real good reason.

Hmm I don't necessarily agree that the current implementation makes the anchor generation simpler. It's still pretty complex in my opinion.

Plus, I believe torchvision uses the same type of anchors as detectron2, and from the comment in their code:

Heh cool, didn't know that issue got referenced in the Detectron code :)

(which, interestingly, is an issue you created in Detectron :-) )

To be honest I completely forgot about that!

Given those, and the BC-breaking nature of such change, I think it might be better to keep the current implementation as is.

Let me know what you think

Yes I agree. It is too big of a change and has probably too little of an impact to really matter. I mainly wanted to leave this find here as a sort of paper trail in case anyone was interested. Shall we close the issue?

fmassa (Member) commented Jan 17, 2020

Hmm I don't necessarily agree that the current implementation makes the anchor generation simpler. It's still pretty complex in my opinion.

Well, comparing the previous implementation with the current one:

    def generate_anchors(self, scales, aspect_ratios, dtype=torch.float32, device="cpu"):
        # type: (List[int], List[float], int, Device) # noqa: F821
        scales = torch.as_tensor(scales, dtype=dtype, device=device)
        aspect_ratios = torch.as_tensor(aspect_ratios, dtype=dtype, device=device)
        h_ratios = torch.sqrt(aspect_ratios)
        w_ratios = 1 / h_ratios
        ws = (w_ratios[:, None] * scales[None, :]).view(-1)
        hs = (h_ratios[:, None] * scales[None, :]).view(-1)
        base_anchors = torch.stack([-ws, -hs, ws, hs], dim=1) / 2
        return base_anchors.round()
it seems to me that just for generating the base anchors we went from 70 lines of code and 6 functions to 10 lines in a single function, which looks like a win to me :-)
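For illustration, here is a pure-Python transcription of that function (a hypothetical torch-free mirror, not the code itself), showing the base anchors it produces for one RetinaNet-style size with three aspect ratios:

```python
import math

# Pure-Python mirror of the generate_anchors snippet above (hypothetical
# transcription, no torch): one base anchor per (aspect_ratio, scale) pair,
# centered on the origin.
def base_anchors(scales, aspect_ratios):
    anchors = []
    for ratio in aspect_ratios:
        h_ratio = math.sqrt(ratio)
        w_ratio = 1.0 / h_ratio
        for scale in scales:
            w, h = w_ratio * scale, h_ratio * scale
            # Same as torch.stack([-ws, -hs, ws, hs], dim=1) / 2, then round()
            anchors.append([round(-w / 2), round(-h / 2), round(w / 2), round(h / 2)])
    return anchors

print(base_anchors([32], [0.5, 1.0, 2.0]))
# [[-23, -11, 23, 11], [-16, -16, 16, 16], [-11, -23, 11, 23]]
```

All base anchors are centered on (0, 0); the grid placement (and hence the offset discussed in this issue) is applied in a separate step.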

I mainly wanted to leave this find here as a sort of paper trail in case anyone was interested

Definitely! It's a very valid remark, thanks for raising this point!

Shall we close the issue?

Yes, closing it.

fmassa closed this as completed Jan 17, 2020
hgaiser (Contributor, Author) commented Jan 17, 2020

seems to me that just for generating the base anchors we went from 70 lines of code and 6 functions to 10 lines of code in a single function, so this seems like a win to me :-)

Hah yes, fair enough. What I had in mind was the current implementation compared to what it would be with adjustments to correctly compute the anchors. It is definitely a lot better (cleaner) than the old implementation!
