
Incorrect anchor computations #1765

Closed
hgaiser opened this issue Jan 17, 2020 · 4 comments
hgaiser (Contributor) commented Jan 17, 2020

While working on #1697 I was checking the anchor generation and noticed an error in the computation. The centers of the anchors should be at half the size of the stride of the anchors, but the centers are currently at the top left corner.

An example makes this clearer. I have generated anchors for an imaginary image of size (512, 512). The top left anchor with ratio 1 and scale 1 is as follows:

       [-16., -16.,  16.,  16.],

Note that this anchor extends outside of the image by 16 pixels and that its center is at (0, 0).
The last anchor at the bottom right with ratio 1 and scale 1 is as follows:

       [488., 488., 520., 520.],

Note that this anchor extends on the bottom right by 8 pixels, because the image shape is (512, 512). This means that the anchors are not correctly centered on the image. In general, if we shift all anchors by half the stride (the stride is 8 pixels in this case), then we get the following anchors:

       [-12., -12.,  20.,  20.],
       [492., 492., 524., 524.],

In this case both anchors extend 12 pixels beyond the borders of the image.
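The worked example above can be reproduced with a short sketch. This is a hypothetical pure-Python helper (`grid_anchors` is not torchvision's actual function; the real implementation is tensorized), written only to show the effect of the half-stride offset:

```python
# Hypothetical sketch of the grid-anchor computation described above, in plain
# Python for clarity. Anchors are [x1, y1, x2, y2] boxes of size `anchor_size`,
# placed on a grid with spacing `stride`, optionally shifted by `offset` pixels.
def grid_anchors(image_size, stride, anchor_size, offset=0.0):
    half = anchor_size / 2.0
    anchors = []
    for i in range(image_size // stride):      # rows (y)
        for j in range(image_size // stride):  # cols (x)
            cx = j * stride + offset
            cy = i * stride + offset
            anchors.append([cx - half, cy - half, cx + half, cy + half])
    return anchors

current = grid_anchors(512, 8, 32, offset=0.0)  # centers at the cell corners
shifted = grid_anchors(512, 8, 32, offset=4.0)  # shifted by half the stride

print(current[0], current[-1])  # [-16.0, -16.0, 16.0, 16.0] [488.0, 488.0, 520.0, 520.0]
print(shifted[0], shifted[-1])  # [-12.0, -12.0, 20.0, 20.0] [492.0, 492.0, 524.0, 524.0]
```

With the offset, the first and last anchors overhang the image symmetrically (12 pixels on each side), instead of 16 on the top left and 8 on the bottom right.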

Note that this problem becomes more severe when the stride gets larger (such as for P7 in retinanet):

Currently:

       [ 128.,  128.,  640.,  640.],

Correct (offset of 64 pixels):

       [ 192.,  192.,  704.,  704.],

I could make a PR to fix this computation, but I wanted to check first if that is desired. The reason is that it would invalidate all existing networks by offsetting the detections by roughly half the stride of the pyramid they are created for. Ideally it would mean retraining those networks, but I suppose that is perhaps too big of an effort.

EDIT: Actually, it's not exactly half the stride. The offset can be computed like this:

    offset_x = (image_shape[1] - (features_shape[1] - 1) * stride) / 2.0
    offset_y = (image_shape[0] - (features_shape[0] - 1) * stride) / 2.0

Where features_shape and stride are the values for the current pyramid level. This just happens to be half the stride in case the shape of the features is a nice factor of the image shape. If you use half the stride for an input shape of (300, 300) for instance, it will be incorrect.
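As a sanity check, the formula from the edit above can be evaluated directly (`anchor_offset` is a hypothetical helper name, used here only for illustration):

```python
# Offset formula from the edit above: the shift needed to center the anchor
# grid on the image, given the feature-map shape and stride of a pyramid level.
def anchor_offset(image_shape, features_shape, stride):
    offset_x = (image_shape[1] - (features_shape[1] - 1) * stride) / 2.0
    offset_y = (image_shape[0] - (features_shape[0] - 1) * stride) / 2.0
    return offset_x, offset_y

# (512, 512) at stride 8 gives a 64x64 feature map: offset is half the stride.
print(anchor_offset((512, 512), (64, 64), 8))  # (4.0, 4.0)

# (300, 300) at stride 8 gives a 38x38 feature map (assuming ceil division):
# (300 - 37 * 8) / 2 = 2.0, which is not half the stride.
print(anchor_offset((300, 300), (38, 38), 8))  # (2.0, 2.0)
```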

fmassa (Member) commented Jan 17, 2020

Hi,

Thanks for opening the issue, this is a great question!

The anchors, while not exactly equal to the original ones from Detectron, yield statistically identical results to the ones from Detectron.
Indeed, the anchors in Detectron are more complicated than they should be, and for no real good reason.
To simplify the current implementation in torchvision and make it as clean as possible, I decided to make a few changes to the model.

Plus, I believe torchvision uses the same type of anchors as detectron2, and from the comment in their code:

This is different from the anchor generator defined in the original Faster R-CNN
code or Detectron. They yield the same AP, however the old version defines cell
anchors in a less natural way with a shift relative to the feature grid and
quantization that results in slightly different sizes for different aspect ratios.
See also facebookresearch/Detectron#227

(which, interestingly, is an issue you created in Detectron :-) )

Given those, and the BC-breaking nature of such change, I think it might be better to keep the current implementation as is.

Let me know what you think

hgaiser (Contributor, Author) commented Jan 17, 2020

The anchors, while not exactly equal to the original ones from Detectron, yield statistically identical results to the ones from Detectron.

Agreed, I assume the network will learn to compensate for any incorrect creation of anchors (plus, deep learning regression values are a bit fuzzy anyway ;) ).

Indeed, the anchors in Detectron are more complicated than they should be, and for no real good reason.

Hmm I don't necessarily agree that the current implementation makes the anchor generation simpler. It's still pretty complex in my opinion.

Plus, I believe torchvision uses the same type of anchors as detectron2, and from the comment in their code:

Heh cool, didn't know that issue got referenced in the Detectron code :)

(which, interestingly, is an issue you created in Detectron :-) )

To be honest I completely forgot about that!

Given those, and the BC-breaking nature of such change, I think it might be better to keep the current implementation as is.

Let me know what you think

Yes I agree. It is too big of a change and has probably too little of an impact to really matter. I mainly wanted to leave this find here as a sort of paper trail in case anyone was interested. Shall we close the issue?

fmassa (Member) commented Jan 17, 2020

Hmm I don't necessarily agree that the current implementation makes the anchor generation simpler. It's still pretty complex in my opinion.

Well, comparing the previous implementation with the current one:

    def generate_anchors(self, scales, aspect_ratios, dtype=torch.float32, device="cpu"):
        # type: (List[int], List[float], int, Device) # noqa: F821
        scales = torch.as_tensor(scales, dtype=dtype, device=device)
        aspect_ratios = torch.as_tensor(aspect_ratios, dtype=dtype, device=device)
        h_ratios = torch.sqrt(aspect_ratios)
        w_ratios = 1 / h_ratios
        ws = (w_ratios[:, None] * scales[None, :]).view(-1)
        hs = (h_ratios[:, None] * scales[None, :]).view(-1)
        base_anchors = torch.stack([-ws, -hs, ws, hs], dim=1) / 2
        return base_anchors.round()
it seems to me that just for generating the base anchors we went from 70 lines of code and 6 functions to 10 lines in a single function, which looks like a win to me :-)
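For illustration, here is a pure-Python transcription of that function (a hypothetical torch-free mirror, not the code itself), showing the base anchors it produces for one RetinaNet-style size with three aspect ratios:

```python
import math

# Pure-Python mirror of the generate_anchors snippet above (hypothetical
# transcription, no torch): one base anchor per (aspect_ratio, scale) pair,
# centered on the origin.
def base_anchors(scales, aspect_ratios):
    anchors = []
    for ratio in aspect_ratios:
        h_ratio = math.sqrt(ratio)
        w_ratio = 1.0 / h_ratio
        for scale in scales:
            w, h = w_ratio * scale, h_ratio * scale
            # Same as torch.stack([-ws, -hs, ws, hs], dim=1) / 2, then round()
            anchors.append([round(-w / 2), round(-h / 2), round(w / 2), round(h / 2)])
    return anchors

print(base_anchors([32], [0.5, 1.0, 2.0]))
# [[-23, -11, 23, 11], [-16, -16, 16, 16], [-11, -23, 11, 23]]
```

All base anchors are centered on (0, 0); the grid placement (and hence the offset discussed in this issue) is applied in a separate step.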

I mainly wanted to leave this find here as a sort of paper trail in case anyone was interested

Definitely! It's a very valid remark, thanks for raising this point!

Shall we close the issue?

Yes, closing it.

fmassa closed this as completed Jan 17, 2020
hgaiser (Contributor, Author) commented Jan 17, 2020

seems to me that just for generating the base anchors we went from 70 lines of code and 6 functions to 10 lines of code in a single function, so this seems like a win to me :-)

Hah yes, fair enough. What I had in mind was the current implementation compared to what it would be with adjustments to correctly compute the anchors. It is definitely a lot better (cleaner) than the old implementation!
