Incorrect anchor computations #1765
Hi, thanks for opening the issue, this is a great question! The anchors, while not exactly equal to the original ones from Detectron, yield statistically identical results compared to the ones from Detectron. Plus, I believe torchvision uses the same type of anchors as Detectron2, and the comment in their code points to this same behavior (which, interestingly, is an issue you created in Detectron :-) ). Given those points, and the BC-breaking nature of such a change, I think it might be better to keep the current implementation as is. Let me know what you think
Agreed, I assume the network will learn to compensate for any incorrect creation of anchors (plus, deep learning regression values are a bit fuzzy anyway ;) ).
Hmm I don't necessarily agree that the current implementation makes the anchor generation simpler. It's still pretty complex in my opinion.
Heh cool, didn't know that issue got referenced in the Detectron code :)
To be honest I completely forgot about that!
Yes I agree. It is too big of a change and probably has too little of an impact to really matter. I mainly wanted to leave this find here as a sort of paper trail in case anyone was interested. Shall we close the issue?
Well, comparing the previous implementation with the current one (vision/torchvision/models/detection/rpn.py, lines 77 to 88 at commit c5e972a):
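For context, the grid-shift logic in those lines looked roughly like this (a paraphrase from memory, not an exact quote of the referenced snippet; the concrete sizes are placeholders):

```python
import torch

# Paraphrase (from memory) of the grid-shift logic at that commit: the
# shifts start at 0, so anchors are centered on the top-left corner of
# each stride cell instead of the cell center.
grid_height, grid_width, stride = 64, 64, 8
base_anchors = torch.tensor([[-16., -16., 16., 16.]])  # 32x32, ratio 1

shifts_x = torch.arange(0, grid_width, dtype=torch.float32) * stride
shifts_y = torch.arange(0, grid_height, dtype=torch.float32) * stride
shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
shifts = torch.stack(
    (shift_x.reshape(-1), shift_y.reshape(-1),
     shift_x.reshape(-1), shift_y.reshape(-1)), dim=1)
anchors = (shifts.view(-1, 1, 4) + base_anchors.view(1, -1, 4)).reshape(-1, 4)
```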
Definitely! That's a very valid remark, thanks for raising this point!
Yes, closing it.
Hah yes, fair enough. What I had in mind was the current implementation compared to what it would be with adjustments to correctly compute the anchors. It is definitely a lot better (cleaner) than the old implementation!
While working on #1697 I was checking the anchor generation and noticed an error in the computation. The anchor centers should be offset by half the stride, but currently they are placed at the top-left corner of each stride cell.
An example makes this more clear. I have generated anchors for an imaginary image of size `(512, 512)`. The top-left anchor with ratio 1 and scale 1 extends outside the image by 16 pixels, and its center is at `(0, 0)`. The last anchor at the bottom right with ratio 1 and scale 1 extends beyond the bottom-right border by only 8 pixels, because the image shape is `(512, 512)`. This means that the anchors are not correctly centered on the image. In general, if we shift all anchors by half the stride (the stride is 8 pixels in this case), then both border anchors extend 12 pixels beyond the borders of the image, as shown in the sketch below.
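A minimal sketch reproducing these numbers (`grid_anchors` is a hypothetical helper, not torchvision code; a 32×32 base anchor is assumed, consistent with the 16-pixel overhang described above, and only the diagonal of the 2-D grid is generated for brevity):

```python
import torch

def grid_anchors(image_size, stride, anchor_size, offset=0.0):
    # One anchor per stride step along the diagonal, optionally shifted.
    centers = torch.arange(0, image_size, stride, dtype=torch.float32) + offset
    half = anchor_size / 2
    return torch.stack(
        [centers - half, centers - half, centers + half, centers + half], dim=1)

# Current behaviour: the first center sits at the top-left corner (offset 0).
current = grid_anchors(512, stride=8, anchor_size=32)
print(current[0])   # tensor([-16., -16.,  16.,  16.])  -> 16 px outside
print(current[-1])  # tensor([488., 488., 520., 520.])  ->  8 px outside

# Shifted by half the stride: both border anchors overhang by 12 px.
shifted = grid_anchors(512, stride=8, anchor_size=32, offset=4.0)
print(shifted[0])   # tensor([-12., -12.,  20.,  20.])
print(shifted[-1])  # tensor([492., 492., 524., 524.])
```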
Note that this problem becomes more severe when the stride gets larger, such as for P7 in RetinaNet, where the stride is 128: the correctly centered anchors are offset by 64 pixels from the ones currently generated.
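Reusing the `grid_anchors` sketch from above with hypothetical P7 numbers (stride 128; a 512-pixel base anchor size is assumed here):

```python
# Stride 128 on a 512x512 image: centers at 0, 128, 256, 384 instead of
# the symmetric 64, 192, 320, 448 -- a 64-pixel offset.
p7_current = grid_anchors(512, stride=128, anchor_size=512)
p7_shifted = grid_anchors(512, stride=128, anchor_size=512, offset=64.0)
print(p7_current[0])  # tensor([-256., -256.,  256.,  256.])
print(p7_shifted[0])  # tensor([-192., -192.,  320.,  320.])
```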
I could make a PR to fix this computation, but I wanted to check first whether that is desired. The reason is that it would invalidate all existing networks by offsetting the detections by roughly half the stride of the pyramid level they were created for. Ideally it would mean retraining those networks, but I suppose that is perhaps too big of an effort.
EDIT: Actually, it's not exactly half the stride. The offset can be computed like this:
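A sketch consistent with the description below (the exact original expression may differ; `features_shape = ceil(image_shape / stride)` is assumed):

```python
import math

def anchor_offset(image_shape, features_shape, stride):
    # Symmetric margin left after tiling features_shape points, stride apart.
    return (image_shape - (features_shape - 1) * stride) / 2

print(anchor_offset(512, math.ceil(512 / 8), 8))  # 4.0 -> exactly half the stride
print(anchor_offset(300, math.ceil(300 / 8), 8))  # 2.0 -> not half the stride
```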
Here `features_shape` and `stride` are the values for the current pyramid level. This just happens to be half the stride when the shape of the features is a nice factor of the image shape. If you use half the stride for an input shape of `(300, 300)`, for instance, it will be incorrect.