Add utility to draw bounding boxes #2785
Conversation
Current caveats. I need help here @pmeier @fmassa
I would like to know the best way to handle this.
Thanks for the PR @oke-aditya! About your questions:
- IMO this basically reduces to: do we allow batch processing or not. Given that we can't parallelize this due to our PIL usage, I'd say we don't allow it. Passing batches would internally mean we use a for loop anyway, and thus the user has no real advantage. I think we can simply try to squeeze(0) if we encounter an image with 4 dimensions and handle the error if this fails (see the sketch below).
- Yes, we should return the tensor. Why wouldn't we, or more importantly: why would a user use this function if he gets no results back?
- It can only be transparent if we use a fourth channel for the alpha, i.e. change the image from RGB to RGBA. I'm not sure if this is a good idea. Is it common to fill the bounding boxes? I've never seen it before, which does not mean it does not exist. Otherwise I see no point in adding a fill option.
- I agree testing this will not be easy, but I think we can cover some basic stuff. For example you can start with an all-white image, draw a bounding box and check if only the pixels you would expect are changed. You can also test the colors with this.

In addition to your questions, I have some other remarks below. The linter is also failing, but let's postpone this until we have the functionality right.
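A minimal sketch of the squeeze(0) idea from the first bullet; the helper name and error messages are hypothetical, not part of the PR:

```python
import torch

def _ensure_single_image(image: torch.Tensor) -> torch.Tensor:
    # tolerate a 4-dim tensor only when the batch dimension is 1
    if image.dim() == 4:
        if image.size(0) != 1:
            raise ValueError(f"Expected a single image, got a batch of size {image.size(0)}")
        image = image.squeeze(0)
    if image.dim() != 3:
        raise ValueError(f"Expected a 3-dim (C, H, W) tensor, got {image.dim()} dims")
    return image
```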
torchvision/utils.py
Outdated
if colors is None:
    draw.rectangle(bbox, width=width)
else:
    draw.rectangle(bbox, width=width, outline=colors[label])
I think it would be more clear to set colors = {} if it is None and use colors.get(label) here. With this we would have no branching and would use the default color if a label is not included in colors.
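A minimal sketch of that suggestion; bboxes, labels, draw and width are assumed from the surrounding diff:

```python
colors = {} if colors is None else colors
for bbox, label in zip(bboxes, labels):
    # .get() returns None for labels without an entry, which behaves exactly like
    # the colors-is-None branch above (PIL's default outline), so no branching is needed
    draw.rectangle(bbox, width=width, outline=colors.get(label))
```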
The colors parameter is optional, and white boxes will be drawn if it is None.
colors is a very tricky parameter: PIL will throw an error for an unsupported color.
If we have to handle unsupported colors, we might need to catch the exception from PIL and then revert to a default color.
We could use getrgb() to parse the colors before we enter the loop and handle the exception there. In general, we should not restrict colors to strings, but also allow int triplets.
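For example, something along these lines could validate the colors up front (helper name hypothetical):

```python
from PIL import ImageColor

def _parse_colors(colors):
    parsed = []
    for color in colors:
        if isinstance(color, str):
            # raises ValueError for unknown color names before any drawing happens
            parsed.append(ImageColor.getrgb(color))
        else:
            # assume an (R, G, B) int triplet and pass it through
            parsed.append(tuple(color))
    return parsed
```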
torchvision/utils.py
Outdated
else:
    if draw_labels is True:
        draw.text((bbox[0], bbox[1]), label_names[int(label)])
Similar to above: can we maybe use dict.get() to assign a default label if it is not present in label_names?
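For instance, assuming label_names becomes a dict keyed by class id (a hypothetical change, with names taken from the diff above):

```python
# fall back to the raw class id when a label has no entry in label_names
name = label_names.get(int(label), str(int(label)))
draw.text((bbox[0], bbox[1]), name)
```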
Hey @pmeier, I also gave some thought to the points above. Let me summarize a few pointers.
I guess there is a lot of scope for these improvements, but we should first have ...
I agree with 1.-3. IMO we should include the proper color handling in the original PR. The automatic line width adaptation as well as the label drawing could be done in separate PRs.
Codecov Report
@@ Coverage Diff @@
## master #2785 +/- ##
==========================================
- Coverage 73.22% 73.13% -0.10%
==========================================
Files 96 96
Lines 8446 8473 +27
Branches 1320 1329 +9
==========================================
+ Hits 6185 6197 +12
- Misses 1859 1868 +9
- Partials 402 408 +6
Continue to review full report at Codecov.
@pmeier I have handled the image batch issue now, but there are a few doubts. I will try to fix up the colors in this PR. There are a lot of edge cases to think through. Below are my thoughts.
The same doubt applies to label names.
The problem arises when case 2 for colors and case 2 for label names occur together, because this leads to some unexpected conditions. I feel a user should either provide everything needed or nothing at all. We can raise errors to ensure the user passes correct content.
Hi @oke-aditya
Thanks for the PR!
I've left a few comments, let me know what you think.
On a more high level, I think we are going in the right direction, but we should think a bit more about a few aspects of the API so that we can cover most use-cases for the users.
Also, I wouldn't want to rush this into the 0.8.0 release, as the deadline is fast approaching; I think it would be better to have it in the next release (0.9.0).
Let me know what you think.
I think the same too.
Let me write here how this API works with torchvision models right now
Yes, let's chat a bit more about this after the release. I'm pretty busy today preparing the branch cut etc., so I won't have much time to iterate today, but I gave this function a try and I faced a few issues / problems that I enumerate below.

I'll check back on this PR by the end of the week, but here are a few things I would like us to think about:

Labels

Now we need to pass two tensors for printing the labels. What I was originally thinking was to let the user directly specify the labels for each box, so that they can do arbitrary customization (including scores). This way, we don't need to expose a separate label_names argument:

    # let the user create the description for each box
    description = [f'{CLASSES[cls_id]} : {s:02f}' for cls_id, s in zip(labels, scores)]
    # now just pass it to the visualization function
    draw_bounding_boxes(images, boxes, description)

I'm not sure how much of a hassle for the user it would be to do this one-liner, but at least it makes things more generic. Thoughts?
Let's not hurry over this. We can work on this after the release 😄
For the labels API which you proposed, I think the following:

I guess the trade-off question is: is this API meant to be plug-and-play with torchvision models, or a generic one ❓ I had in mind that it is plug-and-play (hence supporting images with 4 dims and allowing detection model outputs to be passed directly). I guess there is a lot to discuss over this (maybe I have misunderstood something). Let's catch up on this after the release.
Hi @oke-aditya

Sorry for the delay in getting back to you.

In general, I still think that it is preferable to have a slightly more generic function, even if it requires the user to be a bit more verbose. I think it's an ok trade-off to make, as it would allow more use-cases for the function. The function would become something like ...

If you don't have enough bandwidth to work on this refactoring for now it's ok, @datumbox agreed to help and to build on top of your PR so that we can get this functionality merged in torchvision soon. Otherwise @datumbox will help co-design / review this PR.

Let us know what you think.
Hi @fmassa. I think this will take some significant changes and refactoring. I think @datumbox will have something great in mind. Two people handling this might slow it down.
hi @oke-aditya, it's completely up to you! We think that this PR is useful and we would like to merge it soon. If you think you have the time to make the changes discussed above, I'm happy to support. Else I'll take your PR and make the necessary changes, so that we can merge it ASAP. Just don't close your branch because I plan to make the changes in-place. Let me know! :)
Looks like all of us want this 😄. @datumbox You can go ahead 🚀. I won't close or delete this branch. Let me know if you need access to this fork, etc. Super eager to see this PR getting into master.
Thanks!
I've made a few comments, let me know what you think
torchvision/utils.py
Outdated
colors: Optional[List[str]] = None,
labels: Optional[List[str]] = None,
width: int = 1,
font: Optional[ImageFont] = None
Given that we won't be using this function in torchscript, I'm ok with the input type of the function being PIL-specific.
I'm not terribly excited about this TBH:
- On one hand, the method receives a uint8 tensor as input (not a PIL image) and completely hides any dependency on PIL. I would agree with your earlier comments that it's a bit odd that we expose ImageFont here.
- On the other hand, using PIL's ImageFont gives the user the flexibility to do whatever they want without us having to deal with the details of how to instantiate the object. It surely is ugly though and makes for a weird API.

I could try to create a font parameter similar to PIL's, with description "A filename or file-like object containing a TrueType font.", and a font_size parameter. Thoughts?
Have a look at the latest commit for an alternative to passing ImageFont. We can choose either of the two options, I'm OK with both.
test/test_utils.py
Outdated
boxes = torch.tensor([[0, 0, 100, 100], [0, 0, 0, 0],
                      [10, 15, 30, 35], [23, 35, 93, 95]], dtype=torch.float)
labels = ['a', 'b', 'c', 'd']
utils.draw_bounding_boxes(img, boxes, labels=labels)
can you also add a test checking the color of the output image at pixel values, e.g. out[:, 0, 0:100] == fillcolor etc., so that we know that we are masking the correct pixels in the image?
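A rough sketch of such a pixel-level check, assuming the draw_bounding_boxes signature that this PR converges on (per-box colors given as a list):

```python
import torch
from torchvision.utils import draw_bounding_boxes

img = torch.full((3, 100, 100), 255, dtype=torch.uint8)   # all-white image
boxes = torch.tensor([[0, 0, 99, 99]], dtype=torch.float)
out = draw_bounding_boxes(img, boxes, colors=["red"])

# the top edge of the box should now be red ...
assert (out[:, 0, :] == torch.tensor([255, 0, 0]).view(3, 1)).all()
# ... while pixels well inside the (unfilled) box stay white
assert (out[:, 50, 50] == 255).all()
```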
I agree we should test pixels, but I would rather test all functionalities including labels, fonts etc. I wonder if that's possible or if it will create a flaky test due to differences in fonts across platforms. I'll give the approach I proposed in the earlier comment a try and see if this works.
See the latest code for the proposed testing approach.
def set_rng_seed(seed):
    torch.manual_seed(seed)
    random.seed(seed)
    np.random.seed(seed)
This change was originally done on an intermediate commit where I was producing a random image and had to fix the seed. Though I switched to non-random to reduce the size, I think it's a good idea to move this method from test_models.py to common_utils.py, so I kept the change in this PR.
Looks great, thanks a lot!
if not os.path.exists(path):
    Image.fromarray(result.permute(1, 2, 0).numpy()).save(path)
nit: any particular reason why you use PIL to save the result, and not write_image? Although this is not really important as the file is committed to the repo.
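For reference, a possible variant using torchvision's own I/O; write_png is used here since that is the format-specific writer available in torchvision.io, and result and path are assumed from the test snippet above:

```python
import os

from torchvision.io import write_png

# `result` is the uint8 (C, H, W) tensor produced by draw_bounding_boxes
if not os.path.exists(path):
    write_png(result, path)
```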
I agree that this is worth changing.
draw.rectangle(bbox, width=width, outline=color)

if labels is not None:
    txt_font = ImageFont.load_default() if font is None else ImageFont.truetype(font=font, size=font_size)
nit for a follow-up PR: we can move this outside of the for loop
Agreed, this can move outside of the loop.
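A sketch of that follow-up; the helper name and parameter names are assumed from the diff above, not the final code:

```python
from typing import List, Optional

from PIL import ImageDraw, ImageFont


def _draw_boxes(draw: ImageDraw.ImageDraw, img_boxes: List[List[int]],
                labels: Optional[List[str]], colors: List, width: int,
                font: Optional[str], font_size: int) -> None:
    # build the font once, outside the per-box loop
    txt_font = None
    if labels is not None:
        txt_font = (ImageFont.load_default() if font is None
                    else ImageFont.truetype(font=font, size=font_size))
    for i, bbox in enumerate(img_boxes):
        draw.rectangle(bbox, width=width, outline=colors[i])
        if labels is not None:
            draw.text((bbox[0], bbox[1]), labels[i], font=txt_font)
```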
Lots of thanks to @oke-aditya and @sumanthratna for their thorough investigations and contributions on the final API and implementation.
That's so kind of you @datumbox. Not much from me; it was your great work that got this done.
* initital prototype
* flake
* Adds documentation
* minimal working bboxes
* Adds label display
* adds colors :-)
* adds suggestions and fixes CI
* handles image of dim 4
* fixes image handling
* removes dev file
* adds suggested changes
* Updating the API.
* Update test.
* Implementing code review improvements.
* Further refactoring and adding test.
* Replace random to white to reduce size and change font on tests.

Co-authored-by: Vasilis Vryniotis <[email protected]>
Closes #2556 Supersedes #2631
As per the new API discussion, I will make this compatible with the output of detection models.
This will be compatible only with VOC-format (xmin, ymin, xmax, ymax) boxes, as this is our default for input and output in torchvision.
Users can convert the boxes using the new box_convert function and pass them in. (We could handle this internally too, but let's leave it for now.)
Will try to get this in before the October release 😃