
Add utility to draw bounding boxes #2785

Merged Nov 27, 2020 (23 commits)

Conversation

oke-aditya
Contributor

@oke-aditya oke-aditya commented Oct 10, 2020

Closes #2556. Supersedes #2631.

As per the new API discussion, I will make this compatible with the output of the detection models.
It will only support VOC-format (xyxy) boxes, since that is torchvision's default input and output format.
Users can convert their boxes with the new box_convert function before passing them. (We could handle this internally too, but let's leave it for now.)
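For reference, the xywh to xyxy conversion that box_convert performs boils down to the following. This is a pure-Python sketch of one box; the real torchvision.ops.box_convert operates on tensors of boxes:

```python
def xywh_to_xyxy(box):
    # (x_min, y_min, width, height) -> (x_min, y_min, x_max, y_max),
    # the VOC-style corner format this utility expects. Illustrative
    # sketch of box_convert(in_fmt="xywh", out_fmt="xyxy").
    x, y, w, h = box
    return (x, y, x + w, y + h)

print(xywh_to_xyxy((10, 20, 30, 40)))  # (10, 20, 40, 60)
```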

Will try to get this in before the October release 😃

  • Adds code
  • Adds docs
  • Adds tests

@oke-aditya oke-aditya changed the title [WIP] Adds utlity to draw bounding boxes [WIP] Adds utility to draw bounding boxes Oct 10, 2020
@oke-aditya
Contributor Author

oke-aditya commented Oct 11, 2020

Got it working with Faster R-CNN outputs 😄

It supports different colors too.

[screenshot: detections drawn with colored bounding boxes]

I have implemented this fully; all the parameters in the function declaration are supported.

@oke-aditya
Contributor Author

oke-aditya commented Oct 11, 2020

Current caveats, where I need help @pmeier @fmassa:

  1. The image passed should be (C, H, W), not (B, C, H, W), even though (B, C, H, W) is the input format for detection models in eval() mode. I would like to know the best way to handle this. Should we ask the user to pass a list of tensors? Simply squeeze(0) and allow (1, C, H, W)? Or process the entire batch of images (slowness could then be an issue)?

  2. Should we return the drawn image tensor? I'm not sure why we would do so.

  3. If the user wants to fill the bounding box, how can it be made semi-transparent? We don't want a fully filled rectangle, but something like a mask.

  4. What should the tests look like? I have no idea what they should check for this.

@oke-aditya oke-aditya changed the title [WIP] Adds utility to draw bounding boxes Adds utility to draw bounding boxes Oct 11, 2020
Collaborator

@pmeier pmeier left a comment

Thanks for the PR @oke-aditya! About your questions:

  1. IMO this basically reduces to: do we allow batch processing or not? Given that we can't parallelize this due to our PIL usage, I'd say we don't allow it. Passing batches would internally mean a for loop anyway, so the user gains no real advantage. I think we can simply try to squeeze(0) if we encounter an image with 4 dimensions and handle the error if this fails.
  2. Yes, we should return the tensor. Why wouldn't we? Or more importantly: why would a user call this function if they get no result back?
  3. It can only be transparent if we use a fourth channel for the alpha, i.e. change the image from RGB to RGBA. I'm not sure this is a good idea. Is it common to fill bounding boxes? I've never seen it before, which does not mean it does not exist. Otherwise I see no point in adding a fill option.
  4. I agree testing this will not be easy, but we can cover some basic cases. For example, start with an all-white image, draw a bounding box, and check that only the pixels you expect have changed. You can also test the colors this way.

In addition to your questions, I have some other remarks below. The linter is also failing, but let's postpone this until we have the functionality right.
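The white-image test idea above can be sketched in pure Python on a list-of-rows grid, standing in for the tensor comparison the actual test would do:

```python
def draw_outline(img, x0, y0, x1, y1, color):
    # Draw a one-pixel rectangle outline on a grid of rows, in place.
    for x in range(x0, x1 + 1):
        img[y0][x] = color
        img[y1][x] = color
    for y in range(y0, y1 + 1):
        img[y][x0] = color
        img[y][x1] = color

img = [[255] * 10 for _ in range(10)]   # all-white single-channel image
draw_outline(img, 2, 2, 5, 5, 0)
assert img[2][3] == 0     # a pixel on the outline changed
assert img[3][3] == 255   # an interior pixel did not
assert img[0][0] == 255   # a pixel outside the box did not
```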

Comment on lines 186 to 189

if colors is None:
    draw.rectangle(bbox, width=width)
else:
    draw.rectangle(bbox, width=width, outline=colors[label])
Collaborator
I think it would be clearer to set colors = {} if it is None and use colors.get(label) here. With this we would have no branching, and the default color would be used for any label not included in colors.
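Concretely, the suggestion amounts to something like the following, where DEFAULT_COLOR is a placeholder name for whatever default the function uses:

```python
DEFAULT_COLOR = "white"  # placeholder default, not the PR's actual constant

def outline_for(label, colors=None):
    # No branching: labels missing from the mapping fall back to the default.
    colors = {} if colors is None else colors
    return colors.get(label, DEFAULT_COLOR)

print(outline_for(1, {1: "blue"}))  # blue
print(outline_for(2, {1: "blue"}))  # white
```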

Contributor Author
Colors is optional; white boxes are drawn if it is None.

Colors is a very tricky parameter, though: PIL will throw an error for an unsupported color.

If we want to handle unsupported colors, we might need to catch the exception from PIL and fall back to the default color.

Collaborator

We could use getrgb() to parse the colors before we enter the loop and handle the exception there. In general, we should not restrict colors to strings; we should also allow RGB int triplets.
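The idea is to normalize every entry of colors up front, so the draw loop only ever sees RGB triplets. A minimal sketch with a tiny illustrative name table; the real code would delegate the name lookup to PIL's ImageColor.getrgb:

```python
# Tiny illustrative name table; PIL's ImageColor.getrgb knows many more.
_NAMED_COLORS = {"white": (255, 255, 255), "blue": (0, 0, 255), "red": (255, 0, 0)}

def parse_color(color):
    # Accept either an (r, g, b) int triplet or a known color name;
    # reject anything else before the drawing loop starts.
    if isinstance(color, tuple) and len(color) == 3:
        return color
    try:
        return _NAMED_COLORS[color.lower()]
    except (KeyError, AttributeError):
        raise ValueError(f"unsupported color: {color!r}")

print(parse_color("Blue"))     # (0, 0, 255)
print(parse_color((1, 2, 3)))  # (1, 2, 3)
```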

Comment on lines 193 to 195

else:
    if draw_labels is True:
        draw.text((bbox[0], bbox[1]), label_names[int(label)])
Collaborator

Similar to above: can we maybe use dict.get() to assign a default label if it is not present in label_names?

@oke-aditya
Contributor Author

Hey @pmeier, I gave the above points some thought too. Let me summarize:

  1. I think we can handle the (B, C, H, W) situation using squeeze(0), and raise an error if B > 1. This is just a simple extension of the present functionality and should suffice. (Let's see what users think.)

  2. Let's return the tensor. 👍

  3. A filled bounding box makes little sense to me; everything inside the box would be completely invisible. Also, for segmentation tasks we should probably have another utility, draw_masks, like this one, to draw instance/semantic segmentation masks.

  4. The colors, width, and draw_labels parameters can be optimized further. E.g. labels should probably not be drawn beyond the image, a label that is too long should be split over two lines, etc. As you pointed out, width should be some ratio of the image size. Also, colors should revert to the default on error. Could we automatically assign class-specific colors, if possible?

I guess there is a lot of scope for these improvements, but we should first have tests, proper documentation, and a minimal working implementation.
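The width-as-a-ratio-of-the-image idea could be as simple as the following sketch (the ratio and minimum are illustrative numbers, not what the PR does):

```python
def line_width_for(image_width, ratio=0.005, minimum=1):
    # Scale the outline width with the image, but never go below 1 px.
    return max(minimum, round(image_width * ratio))

print(line_width_for(2000))  # 10
print(line_width_for(100))   # 1
```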

@pmeier
Collaborator

pmeier commented Oct 12, 2020

I agree with 1-3. IMO we should include the proper color handling in this PR. The automatic line width adaptation, as well as the label drawing, could be done in separate PRs.

@codecov

codecov bot commented Oct 12, 2020

Codecov Report

Merging #2785 into master will decrease coverage by 0.09%.
The diff coverage is 48.27%.


@@            Coverage Diff             @@
##           master    #2785      +/-   ##
==========================================
- Coverage   73.22%   73.13%   -0.10%     
==========================================
  Files          96       96              
  Lines        8446     8473      +27     
  Branches     1320     1329       +9     
==========================================
+ Hits         6185     6197      +12     
- Misses       1859     1868       +9     
- Partials      402      408       +6     
Impacted Files Coverage Δ
torchvision/utils.py 60.24% <48.27%> (-7.62%) ⬇️

Powered by Codecov. Last update 42e7f1f...1aa1b03. Read the comment docs.

@oke-aditya
Contributor Author

oke-aditya commented Oct 12, 2020

@pmeier I have handled the image batch issue now, but a few doubts remain.

I will try to fix up the colors in this PR. There are a lot of edge cases to think through. Here are my thoughts:

  1. User passes no colors: (handled already) we use PIL's default and draw all boxes with the same color.
  2. User passes colors for only some classes: not sure; should we throw an error, or draw the rest with the default?
  3. User passes colors for all classes as color-name strings: (handled already) we draw as requested.
  4. User passes colors for all classes as RGB tuples: we should probably convert these and then behave like 3.

The same doubt applies to label names:

  1. User passes no label names: (handled already) we simply draw the class number as text.
  2. User passes only some label names: not sure here; this is probably not how people should use it.
  3. User passes all label names: (handled already) we draw the label texts.

The problem arises when case 2 for colors and case 2 for label names occur together, because this leads to unexpected conditions: a label may or may not have a label name. Again, this makes the API super complex to maintain.

I feel a user should either pass everything needed or nothing at all. We can raise errors to ensure the user passes correct content. This would be easier to maintain and use, since the tests for this are already really complicated.
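The all-or-nothing policy could be enforced with a check like this (a hypothetical helper sketching the errors described above, not the PR's actual validation):

```python
def check_per_class_args(num_classes, colors=None, label_names=None):
    # Hypothetical validation: optional per-class arguments must either be
    # omitted entirely or cover every class, so partial inputs fail loudly.
    if colors is not None and len(colors) != num_classes:
        raise ValueError(f"colors must have {num_classes} entries or be None")
    if label_names is not None and len(label_names) != num_classes:
        raise ValueError(f"label_names must have {num_classes} entries or be None")

check_per_class_args(3, colors={0: "red", 1: "blue", 2: "aqua"})  # passes silently
```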

Member

@fmassa fmassa left a comment

Hi @oke-aditya

Thanks for the PR!

I've left a few comments, let me know what you think.

On a higher level, I think we are going in the right direction, but we should think a bit more about a few aspects of the API so that we can cover most use cases.
Also, I wouldn't want to hurry this into the 0.8.0 release, as the deadline is fast approaching; I think it would be better to have it in the next release (0.9.0).

Let me know what you think.

@oke-aditya
Contributor Author

oke-aditya commented Oct 13, 2020

Also, I wouldn't want to hurry this into the 0.8.0 release, as the deadline is very fast approaching, but I think it would be better to have it for next release (0.9.0)

I think the same.
We should probably discuss this API more, since in the future we will possibly extend the utilities to visualize segmentation outputs too.
This would be half-baked by the 0.8.0 release (which is probably this week), so we can provide the feature in the next one.

@oke-aditya
Contributor Author

oke-aditya commented Oct 13, 2020

Let me write down how this API currently works with torchvision models:

import torch
import torchvision.transforms as T
from PIL import Image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.utils import draw_bounding_boxes

# We need to specify all 91 COCO classes; I'm keeping it short here.
label_names = ['background', 'person', 'bicycle']

# Again, all 91 are needed.
colors = {1: "blue", 2: "aqua"}

image = Image.open(img_path)
image = T.ToTensor()(image)

# Detection models expect a batch dimension.
image = torch.unsqueeze(image, 0)

model = fasterrcnn_resnet50_fpn(pretrained=True)
model = model.eval()
out = model(image)

boxes = out[0]["boxes"]
labels = out[0]["labels"]

img_drawn = draw_bounding_boxes(image, boxes, labels, label_names, colors=colors)

@fmassa
Member

fmassa commented Oct 13, 2020

Yes, let's chat a bit more about this after the release. I'm pretty busy today preparing the branch cut etc., so I won't have much time to iterate, but I gave this function a try and faced a few issues that I enumerate below:

  • The default colors being all white is a bit annoying; it would be preferable to have distinct defaults (maybe given by a fixed function, like the one I used for maskrcnn-benchmark).
  • When I tried passing colors to the function it didn't seem to work, although I might have done something wrong.
  • When I tried passing a uint8 image tensor to the function I got an error.

I'll check back on this PR by the end of the week, but here are a few things I would like us to think about:

Labels

Right now we need to pass two tensors for printing the labels. What I was originally thinking was to let the user directly specify the label for each box, so that they can do arbitrary customization (including scores). This way, we don't need to expose a class_label argument to the function either.
Here is what I had in mind:

# let the user create the description for each box
description = [f'{CLASSES[cls_id]}: {s:.2f}' for cls_id, s in zip(labels, scores)]
# now just pass it to the visualization function
draw_bounding_boxes(images, boxes, description)

I'm not sure how much of a hassle this one-liner would be for the user, but at least it makes things more generic.

Thoughts?

@oke-aditya
Contributor Author

oke-aditya commented Oct 13, 2020

Let's not hurry over this. We can work on this after the release 😄

  • I think the color choice has to improve (colors remain major discussion). I think we can give good default colors for labels.
    Maybe something like distinct rgb values generated from label id ?

  • Colors should probably work, I have tried them locally and had attached output. Maybe I will share more code.
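One way to derive distinct RGB values from a label id is a fixed integer hash, similar in spirit to the palette trick used in maskrcnn-benchmark. The constant below is illustrative, not the actual palette:

```python
def color_for_label(label_id):
    # Multiply by a large odd constant (Knuth's multiplicative hash) and
    # slice the low bytes into an RGB triplet; deterministic per label.
    h = (label_id + 1) * 2654435761 % (2 ** 32)
    return (h & 255, (h >> 8) & 255, (h >> 16) & 255)

print(color_for_label(1) != color_for_label(2))  # True
```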

For the labels API you proposed, I think the following:

  1. It can surely do much more than the current one.
  2. It is quite complicated and doesn't seem intuitive; users will most probably hit errors in it that are hard to debug. I'm not quite sure what else someone would plot apart from labels (descriptions) and scores. It is quite open-ended, and I'm not sure everyone will use the same one-liner (it might not be obvious to all).

I guess the trade-off question is: is this API plug-and-play for torchvision models, or a generic one? ❓

I had in mind that it is plug-and-play (hence supporting images with 4 dims and allowing detection model outputs to be passed directly).
My thought was that since this is a utility function for torchvision, and not for computer vision in general, it should adhere to this API only.

Currently, label names are optional and can contain a class description or class name, so it is quite flexible.
For scores, I would propose a new optional parameter.
I'm not sure what else a person might be interested in plotting.

I guess there is a lot to discuss here (maybe I have misunderstood something). Let's pick this up after 0.8.0.

@fmassa
Member

fmassa commented Nov 16, 2020

Hi @oke-aditya

Sorry for the delay in getting back to you.

In general, I still think it is preferable to have a slightly more generic function, even if it requires the user to be a bit more verbose. I think that's an OK trade-off, as it allows more use cases for the function. The function would become something like visualize_boxes_with_annotations, where the annotations for each box can be arbitrary and provided by the user. We just ensure that the annotations are located in a particular position with respect to the bounding boxes.

If you don't have enough bandwidth to work on this refactoring for now, that's OK; @datumbox has agreed to help and to build on top of your PR so that we can get this functionality merged into torchvision soon. Otherwise @datumbox will help co-design / review this PR.

Let us know what you think.

@oke-aditya
Contributor Author

Hi @fmassa.

I think this will take some significant changes and refactoring.
I'm happy either way.

I'm sure @datumbox will have something great in mind; two people handling this might slow it down.
I'll leave it to @datumbox whether he would like to take over completely, or help co-design and let me continue on this.

@datumbox
Contributor

Hi @oke-aditya, it's completely up to you!

We think this PR is useful and we would like to merge it soon. If you have the time to make the changes discussed above, I'm happy to support you. Otherwise I'll take your PR and make the necessary changes so that we can merge it ASAP. Just don't close your branch, because I plan to make the changes in place.

Let me know! :)

@oke-aditya
Contributor Author

Looks like all of us want this 😄.

@datumbox, you can go ahead 🚀. I won't close or delete this branch.

Let me know if you need access to this fork, etc. Super eager to see this PR get to master!

@datumbox datumbox self-assigned this Nov 20, 2020
@datumbox datumbox changed the title Adds utility to draw bounding boxes [WIP] Adds utility to draw bounding boxes Nov 20, 2020
Copy link
Member

@fmassa fmassa left a comment


Thanks!

I've made a few comments, let me know what you think

colors: Optional[List[str]] = None,
labels: Optional[List[str]] = None,
width: int = 1,
font: Optional[ImageFont] = None
Member

Given that we won't be using this function in torchscript, I'm ok having the input type of the function to be PIL-specific

Contributor

I'm not terribly excited about this, TBH:

  • On one hand, the method receives a uint8 tensor as input (not a PIL image) and completely hides any dependency on PIL. I would agree with your earlier comments that it's a bit odd to expose ImageFont here.
  • On the other hand, using PIL's ImageFont gives the user the flexibility to do whatever they want, without us having to deal with the details of how to instantiate the object. It surely is ugly, though, and makes for a weird API.

I could try to create a font parameter similar to PIL's, with the description "A filename or file-like object containing a TrueType font", plus a font_size. Thoughts?

Contributor

Have a look at the latest commit for an alternative to passing ImageFont. We can choose either of the two options; I'm OK with both.

boxes = torch.tensor([[0, 0, 100, 100], [0, 0, 0, 0],
                      [10, 15, 30, 35], [23, 35, 93, 95]], dtype=torch.float)
labels = ['a', 'b', 'c', 'd']
utils.draw_bounding_boxes(img, boxes, labels=labels)
Member

Can you also add a test checking that the color of the output image at pixels out[:, 0, 0:100] == fillcolor etc., so that we know we are masking the correct pixels in the image?

Contributor

I agree we should test pixels, but I would rather test all functionality, including labels, fonts, etc. I wonder if that's possible, or if it will create a flaky test due to font differences across platforms. I'll try to test what I proposed in the earlier comment and see if it works.

Contributor

See latest code for the proposed approach of testing.

def set_rng_seed(seed):
    torch.manual_seed(seed)
    random.seed(seed)
    np.random.seed(seed)
Contributor

This change was originally made in an intermediate commit where I was producing a random image and had to fix the seed. Though I switched to a non-random image to reduce the size, I think it's a good idea to move this method from test_models.py to common_utils.py, so I kept the change in this PR.

@datumbox datumbox changed the title [WIP] Adds utility to draw bounding boxes Add utility to draw bounding boxes Nov 20, 2020
Copy link
Member

@fmassa fmassa left a comment


Looks great, thanks a lot!

Comment on lines +92 to +93

if not os.path.exists(path):
    Image.fromarray(result.permute(1, 2, 0).numpy()).save(path)
Member

nit: any particular reason why you use PIL to save the result, and not write_image? Although this is not really important as the file is committed to the repo.

Contributor

I agree that this is worth changing.

    draw.rectangle(bbox, width=width, outline=color)

    if labels is not None:
        txt_font = ImageFont.load_default() if font is None else ImageFont.truetype(font=font, size=font_size)
Member

nit for a follow-up PR: we can move this to outside of the for loop

Contributor

Agreed, this can move outside of the loop.

@fmassa fmassa merged commit 240210c into pytorch:master Nov 27, 2020
@datumbox
Contributor

Lots of thanks to @oke-aditya and @sumanthratna for their thorough investigations and contributions on the final API and implementation.

@oke-aditya
Contributor Author

That's so kind of you, @datumbox. It wasn't much from me; it was your great work that got this done.

vfdev-5 pushed a commit to Quansight/vision that referenced this pull request Dec 4, 2020
* initial prototype

* flake

* Adds documentation

* minimal working bboxes

* Adds label display

* adds colors :-)

* adds suggestions and fixes CI

* handles image of dim 4

* fixes image handling

* removes dev file

* adds suggested changes

* Updating the API.

* Update test.

* Implementing code review improvements.

* Further refactoring and adding test.

* Replace random to white to reduce size and change font on tests.

Co-authored-by: Vasilis Vryniotis <[email protected]>
@oke-aditya oke-aditya deleted the add_drawbox branch January 22, 2021 08:49
This was referenced Feb 9, 2021

Successfully merging this pull request may close these issues.

bounding boxes in torchvision.utils
5 participants