bounding boxes in torchvision.utils #2556
Hey @sumanthratna, this sounds like a reasonable request. Let's wait for @fmassa to approve, but I think we can move forward with this. Would you like to send a PR?
Sure @pmeier! Here are the solutions I'm thinking of:
What do you think is the best option? I'm personally leaning towards the second option.
Hi,

We do want to have functions for drawing bounding boxes in images, but we need to come up with requirements and an API. If we plot a rectangle, we would probably also want to support plotting text, selecting colors, line width, etc. Additionally, should the API support a single box at a time or a batch of boxes?

torchvision currently doesn't depend on OpenCV, and we would like to keep it that way. So the box-drawing function would need to depend on either PIL or matplotlib, or be natively implemented in torchvision.

I would not be very concerned about the overhead of this function (converting a Tensor to an ndarray is cheap, and the same for a PIL.Image), plus I wouldn't expect this to be on a critical path, so it's OK if it's not super efficient.
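To make the single-box vs. batch-of-boxes question concrete, the two API shapes could look roughly like this (hypothetical names and parameters, not actual torchvision code):

```python
# Hypothetical sketches of the two API shapes under discussion.

# Option A: draw one box per call; the caller loops and picks colors.
def draw_bounding_box(image, bbox, color=None, width=1, label=None):
    ...

# Option B: draw a batch of boxes per call; the function can manage colors and labels.
def draw_bounding_boxes(image, bboxes, colors=None, width=1, labels=None):
    ...
```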
If we only go for drawing boxes, I think this can be done straightforwardly in
This will be a problem in
I'm voting for multiple boxes to avoid color conflicts between boxes. If we only plot one box at a time, either the user has to manually specify a new color for each box, or we need to somehow track which colors we have already used.
@fmassa those are good questions that I didn't consider, and I definitely agree with what @pmeier has suggested.
Originally, I thought that we should leave it up to users to decide the color, to avoid any confusing behavior. However, I do think that batch drawing might make for cleaner code on the user end. All in all, I think we should go with batch drawing. I'm still unsure if we want to deal with color conflicts, though. Sometimes a user will want to draw multiple boxes of the same class (see the image below, source).

Here's what I've written as a workaround until this gets merged (I haven't tested it):

```python
import torch
import numpy as np
from PIL import Image, ImageDraw
from torchvision.utils import make_grid


def draw_bounding_box(
    tensor,
    bbox,
    nrow=8,
    padding=2,
    normalize=False,
    range=None,
    scale_each=False,
    pad_value=0,
    fill=None,
    outline=None,
    width=1,
):
    # Arrange the batch of images into a single grid image.
    grid = make_grid(tensor, nrow=nrow, padding=padding, pad_value=pad_value,
                     normalize=normalize, range=range, scale_each=scale_each)
    # Add 0.5 after unnormalizing to [0, 255] to round to nearest integer.
    ndarr = grid.mul(255).add_(0.5).clamp_(0, 255).permute(
        1, 2, 0).to('cpu', torch.uint8).numpy()
    im = Image.fromarray(ndarr)
    # Draw the rectangle onto the PIL image, then convert back to a tensor.
    ImageDraw.Draw(im).rectangle(bbox, fill=fill, outline=outline, width=width)
    return torch.from_numpy(np.array(im))
```

It should be simple to convert this to a batch-bbox function.
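For example, a call could look like this (hypothetical shapes; untested, like the workaround itself):

```python
import torch

# Uses the draw_bounding_box workaround defined above.
images = torch.rand(4, 3, 64, 64)             # a batch of 4 RGB images in [0, 1]
grid_with_box = draw_bounding_box(
    images,
    bbox=[10, 10, 50, 50],                    # (x0, y0, x1, y1) on the assembled grid
    outline=(255, 0, 0),
    width=2,
)
print(grid_with_box.shape)                    # (H, W, 3) uint8 tensor of the grid
```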
@sumanthratna I don't think we should couple this with
That seems reasonable. I initially just copied

Is this better? We might not need to unnormalize.

```python
import torch
import numpy as np
from PIL import Image, ImageDraw


def draw_bounding_box(
    tensor,
    bbox,
    fill=None,
    outline=None,
    width=1,
):
    # Add 0.5 after unnormalizing to [0, 255] to round to nearest integer.
    ndarr = tensor.mul(255).add_(0.5).clamp_(0, 255).permute(
        1, 2, 0).to('cpu', torch.uint8).numpy()
    im = Image.fromarray(ndarr)
    # Draw the rectangle onto the PIL image, then convert back to a tensor.
    ImageDraw.Draw(im).rectangle(bbox, fill=fill, outline=outline, width=width)
    return torch.from_numpy(np.array(im))
```
As the name implies,
In my case, I want to be able to fill in bounding boxes because I want to be able to treat bounding boxes as segmentation masks. That is, by filling in boxes onto a black background, I can use
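A minimal sketch of that mask idea (hypothetical helper name, untested): fill each box in white on a black canvas and threshold the result into a binary mask tensor.

```python
import numpy as np
import torch
from PIL import Image, ImageDraw


def bboxes_to_mask(height, width, bboxes):
    # Black background; each box is filled in white.
    canvas = Image.new("L", (width, height), 0)
    draw = ImageDraw.Draw(canvas)
    for bbox in bboxes:
        draw.rectangle(bbox, fill=255)
    return torch.from_numpy(np.array(canvas)) > 0


mask = bboxes_to_mask(64, 64, [(10, 10, 30, 30), (40, 5, 60, 25)])
```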
Looking at these three points against this use-case, I suppose we can remove

(off the top of my head; untested)

```python
import torch
import numpy as np
from PIL import Image, ImageDraw


def draw_bounding_box(
    tensor,
    bboxes,
    color=None,
    width=1,
):
    # Add 0.5 after unnormalizing to [0, 255] to round to nearest integer.
    ndarr = tensor.mul(255).add_(0.5).clamp_(0, 255).permute(
        1, 2, 0).to('cpu', torch.uint8).numpy()
    im = Image.fromarray(ndarr)
    draw = ImageDraw.Draw(im)
    if len(color) == 3 and isinstance(color[0], int):
        # bboxes is a sequence and color is a single (R, G, B) color
        # TODO: check if Pillow accepts any other formats for the color kwarg
        # TODO: this if-statement looks ugly
        colors = [color] * len(bboxes)
    else:
        # bboxes and color are both sequences of the same length
        colors = color
    for bbox, color in zip(bboxes, colors):
        draw.rectangle(bbox, outline=color, width=width)
    return torch.from_numpy(np.array(im))
```
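Usage would then look something like this (again hypothetical and untested), with either one shared color or one color per box:

```python
import torch

# Uses the batch draw_bounding_box sketch defined above.
image = torch.rand(3, 128, 128)
boxes = [[10, 10, 60, 60], [70, 20, 120, 90]]

# A single (R, G, B) color applied to every box ...
out_single = draw_bounding_box(image, boxes, color=(255, 0, 0), width=2)
# ... or one color per box.
out_per_box = draw_bounding_box(image, boxes, color=[(255, 0, 0), (0, 255, 0)], width=2)
```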
Yeah, I think your use-case is an edge-case that justifies a custom implementation on the user side. Two more remarks:
One thing to discuss about palettes is how we want to decide colors when a user wants to draw more than 10 classes. I'm tempted to make
Why not? It sounds pretty sensible to me to raise an error if our defaults are not sufficient for the use case. Do you think users will hit this case often?
Yup!
I'm not a fan of this idea because it feels like 10 is an arbitrary length of the palette (even though it's really not). I think these are our options:
What do you think? I'm fine with raising an error. I'm also wondering if we should scrap the
From what we have now, I think this is the best option:
The color palette given in my second link has up to 12 colors. I think if someone really needs more, this is out of scope for us. I even think having 12 classes of bboxes at the same time is confusing enough that most people won't go for that.
That's a valid point; let's go with raising an error if the number of classes exceeds the length of the palette and there are no custom colors.
I agree with your statement about most people not needing custom colors. However, I'm unsure why the

To me, using a dict for classes and bboxes but not colors doesn't totally make sense. If we want to separate arguments, I think we should have 3 args:
The problem I see is that IMO most people will not touch the colors. Thus, if we go for your approach
you are forcing the user to handle the colors manually. If we go for the bbox signature we previously agreed on, the user doesn't have to fiddle with the colors at all. I think it's more convenient to pass

```python
bboxes = {"class0": (bbox0, bbox1), "class1": (bbox2,)}
```

and maybe having to specify

```python
colors = {"class0": "red", "class1": "blue"}
```

instead of always passing

```python
colors = {"class0": {"bboxes": (bbox0, bbox1), "color": "red"}, "class1": {"bboxes": (bbox2,), "color": "blue"}}
```
That makes sense. I think we should go with a dict for colors then, because dicts don't preserve order, so we can't exactly "align" a dict of

```python
def draw_bounding_boxes(tensor, bboxes, colors=None, width=1):
    """Draws the bounding boxes onto an image represented as a torch tensor.

    tensor: torch.Tensor
    bboxes: dict of object_class: list_of_bboxes_for_class
    colors: dict of object_class: color_of_bboxes_for_class, optional
    """
    pass
```

The docstring is pretty bad, but @pmeier does this signature look okay?
Don't forget the option to pass the bboxes as a sequence.

```python
from typing import Dict, Optional, Sequence, Tuple, Union

import torch

BBox = Tuple[int, int, int, int]
Color = Tuple[int, int, int]

DEFAULT_COLORS: Sequence[Color]


def draw_bounding_boxes(
    image: torch.Tensor,
    bboxes: Union[Sequence[BBox], Dict[str, Sequence[BBox]]],
    colors: Optional[Dict[str, Color]] = None,
    width: int = 1,
) -> torch.Tensor:
    pass
```

I think the default behavior of
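For illustration, the behavior agreed on above (a default palette, plus an error once it is exhausted and no custom colors were given) could be implemented roughly like this; the helper name and the stand-in palette are made up:

```python
from typing import Dict, Optional, Sequence, Tuple

Color = Tuple[int, int, int]

# A tiny stand-in palette; the real DEFAULT_COLORS would be longer.
_PALETTE: Sequence[Color] = ((255, 0, 0), (0, 255, 0), (0, 0, 255))


def _resolve_colors(
    classes: Sequence[str],
    colors: Optional[Dict[str, Color]],
    palette: Sequence[Color] = _PALETTE,
) -> Dict[str, Color]:
    if colors is not None:
        # Custom colors take precedence over the default palette.
        return colors
    if len(classes) > len(palette):
        raise ValueError(
            f"only {len(palette)} default colors are available for "
            f"{len(classes)} classes; please pass custom colors"
        )
    # Assign one palette color per class.
    return {cls: palette[i] for i, cls in enumerate(classes)}
```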
I like that, but I think we should add an if-statement or assert to make sure users don't pass in a sequence for
Since we agree on the signature, let's move on to the PR for now! If we hit any blockers down the road, let's solve them there. Just ping me when you are ready.
Great! I'm actually very busy until August 29, but I'll definitely try to get a draft PR in after that. Ping me if I don't do it by September 3!
Oh, shouldn't we have a bool kwarg for whether class labels should be drawn on the bbox borders (see the image I sent earlier)? That would make the signature:

```python
from typing import Dict, Optional, Sequence, Tuple, Union

import torch

BBox = Tuple[int, int, int, int]
Color = Tuple[int, int, int]

DEFAULT_COLORS: Sequence[Color]


def draw_bounding_boxes(
    image: torch.Tensor,
    bboxes: Union[Sequence[BBox], Dict[str, Sequence[BBox]]],
    colors: Optional[Dict[str, Color]] = None,
    draw_labels: bool = False,
    width: int = 1,
) -> torch.Tensor:
    pass
```
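If `draw_labels` is set, the PIL side could work roughly like this (hypothetical helper, untested): write the class name just above the top-left corner of each box.

```python
def _draw_box_with_label(draw, bbox, color, width=1, label=None):
    # draw: a PIL.ImageDraw.ImageDraw bound to the target image
    # bbox: (x0, y0, x1, y1)
    draw.rectangle(bbox, outline=color, width=width)
    if label is not None:
        x0, y0 = bbox[0], bbox[1]
        # Place the label just above the top-left corner, clamped to the image.
        draw.text((x0, max(y0 - 10, 0)), label, fill=color)
```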
I think that is a fair point, although I would go with
Hey @pmeier and @sumanthratna,

Allow the user to pass bboxes which are the outputs of our detection models. This will make it easier to plot and process, as most probably people are using this function for detection models in torchvision.

Also, the user can probably use the recently added

This stays consistent with other utilities such as

I'm slightly unsure about labels; I think if we can again re-use the detection models' labels it would be nicer.
Hence I guess the API can be
I understand the point of the older API design, but this new one makes the utility more useful. As this is a utility to use with torchvision, it should work well with the torchvision API. These are just small tweaks: take tensors instead of lists, accept labels, and keep colors as an (int, str) dict to make it easier. I'm still thinking about the text; maybe allow the user to pass

This API is much simpler to use and makes it really intuitive to pass the outputs of the detection models themselves.
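To illustrate, feeding a detection model's output into such a utility could look like this (a sketch assuming the tensor-based signature proposed here; `draw_bounding_boxes` refers to the utility being discussed, not an existing function at this point):

```python
import torch
import torchvision

# Hypothetical end-to-end usage with a torchvision detection model.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).eval()
image = torch.rand(3, 480, 640)

with torch.no_grad():
    prediction = model([image])[0]      # dict with "boxes", "labels", "scores"

# Pass the (N, 4) boxes tensor straight to the drawing utility.
image_uint8 = (image * 255).to(torch.uint8)
drawn = draw_bounding_boxes(image_uint8, prediction["boxes"], width=2)
```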
@oke-aditya Very good points! We didn't consider the detection API before, but I think you are right that |
Great! Then I probably need a fresh PR for this, since I have to rewrite the logic :-) I will cite @sumanthratna in the code.
The function raises this error for me:
Input
The image is converted from PIL with

Any suggestions? Thanks
Hello @mfoglio, I think there is a minor error in your image shape. It should probably be

I think you have it

Here is a minimal example that works fine. Hope this helps.
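Something along these lines should work (a minimal sketch assuming the released torchvision.utils.draw_bounding_boxes, which expects a uint8 tensor of shape (3, H, W) and boxes in (xmin, ymin, xmax, ymax) format):

```python
import torch
from torchvision.utils import draw_bounding_boxes

# Channel-first uint8 image: (3, H, W), not (H, W, 3).
image = torch.zeros((3, 480, 640), dtype=torch.uint8)
boxes = torch.tensor([[50, 50, 200, 200], [300, 100, 400, 250]])  # (xmin, ymin, xmax, ymax)

result = draw_bounding_boxes(image, boxes, colors=["red", "blue"], width=3)
print(result.shape)  # torch.Size([3, 480, 640])
```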
Hi @oke-aditya, regarding the last PR about draw_bounding_boxes, I noticed that you don't enforce the same color per class. Thanks for your time!
@ElHouas Thanks for the question. I replied here: #5127 (comment)
🚀 Feature
I'd like to easily be able to draw a bounding box onto an image represented as a torch tensor.
Motivation
Using YOLO, I get a bunch of bounding boxes in an image. I want to be able to easily draw those onto a torch tensor (created via torchvision.transforms.ToTensor). This seems like a reasonable request because other bbox utilities such as NMS exist in torchvision.

Pitch
Alternatives
I think the following works right now, but the torch.Tensor -> PIL.Image -> torch.Tensor conversion for a single operation per image makes me feel uncomfortable due to efficiency.

Additional context
See cv2.rectangle (source) and PIL.ImageDraw.rectangle (source).