Ops to convert masks to boxes #3960
Comments
I think this could be useful, at least as a plotting utils. I'm not sure I fully understand the use of such util within a model though?
Yes the input should be boolean: models have different representations for floating masks as illustrated in the new example, so we can't have a unified approach for floating masks. |
I think it should under be They return the passed image tensors back This is an operation (of course it is non-differentiable), hence it suits better to The only models which have different representations are instance segmentation models. Is this utility essential for instance segmentation? (I'm not sure) |
Thanks for bringing this up @oke-aditya. This is a text-book example of utility we want to upstream from DeTR to TorchVision. I think it should be OK placing it in torchvision.ops.boxes along with the other box utilities ported already.
Yes please, good call. We need to be careful about numeric overflows. See #3383, which addresses similar issues in other box ops. Edit: Also, since we are porting things from DeTR, let's make sure we give credit by putting a reference to the original code. You can see examples of that already in our code. |
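To make the overflow concern concrete, here is a minimal sketch. It assumes a DeTR-style implementation fills background pixels with a large sentinel (1e8 is used here as an illustrative value, not a quote from the source), which does not fit in float16. The helper name `masks_to_boxes_by_index` is hypothetical; it reads pixel indices instead of doing float arithmetic, so the mask dtype should not matter.

```python
import torch

# float16 cannot represent a sentinel as large as 1e8.
FLOAT16_MAX = torch.finfo(torch.float16).max
SENTINEL = 1e8  # illustrative value, an assumption
print("sentinel fits in float16:", SENTINEL <= FLOAT16_MAX)


def masks_to_boxes_by_index(masks):
    """Hypothetical helper: (N, H, W) masks -> (N, 4) float32 boxes in xyxy format.

    Any non-zero pixel counts as foreground. Assumption: an empty mask
    keeps an all-zero box instead of raising.
    """
    boxes = torch.zeros((masks.shape[0], 4), dtype=torch.float32, device=masks.device)
    for i, mask in enumerate(masks):
        ys, xs = torch.where(mask != 0)  # integer indices, no float math on the mask
        if ys.numel() == 0:
            continue
        boxes[i] = torch.stack([xs.min(), ys.min(), xs.max(), ys.max()]).to(boxes.dtype)
    return boxes


base = torch.zeros((2, 5, 5))
base[0, 1:3, 0:2] = 1  # rows 1-2, columns 0-1
base[1, 3:5, 2:4] = 1  # rows 3-4, columns 2-3
for dtype in (torch.bool, torch.uint8, torch.float16, torch.float32):
    print(dtype, masks_to_boxes_by_index(base.to(dtype)).tolist())
```

A loop over dtypes like the one above is also roughly the shape a dtype regression test could take.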
What do you think @datumbox about the segmentation masks? Should they be kept float as I would prefer to keep float masks. |
I think as per your original point, you would have to support I also see value in supporting bool masks, as this will make it work seamlessly with Nicolas' Adding @fmassa in the discussion as he usually has good intuition around segmentation/detection. |
Has anyone started to work on this? If not, I'm more than happy to port it over. |
I think supporting boolean masks probably requires additional discussion but porting DeTR's |
I would really rather not start implementing a method that is only specific to a small subset of models. |
This is a general-purpose utility unrelated to drawing. As part of the Batteries Included initiative, we are upstreaming some utils that exist in ecosystem projects such as DETR. I think masks_to_boxes is a meaningful addition, as it can be used on specific datasets such as the CocoPanoptic dataset which I mentioned earlier. |
I understand it's unrelated to drawing. My point is that we should aim for this to be as generic as possible so that we can support all possible torchvision use-cases, instead of a subset of them. I might be missing something but the current proposed API seems specifically targeted towards a subset of models / datasets. |
Could you please elaborate on why this specific operator is not generic enough to be included? IMO a quick search shows that the generic and useful to be used across different use-cases and as a result it's ended up being copy-pasted over and over in multiple projects across FAIR. This makes them an excellent candidate for upstreaming, which is the key goal of Batteries Included. |
From our example https://pytorch.org/vision/stable/auto_examples/plot_visualization_utils.html, instance segmentation masks and semantic segmentation masks both rely on float values but they are encoded very differently and those float values don't mean the same thing. The "greatest common factor" of these 2 distinct representations is the boolean representation. |
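As a rough illustration of that point, the sketch below reduces both float representations to boolean masks. The tensor shapes, the 0.5 threshold, and the random inputs are assumptions chosen for the example, not output from an actual model.

```python
import torch

torch.manual_seed(0)

# Semantic segmentation (assumed layout): one score map per class, shape (C, H, W).
num_classes = 3
logits = torch.randn(num_classes, 4, 4)
probs = logits.softmax(dim=0)
# A pixel belongs to the class with the highest probability.
class_ids = torch.arange(num_classes)[:, None, None]
semantic_bool = probs.argmax(dim=0) == class_ids  # (C, H, W), bool

# Instance segmentation (assumed layout): one probability map per detected
# instance, shape (N, 1, H, W); each map is thresholded independently.
instance_probs = torch.rand(2, 1, 4, 4)
proba_threshold = 0.5  # illustrative value
instance_bool = (instance_probs > proba_threshold).squeeze(1)  # (N, H, W), bool

print(semantic_bool.dtype, tuple(semantic_bool.shape))
print(instance_bool.dtype, tuple(instance_bool.shape))
```

Both results are `(num_masks, H, W)` boolean tensors, even though the float values they came from mean different things.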
Hi all, I think that having a function to convert from segmentation masks to bounding boxes would be very useful. In fact, in our object detection finetuning tutorial we re-implement a But I understand @NicolasHug point that there can be some confusion about what a |
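For context, the kind of conversion a detection dataset typically needs can be sketched as follows. This is not the tutorial's code; it assumes an instance-ID map where 0 is background, and the name `id_map_to_targets` is hypothetical.

```python
import torch


def id_map_to_targets(id_map, background_id=0):
    """Hypothetical helper: split an (H, W) instance-ID map into boolean
    masks of shape (N, H, W) and integer boxes of shape (N, 4) in xyxy format."""
    obj_ids = torch.unique(id_map)
    obj_ids = obj_ids[obj_ids != background_id]
    # Broadcast comparison: one boolean mask per object ID.
    masks = id_map[None] == obj_ids[:, None, None]
    boxes = torch.zeros((masks.shape[0], 4), dtype=torch.int64)
    for i, mask in enumerate(masks):
        ys, xs = torch.where(mask)
        boxes[i] = torch.stack([xs.min(), ys.min(), xs.max(), ys.max()])
    return masks, boxes


id_map = torch.tensor([
    [0, 1, 1],
    [0, 1, 2],
    [2, 2, 2],
])
masks, boxes = id_map_to_targets(id_map)
print(masks.shape, boxes.tolist())
```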
@oke-aditya @0x00b1 please coordinate between the two of you who will send the PR to avoid duplication of effort. It's worth including tests for different dtypes to ensure that everything works properly (see here for examples). |
Hey @0x00b1 feel free to send a PR! Edit: |
@oke-aditya Great! I'll send one this afternoon. I'll include a gallery example. |
Quick question: this PR looks applicable only to (batched) 2D images. What about 3D images? I would also like to convert 3d segmentation masks to 3D bounding boxes. |
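One possible shape for a 3D variant is sketched below. It assumes masks laid out as (N, D, H, W) and a box convention of (x1, y1, z1, x2, y2, z2); both the layout and the name `masks_to_boxes_3d` are assumptions for illustration, not an existing torchvision API.

```python
import torch


def masks_to_boxes_3d(masks):
    """Hypothetical helper: (N, D, H, W) masks -> (N, 6) integer boxes
    in (x1, y1, z1, x2, y2, z2) order. Empty masks keep an all-zero box."""
    boxes = torch.zeros((masks.shape[0], 6), dtype=torch.int64)
    for i, mask in enumerate(masks):
        zs, ys, xs = torch.where(mask != 0)
        if zs.numel() == 0:
            continue
        boxes[i] = torch.stack(
            [xs.min(), ys.min(), zs.min(), xs.max(), ys.max(), zs.max()]
        )
    return boxes


volume = torch.zeros((1, 4, 5, 6), dtype=torch.bool)
volume[0, 1:3, 2:4, 0:5] = True  # depth 1-2, rows 2-3, columns 0-4
print(masks_to_boxes_3d(volume).tolist())
```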
Could this be generalized to allow for the possibility of multiple, discrete boxes per mask, or would that be better suited for a separate function? The following example demonstrates the current vs. desired behavior:
import torch
from torchvision.ops import masks_to_boxes
# Generate dummy [1, 5, 5] mask
masks = torch.tensor([
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
]).unsqueeze(0)
# Current (assumed single box) : [[0, 1, 3, 4]]
# Desired (allow multiple)     : [[[0, 1, 1, 2], [2, 3, 3, 4]]]
boxes = masks_to_boxes(masks)
print(boxes)
Note in this case the return tensor should have rank 3 instead of 2 to represent |
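One way to get one box per discrete region is to label connected components first and then box each component. The sketch below does this with a plain breadth-first flood fill using 4-connectivity, so regions touching only diagonally stay separate. The name `mask_to_component_boxes` and the choice of connectivity are assumptions, and a Python loop like this is meant for illustration rather than speed.

```python
from collections import deque

import torch


def mask_to_component_boxes(mask):
    """Hypothetical helper: (H, W) mask -> (K, 4) integer boxes in xyxy
    format, one per 4-connected foreground region."""
    grid = (mask != 0).tolist()
    height, width = mask.shape
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not grid[y0][x0] or seen[y0][x0]:
                continue
            # Flood-fill a new component starting at (y0, x0).
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:
                y, x = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append([x_min, y_min, x_max, y_max])
    return torch.tensor(boxes, dtype=torch.int64).reshape(len(boxes), 4)


mask = torch.tensor([
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
])
component_boxes = mask_to_component_boxes(mask)
print(component_boxes.tolist())
```

Because the number of components varies per mask, a batched version would likely need to return a list of tensors rather than a single rank-3 tensor.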
|
@syed-javed Yes I've got one working now. The strategy is to iterate through each With the function below, you can reproduce my desired output. Please note my input tensor is slightly different, specifically
boxes, scores = heatmap_to_bboxes(masks.squeeze().float())
# boxes: [[0, 1, 1, 2], [2, 3, 3, 4]]
# scores: [[1, 1]]
The function
from copy import deepcopy
import torch
from torchvision.ops import batched_nms


def heatmap_to_bboxes(heatmap, pos_thres=0.5, nms_thres=0.5, score_thres=0.5):
    """Cluster heatmap into discrete bounding boxes

    :param torch.Tensor[H, W] heatmap: Predicted probabilities
    :param float pos_thres: Threshold for assigning probability to positive class
    :param Optional[float] nms_thres: Threshold for non-max suppression (or ``None`` to skip)
    :param Optional[float] score_thres: Threshold for final bbox scores (or ``None`` to skip)
    :return Tuple[torch.Tensor]: Containing
        * bboxes[N, C=4]: bounding box coordinates in ltrb format
        * scores[N]: confidence scores (averaged across all pixels in the box)
    """

    def get_roi(data, bounds):
        """Extract region of interest from a tensor

        :param torch.Tensor[H, W] data: Original data
        :param dict bounds: With keys for left, right, top, and bottom
        :return torch.Tensor[H', W']: Subset of the original data
        """
        compound_slice = (
            slice(bounds['top'], bounds['bottom']),
            slice(bounds['left'], bounds['right']))
        return data[compound_slice]

    def is_covered(x, y, bbox):
        """Determine whether a point is covered/inside a bounding box

        :param int x: Point x-coordinate
        :param int y: Point y-coordinate
        :param torch.Tensor[int(4)] bbox: In ltrb format
        :return bool: Whether all boundaries are satisfied
        """
        left, top, right, bottom = bbox
        bounds = [
            x >= left,
            x <= right,
            y >= top,
            y <= bottom]
        return all(bounds)

    # Determine indices of each positive pixel
    heatmap_bin = torch.where(heatmap > pos_thres, 1, 0)
    mask = torch.ones(heatmap.size()).type_as(heatmap)
    idxs = torch.flip(torch.nonzero(heatmap_bin*mask), [1])
    heatmap_height, heatmap_width = heatmap.shape

    # Limit potential expansion to the heatmap boundaries
    edge_names = ['left', 'top', 'right', 'bottom']
    limits = {
        'left': 0,
        'top': 0,
        'right': heatmap_width,
        'bottom': heatmap_height}
    bboxes = []
    scores = []

    # Iterate over positive pixels
    for x, y in idxs:

        # Skip if an existing bbox already covers this point
        already_covered = False
        for bbox in bboxes:
            if is_covered(x, y, bbox):
                already_covered = True
                break
        if already_covered:
            continue

        # Start by looking 1 row/column in every direction and iteratively expand the ROI from there
        incrementers = {k: 1 for k in edge_names}
        max_bounds = {
            'left': deepcopy(x),
            'top': deepcopy(y),
            'right': deepcopy(x),
            'bottom': deepcopy(y)}
        while True:

            # Extract the new, expanded ROI around the current (x, y) point
            bounds = {
                'left': max(limits['left'], x - incrementers['left']),
                'top': max(limits['top'], y - incrementers['top']),
                'right': min(limits['right'], x + incrementers['right'] + 1),
                'bottom': min(limits['bottom'], y + incrementers['bottom'] + 1)}
            roi = get_roi(heatmap_bin, bounds)

            # Get the vectors along each edge
            edges = {
                'left': roi[:, 0],
                'top': roi[0, :],
                'right': roi[:, -1],
                'bottom': roi[-1, :]}

            # Continue if at least one new edge has more than ``pos_thres`` percent positive elements
            # Also check whether ROI has reached the heatmap boundary
            keep_going = False
            for k, v in edges.items():
                if v.sum()/v.numel() > pos_thres and limits[k] != max_bounds[k]:
                    keep_going = True
                    max_bounds[k] = bounds[k]
                    incrementers[k] += 1

            # If none of the newly expanded edges were useful
            # Then convert the maximum ROI to bbox and calculate its confidence
            # Single pixel islands are ignored since they have zero width/height
            if not keep_going:
                final_roi = get_roi(heatmap, max_bounds)
                if final_roi.numel() > 0:
                    bboxes.append([max_bounds[k] - 1 if i > 1 else max_bounds[k]
                                   for i, k in enumerate(edge_names)])
                    scores.append(final_roi.mean())
                break

    # Type conversions and optional NMS + score filtering
    bboxes = torch.tensor(bboxes).type_as(heatmap)
    scores = torch.tensor(scores).type_as(heatmap)
    if nms_thres is not None:
        class_idxs = torch.zeros(bboxes.shape[0])
        keep_idxs = batched_nms(bboxes.float(), scores, class_idxs, iou_threshold=nms_thres)
        bboxes = bboxes[keep_idxs]
        scores = scores[keep_idxs]
    if score_thres is not None:
        high_confid = scores > score_thres
        bboxes = bboxes[high_confid]
        scores = scores[high_confid]
    return bboxes, scores
|
🚀 Feature
A simple torchvision.ops utility to convert segmentation masks to bounding boxes.
Motivation
This has a few use-cases.
This makes it easier to use semantic segmentation datasets for object detection.
The pipeline can be easier. Also, bounding boxes are represented in xyxy format in torchvision.ops as a convention, so we should probably convert masks to xyxy format.
The other use case is to make it easier to compare the performance of a segmentation model vs. a detection model.
Let's say that a detection model performs well on a segmentation dataset. Then it would be better to go ahead with the detection model, as it is faster in real-time use cases, than to train a segmentation model.
New Pipeline
Pitch
Port the masks_to_boxes function from mDeTR.
masks_to_boxes was also used in DeTR.
Alternatives
The above function assumes masks of shape (N, H, W) -> (num_masks, Height, Width), as a floating tensor. IIRC, we used a boolean tensor in draw_segmentation_masks (after Nicolas refactored). So perhaps we should be using a boolean tensor? Though I see no particular use case of this util being only valid for instance segmentation.
Additional context
I can port this, we perhaps need a few tests to ensure it works fine.
Especially test for float16 overflow.
cc @datumbox @NicolasHug