Ops to convert `masks` to `boxes` #3960

oke-aditya · 2021-06-03T18:46:40Z

🚀 Feature

A simple torchvision.ops operator to convert segmentation masks to bounding boxes.

Motivation

This has a few use cases.

This makes it easier to use semantic segmentation datasets for object detection.
It simplifies the pipeline. Also, by convention, bounding boxes in torchvision.ops are represented in xyxy format,
so masks should probably be converted to xyxy boxes.
The other use case is making it easier to compare the performance of a segmentation model with that of a detection model.
Let's say that the detection model performs well on a segmentation dataset. Then it would be better to go ahead with the detection model, as it is faster in real-time use cases than a segmentation model.

New Pipeline

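from torch.utils.data import Dataset
from torchvision.ops import masks_to_boxes, box_convert

class SegmentationToDetectionDataset(Dataset):
    def __init__(self, masks_per_image):
        # One (num_masks, H, W) tensor of masks per image
        self.masks_per_image = masks_per_image

    def __len__(self):
        return len(self.masks_per_image)

    def __getitem__(self, idx):
        boxes_xyxy = masks_to_boxes(self.masks_per_image[idx])
        # Convert the boxes to COCO's xywh format if needed
        boxes_xywh = box_convert(boxes_xyxy, in_fmt="xyxy", out_fmt="xywh")
        return boxes_xywh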

Pitch

Port the masks_to_boxes function from mDeTR.

masks_to_boxes was also used in DeTR.
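As far as I recall, DETR's helper avoids Python loops by building a coordinate grid with torch.meshgrid and reducing over it: the max over foreground pixels gives x_max/y_max, and the min is taken after filling background pixels with a large sentinel. Below is a sketch in that spirit, not a verbatim copy of the DETR code. The function name `masks_to_boxes_meshgrid` is hypothetical, and `indexing="ij"` assumes PyTorch 1.10 or newer.

```python
import torch


def masks_to_boxes_meshgrid(masks: torch.Tensor) -> torch.Tensor:
    """Sketch: (N, H, W) binary masks -> (N, 4) xyxy boxes (inclusive pixel coordinates)."""
    if masks.numel() == 0:
        return torch.zeros((0, 4), device=masks.device)
    fg = masks != 0                      # any non-zero value counts as foreground
    m = fg.float()                       # do the arithmetic in float32, whatever the input dtype
    h, w = masks.shape[-2:]
    y = torch.arange(h, dtype=torch.float, device=masks.device)
    x = torch.arange(w, dtype=torch.float, device=masks.device)
    y, x = torch.meshgrid(y, x, indexing="ij")  # both (H, W)

    # Max over foreground: background pixels contribute 0.
    x_max = (m * x).flatten(1).max(-1).values
    y_max = (m * y).flatten(1).max(-1).values
    # Min over foreground: push background pixels to a large sentinel first.
    x_min = (m * x).masked_fill(~fg, 1e8).flatten(1).min(-1).values
    y_min = (m * y).masked_fill(~fg, 1e8).flatten(1).min(-1).values
    return torch.stack([x_min, y_min, x_max, y_max], dim=1)
```

Note that an all-background mask yields x_min = y_min = 1e8 with this approach, so empty masks would need to be handled explicitly.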

Alternatives

The above function assumes a floating-point tensor of masks with shape (N, H, W), i.e. (num_masks, height, width).
IIRC, we used a boolean tensor in draw_segmentation_masks (after Nicolas' refactor). So perhaps we should be using a boolean tensor? Though I see no particular reason for this util to be valid only for instance segmentation.

Additional context

I can port this; we probably need a few tests to ensure it works fine,
especially a test for float16 overflow.
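Two float16 pitfalls are worth testing: float16 has an 11-bit significand, so not every pixel coordinate above 2048 is exactly representable, and a large sentinel such as 1e8 exceeds float16's maximum of 65504. One way to sidestep both, sketched below under the assumption that any non-zero value is foreground, is to take coordinates from int64 indices and only cast the result to float32. The name `masks_to_boxes_indices` is hypothetical, and leaving empty masks as all-zero boxes is an arbitrary choice.

```python
import torch


def masks_to_boxes_indices(masks: torch.Tensor) -> torch.Tensor:
    """Sketch: (N, H, W) masks of any dtype -> (N, 4) float32 xyxy boxes.

    Coordinates come from int64 indices, so no arithmetic happens in the
    input dtype (e.g. float16) and no large sentinel value is needed.
    """
    fg = masks != 0                      # bool, regardless of the input dtype
    n = fg.shape[0]
    boxes = torch.zeros((n, 4), dtype=torch.float32, device=masks.device)
    rows = fg.any(dim=2)                 # (N, H): rows containing foreground
    cols = fg.any(dim=1)                 # (N, W): columns containing foreground
    for i in range(n):
        ys = torch.nonzero(rows[i]).flatten()  # sorted int64 indices
        xs = torch.nonzero(cols[i]).flatten()
        if ys.numel() == 0:
            continue                     # empty mask: leave an all-zero box
        boxes[i] = torch.stack([xs[0], ys[0], xs[-1], ys[-1]]).to(torch.float32)
    return boxes
```

A float16 regression test can then place foreground pixels beyond x = 2048 and check that the returned coordinates are exact.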

cc @datumbox @NicolasHug


NicolasHug · 2021-06-04T08:52:17Z

I think this could be useful, at least as a plotting util. I'm not sure I fully understand the use of such a util within a model, though.

> So perhaps we should be using boolean tensor?

Yes, the input should be boolean: models have different representations for floating-point masks, as illustrated in the new example, so we can't have a unified approach for floating masks.
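To illustrate why a boolean input is the common ground: an instance segmentation model such as Mask R-CNN outputs one probability map per detected instance (shape (N, 1, H, W)), while a semantic segmentation model outputs per-class scores where each pixel belongs to the class with the highest score. A minimal sketch of reducing both to bool (num_masks, H, W) masks, loosely following the approach of the visualization example; the function names and the 0.5 default threshold are assumptions for illustration.

```python
import torch


def instance_probs_to_bool(probs: torch.Tensor, proba_threshold: float = 0.5) -> torch.Tensor:
    """Per-instance probabilities (N, 1, H, W) -> bool (N, H, W)."""
    return probs.squeeze(1) > proba_threshold


def semantic_logits_to_bool(logits: torch.Tensor) -> torch.Tensor:
    """Per-class scores for one image (C, H, W) -> bool (C, H, W), one mask per class."""
    labels = logits.argmax(dim=0)                              # (H, W) winning class per pixel
    classes = torch.arange(logits.shape[0], device=logits.device)
    return labels.unsqueeze(0) == classes[:, None, None]       # broadcast to (C, H, W)
```

Once both are bool, a single masks-to-boxes op does not need to know which kind of model produced them.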

oke-aditya · 2021-06-04T09:15:22Z

I think it should be under ops and not utils, as utils contains utilities which do not manipulate any masks / boxes.

They just return the passed image tensors.

This is an operation (non-differentiable, of course), hence it suits torchvision.ops better.

The only models which have a different representation are instance segmentation models. Is this utility essential for instance segmentation? (I'm not sure.)

datumbox · 2021-06-04T09:42:22Z

Thanks for bringing this up @oke-aditya.

This is a textbook example of a utility we want to upstream from DeTR to TorchVision. I think it should be OK to place it in torchvision.ops.boxes along with the other box utilities already ported.

> Especially test for float16 overflow.

Yes please, good call. We need to be careful about numeric overflows. See #3383, which addresses similar issues in other box ops.

Edit: Also since we are porting things from DeTR, let's make sure we give credit by putting a reference to the original code. You can see examples of that already in our code.

oke-aditya · 2021-06-04T10:28:48Z

What do you think, @datumbox, about the segmentation masks? Should they be kept as float (num_masks, H, W) tensors, or should we adopt bool masks?

I would prefer to keep float masks.

datumbox · 2021-06-04T12:36:35Z

> This makes it easier to use ~~semantic~~ segmentation datasets for object detection.

I think, as per your original point, you would have to support (num_masks, H, W) if you were to convert the masks of a specific dataset to boxes for object detection. See how DeTR uses the method on the CocoPanoptic dataset.
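For COCO panoptic, each annotation PNG encodes a segment id per pixel as id = R + 256 * G + 256^2 * B, and the annotation lists the segment ids present in the image. As far as I know, DETR's CocoPanoptic dataset builds its (num_masks, H, W) masks by comparing the decoded id map against those segment ids. Below is a sketch of that step; `rgb_to_id` and `panoptic_to_masks` are hypothetical names (panopticapi ships its own `rgb2id`), and the resulting bool masks are what a masks-to-boxes op would consume.

```python
import torch


def rgb_to_id(rgb: torch.Tensor) -> torch.Tensor:
    """COCO panoptic PNG (H, W, 3) -> segment id map (H, W): id = R + 256 * G + 256**2 * B."""
    rgb = rgb.to(torch.int64)            # avoid uint8 overflow in the products
    return rgb[..., 0] + 256 * rgb[..., 1] + 256 * 256 * rgb[..., 2]


def panoptic_to_masks(id_map: torch.Tensor, segment_ids) -> torch.Tensor:
    """One bool mask per listed segment id: (H, W) -> (num_masks, H, W)."""
    ids = torch.as_tensor(segment_ids, dtype=id_map.dtype, device=id_map.device)
    return id_map.unsqueeze(0) == ids[:, None, None]
```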

I also see value in supporting bool masks, as this will make it work seamlessly with Nicolas' draw_segmentation_masks, but it's unclear to me whether that should happen in the same method or in a separate one. I would personally start by porting DeTR's implementation, adding tests, handling overflows, and take it from there.

Adding @fmassa to the discussion, as he usually has good intuition around segmentation/detection.

0x00b1 · 2021-08-17T19:23:35Z

Has anyone started to work on this? If not, I'm more than happy to port it over.

datumbox · 2021-08-18T08:23:35Z

I think supporting boolean masks probably requires additional discussion but porting DeTR's masks_to_boxes method can be done now. @0x00b1 happy to review a PR about it. Just tag me. :)

NicolasHug · 2021-08-18T08:25:46Z

I would really rather not start implementing a method that is only specific to a small subset of models.
We should try to avoid the same scenario where we had to re-write the entire draw_segmentation_masks util: #3824

datumbox · 2021-08-18T08:30:36Z

This is a general-purpose utility unrelated to drawing. As part of the Batteries Included initiative, we are upstreaming some utils that exist in ecosystem projects such as DETR. I think masks_to_boxes is a meaningful addition, as it can be used on specific datasets such as the CocoPanoptic dataset, which I mentioned earlier.

NicolasHug · 2021-08-18T08:41:03Z

I understand it's unrelated to drawing. My point is that we should aim for this to be as generic as possible so that we can support all possible torchvision use-cases, instead of a subset of them. I might be missing something but the current proposed API seems specifically targeted towards a subset of models / datasets.

datumbox · 2021-08-18T08:51:58Z

Could you please elaborate on why this specific operator is not generic enough to be included?

IMO a quick search shows that it is generic and useful across different use cases, and as a result it has ended up being copy-pasted over and over in multiple projects across FAIR. This makes it an excellent candidate for upstreaming, which is the key goal of Batteries Included.

NicolasHug · 2021-08-18T08:58:21Z

From our example https://pytorch.org/vision/stable/auto_examples/plot_visualization_utils.html, instance segmentation masks and semantic segmentation masks both rely on float values but they are encoded very differently and those float values don't mean the same thing. The "greatest common factor" of these 2 distinct representations is the boolean representation.

fmassa · 2021-08-18T09:20:39Z

Hi all,

I think that having a function to convert from segmentation masks to bounding boxes would be very useful.
As @oke-aditya pointed out, some datasets come only with segmentation masks by default, but the Faster R-CNN family of models actually requires bounding boxes as well, so this would simplify things.

In fact, in our object detection finetuning tutorial we re-implement a masks_to_boxes function to be able to train Faster R-CNN on a custom dataset.
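One practical wrinkle when feeding mask-derived boxes to Faster R-CNN: the boxes use inclusive pixel coordinates, so a mask that is only one pixel wide or tall gives x_max == x_min (or y_max == y_min). Recent torchvision detection models check training targets for boxes without positive width and height and raise an error. A minimal sketch of dropping such instances from a target dict; `drop_degenerate_instances` is a hypothetical helper and assumes every value in the dict is a per-instance tensor. Widening such boxes by one pixel would be an alternative.

```python
import torch


def drop_degenerate_instances(target: dict) -> dict:
    """Keep only instances whose xyxy box has positive width and height (hypothetical helper).

    Assumes every value in `target` (e.g. "boxes", "labels", "masks") is indexed by instance.
    """
    boxes = target["boxes"]
    keep = (boxes[:, 2] > boxes[:, 0]) & (boxes[:, 3] > boxes[:, 1])
    # Index every per-instance field with the same mask so they stay aligned.
    return {k: v[keep] for k, v in target.items()}
```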

But I understand @NicolasHug's point that there can be some confusion about what a mask is, so we should be very clear about the input representation. I think enforcing that it is a boolean mask of shape (num_masks, H, W) is an ok trade-off.
The confusion arises from the fact that segmentation is an overloaded term, with semantic segmentation, instance segmentation and panoptic segmentation all meaning slightly different things.

datumbox · 2021-08-18T09:32:36Z

@oke-aditya @0x00b1 please coordinate between the two of you who will send the PR to avoid duplication of effort. It's worth including tests for different dtypes to ensure that everything works properly (see here for examples).

oke-aditya · 2021-08-18T10:13:24Z

Hey @0x00b1 feel free to send a PR!

Edit:
It would be nice to document a small example (probably in the gallery) demonstrating how this could be used with existing datasets. Since a lot of downstream libraries depend on torchvision, they can make use of this.

0x00b1 · 2021-08-18T13:50:55Z

@oke-aditya Great! I'll send one this afternoon. I'll include a gallery example.

RylanSchaeffer · 2021-08-28T00:01:32Z

Quick question: this PR looks applicable only to (batched) 2D images. What about 3D images? I would also like to convert 3D segmentation masks to 3D bounding boxes.
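The same min/max idea extends to any number of spatial axes: for each axis, collapse all the other spatial axes with `any`, then take the first and last index that contains foreground. A sketch under the assumption that masks have shape (N, *spatial) and that boxes list axes last-to-first, so a 2D input gives xyxy and a 3D (N, D, H, W) input gives [x_min, y_min, z_min, x_max, y_max, z_max]; the function name and output layout are hypothetical, not an agreed-on API.

```python
import math

import torch


def masks_to_boxes_nd(masks: torch.Tensor) -> torch.Tensor:
    """Sketch: (N, *spatial) masks -> (N, 2 * k) int64 boxes, k = number of spatial axes.

    Axes are reported last-to-first (x, y, z, ...), mins first, then maxes.
    An empty mask yields min = axis size and max = -1 along every axis.
    """
    fg = masks != 0
    n, spatial = fg.shape[0], fg.shape[1:]
    mins, maxs = [], []
    for axis in reversed(range(len(spatial))):
        size = spatial[axis]
        # Move this axis next to the batch axis, flatten the rest, and ask which slices have foreground.
        rest = math.prod(spatial) // size
        present = fg.movedim(axis + 1, 1).reshape(n, size, rest).any(dim=2)  # (N, size)
        idx = torch.arange(size, device=masks.device)
        mins.append(torch.where(present, idx, torch.full_like(idx, size)).min(dim=1).values)
        maxs.append(torch.where(present, idx, torch.full_like(idx, -1)).max(dim=1).values)
    return torch.stack(mins + maxs, dim=1)
```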

addisonklinke · 2022-01-25T18:59:10Z

Could this be generalized to allow for multiple, discrete boxes per mask, or would that be better suited to a separate function? The following example demonstrates the current vs. desired behavior:

import torch
from torchvision.ops import masks_to_boxes

# Generate dummy [1, 5, 5] mask
masks = torch.tensor([
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
]).unsqueeze(0)

# Current (assumed single box) : [[0, 1, 3, 4]]
# Desired (allow multiple)     : [[[0, 1, 1, 2], [2, 3, 3, 4]]]
boxes = masks_to_boxes(masks)
print(boxes)

Note that in this case the returned tensor should have rank 3 instead of 2, to represent [num_masks, num_boxes, 4].
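One standard way to get one box per discrete region is connected-component labelling: flood-fill each foreground component and record its extent. This is a swapped-in technique, not what masks_to_boxes does. With 4-connectivity, pixels that only touch diagonally belong to different components, which is exactly what separates the two squares in the example above. A pure-Python sketch for a single (H, W) mask; the function name is hypothetical, and scipy.ndimage.label plus scipy.ndimage.find_objects would be a faster off-the-shelf equivalent.

```python
from collections import deque

import torch


def mask_to_component_boxes(mask: torch.Tensor) -> torch.Tensor:
    """Sketch: one (H, W) mask -> (K, 4) float32 xyxy boxes, one per 4-connected component."""
    grid = (mask != 0).tolist()
    h = len(grid)
    w = len(grid[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not grid[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill of the component containing (y0, x0), tracking its extent.
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:
                y, x = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                # 4-connectivity: diagonal neighbours are not part of the same component.
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append([x_min, y_min, x_max, y_max])
    return torch.tensor(boxes, dtype=torch.float32).reshape(-1, 4)
```

Because each mask can contain a different number of components, applying this to a batch (`[mask_to_component_boxes(m) for m in masks]`) naturally gives a ragged list; a rank-3 [num_masks, num_boxes, 4] tensor would need padding.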

syed-javed · 2022-01-27T12:14:33Z

Hello @addisonklinke, do you have a solution that gives the desired output you mentioned?

addisonklinke · 2022-01-27T16:34:14Z

@syed-javed Yes, I've got one working now. The strategy is to iterate through each (x, y) location where there's a positive (i.e. confidence > threshold) prediction. From each of those locations, iteratively expand outwards as long as each boundary edge has an average confidence greater than the threshold. Points that overlap with a previously created box are skipped to speed up the iteration.

With the function below, you can reproduce my desired output. Please note that my input tensor is slightly different, specifically torch.FloatTensor[H, W] instead of torch.BoolTensor[N, H, W]. Also, the return value is a tuple of (boxes, scores), where scores is the average confidence of each region.

boxes, scores = heatmap_to_bboxes(masks.squeeze().float())
# boxes: [[0, 1, 1, 2], [2, 3, 3, 4]]
# scores: [1, 1]

The function

from copy import deepcopy
import torch
from torchvision.ops import batched_nms


def heatmap_to_bboxes(heatmap, pos_thres=0.5, nms_thres=0.5, score_thres=0.5):
    """Cluster heatmap into discrete bounding boxes

    :param torch.Tensor[H, W] heatmap: Predicted probabilities
    :param float pos_thres: Threshold for assigning probability to positive class
    :param Optional[float] nms_thres: Threshold for non-max suppression (or ``None`` to skip)
    :param Optional[float] score_thres: Threshold for final bbox scores (or ``None`` to skip)
    :return Tuple[torch.Tensor]: Containing
        * bboxes[N, C=4]: bounding box coordinates in ltrb format
        * scores[N]: confidence scores (averaged across all pixels in the box)
    """

    def get_roi(data, bounds):
        """Extract region of interest from a tensor

        :param torch.Tensor[H, W] data: Original data
        :param dict bounds: With keys for left, right, top, and bottom
        :return torch.Tensor[H', W']: Subset of the original data
        """
        compound_slice = (
            slice(bounds['top'], bounds['bottom']),
            slice(bounds['left'], bounds['right']))
        return data[compound_slice]

    def is_covered(x, y, bbox):
        """Determine whether a point is covered/inside a bounding box

        :param int x: Point x-coordinate
        :param int y: Point y-coordinate
        :param torch.Tensor[int(4)] bbox: In ltrb format
        :return bool: Whether all boundaries are satisfied
        """
        left, top, right, bottom = bbox
        bounds = [
            x >= left,
            x <= right,
            y >= top,
            y <= bottom]
        return all(bounds)

    # Determine indices of each positive pixel
    heatmap_bin = torch.where(heatmap > pos_thres, 1, 0)
    mask = torch.ones(heatmap.size()).type_as(heatmap)
    idxs = torch.flip(torch.nonzero(heatmap_bin*mask), [1])
    heatmap_height, heatmap_width = heatmap.shape

    # Limit potential expansion to the heatmap boundaries
    edge_names = ['left', 'top', 'right', 'bottom']
    limits = {
        'left': 0,
        'top': 0,
        'right': heatmap_width,
        'bottom': heatmap_height}
    bboxes = []
    scores = []

    # Iterate over positive pixels
    for x, y in idxs:

        # Skip if an existing bbox already covers this point
        already_covered = False
        for bbox in bboxes:
            if is_covered(x, y, bbox):
                already_covered = True
                break
        if already_covered:
            continue

        # Start by looking 1 row/column in every direction and iteratively expand the ROI from there
        incrementers = {k: 1 for k in edge_names}
        max_bounds = {
            'left': deepcopy(x),
            'top': deepcopy(y),
            'right': deepcopy(x),
            'bottom': deepcopy(y)}
        while True:

            # Extract the new, expanded ROI around the current (x, y) point
            bounds = {
                'left': max(limits['left'], x - incrementers['left']),
                'top': max(limits['top'], y - incrementers['top']),
                'right': min(limits['right'], x + incrementers['right'] + 1),
                'bottom': min(limits['bottom'], y + incrementers['bottom'] + 1)}
            roi = get_roi(heatmap_bin, bounds)

            # Get the vectors along each edge
            edges = {
                'left': roi[:, 0],
                'top': roi[0, :],
                'right': roi[:, -1],
                'bottom': roi[-1, :]}

            # Continue if at least one new edge has more than ``pos_thres`` percent positive elements
            # Also check whether ROI has reached the heatmap boundary
            keep_going = False
            for k, v in edges.items():
                if v.sum()/v.numel() > pos_thres and limits[k] != max_bounds[k]:
                    keep_going = True
                    max_bounds[k] = bounds[k]
                    incrementers[k] += 1

            # If none of the newly expanded edges were useful
            # Then convert the maximum ROI to bbox and calculate its confidence
            # Single pixel islands are ignored since they have zero width/height
            if not keep_going:
                final_roi = get_roi(heatmap, max_bounds)
                if final_roi.numel() > 0:
                    bboxes.append([max_bounds[k] - 1 if i > 1 else max_bounds[k] 
                                   for i, k in enumerate(edge_names)])
                    scores.append(final_roi.mean())
                break

    # Type conversions and optional NMS + score filtering
    bboxes = torch.tensor(bboxes).type_as(heatmap)
    scores = torch.tensor(scores).type_as(heatmap)
    if nms_thres is not None:
        class_idxs = torch.zeros(bboxes.shape[0])
        keep_idxs = batched_nms(bboxes.float(), scores, class_idxs, iou_threshold=nms_thres)
        bboxes = bboxes[keep_idxs]
        scores = scores[keep_idxs]
    if score_thres is not None:
        high_confid = scores > score_thres
        bboxes = bboxes[high_confid]
        scores = scores[high_confid]
    return bboxes, scores

datumbox mentioned this issue Jun 4, 2021 in [RFC] TorchVision with Batteries included - Phase 1 #3911 (Closed)

datumbox added the "new feature" and "module: ops" labels Jun 4, 2021

0x00b1 mentioned this issue Aug 18, 2021 in masks_to_bounding_boxes op #4290 (Merged)

RylanSchaeffer mentioned this issue Aug 31, 2021 in Generalize masks_to_boxes op to support N-dimensional masks to bounding boxes conversion #4339 (Open)

datumbox assigned 0x00b1 Sep 4, 2021

vadimkantorov mentioned this issue Sep 17, 2021 in [feature request] [discussion] mask utils in core #4415 (Open)

datumbox closed this as completed in #4290 Sep 21, 2021