[feature request] [discussion] mask utils in core #4415
Comments
Can you elaborate a bit? I'm not experienced enough to fully understand the above. 😃 |
In semantic/instance segmentation context the segmentations are usually represented using some sort of masks:
Some representations are used for the ground truth in datasets and are optimized for efficient storage; others are more convenient as learning targets or for manipulating the masks in code. From these representations it is often necessary to:
|
Just like bounding boxes, I think there are multiple formats for segmentation masks as you mentioned.
Unlike boxes, the mask formats cannot all be converted into one another. E.g. from a binary (boolean) mask it won't be possible to recover the same RGB label map, although the reverse conversion is feasible.
Just like in boxes, we assume boxes to be in Pascal VOC format. Utilities are provided to visualize boolean masks. See Also, see the recently added So if there is utility code that can help convert different masks to boolean tensors, it would satisfy the need. cc @datumbox @NicolasHug as they would know the use cases better 😃 |
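To make the asymmetry between formats concrete, here is a minimal sketch of converting an integer label map into a stack of boolean masks and back. It uses plain Python lists to stay dependency-free; the function names and the `ignore`/`background` parameters are hypothetical and are not torchvision API. Note that the reverse direction needs the label ids passed in separately, which is exactly the information a bare stack of boolean masks does not carry.

```python
from typing import List, Tuple

LabelMap = List[List[int]]    # H x W grid of integer class/instance ids
BoolMask = List[List[bool]]   # H x W grid of booleans


def label_map_to_bool_masks(
    label_map: LabelMap, ignore: int = 0
) -> Tuple[List[int], List[BoolMask]]:
    """Split an integer label map into one boolean mask per distinct label.

    `ignore` is assumed to be the background id and gets no mask.
    Returns (labels, masks) with labels sorted in ascending order.
    """
    labels = sorted({v for row in label_map for v in row if v != ignore})
    masks = [[[v == lab for v in row] for row in label_map] for lab in labels]
    return labels, masks


def bool_masks_to_label_map(
    labels: List[int], masks: List[BoolMask], background: int = 0
) -> LabelMap:
    """Merge boolean masks into a label map (assumes at least one mask).

    Where masks overlap, the later mask overwrites the earlier one.
    """
    height, width = len(masks[0]), len(masks[0][0])
    out = [[background] * width for _ in range(height)]
    for lab, mask in zip(labels, masks):
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    out[y][x] = lab
    return out
```

With non-overlapping masks the round trip reproduces the original label map; with overlapping masks it cannot, since a label map stores a single id per pixel.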
Honestly, I think the most practical thing would be to have utility functions that allow maximum flexibility in converting between all these formats: integer label maps, RGB label maps, binary masks (maybe even bit masks), RLE compression (and maybe some other simple compressed representations).
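As an illustration of the RLE representation in that list, the following is a minimal run-length encoding sketch for a binary mask. It is an assumption-laden toy, not COCO's format: it scans row-major and stores plain integer counts, whereas COCO's RLE differs in details (for example, it scans column-major), so the output is not interchangeable with pycocotools. The names `rle_encode` and `rle_decode` are hypothetical.

```python
from typing import List, Tuple

BoolMask = List[List[bool]]


def rle_encode(mask: BoolMask) -> Tuple[Tuple[int, int], List[int]]:
    """Row-major RLE: counts alternate runs of False, True, False, ...

    The first count always describes a run of False (it may be 0).
    Returns ((height, width), counts).
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    counts: List[int] = []
    current, run = False, 0
    for row in mask:
        for value in row:
            if bool(value) == current:
                run += 1
            else:
                counts.append(run)          # close the finished run
                current, run = bool(value), 1
    counts.append(run)                      # close the last run
    return (height, width), counts


def rle_decode(size: Tuple[int, int], counts: List[int]) -> BoolMask:
    """Inverse of rle_encode: expand the runs and reshape to H x W."""
    height, width = size
    flat: List[bool] = []
    value = False
    for run in counts:
        flat.extend([value] * run)
        value = not value
    return [flat[r * width:(r + 1) * width] for r in range(height)]
```

Storing the size alongside the counts is what makes decoding possible; the counts alone only describe the flattened mask.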
For visualization purposes, it may still make sense to support Boolean -> RGB by letting the user provide the palette (plus a utility function to generate palettes from the HSL colorwheel), e.g. one can map the 0th binary mask to the 0th color of the palette.
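A rough sketch of that idea, using only the standard-library `colorsys` module: generate evenly spaced hues on the colorwheel, then paint the i-th boolean mask with the i-th palette color. `make_palette` and `colorize` are hypothetical names chosen for illustration, and the default lightness and saturation values are arbitrary assumptions.

```python
import colorsys
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
BoolMask = List[List[bool]]


def make_palette(n: int, lightness: float = 0.5, saturation: float = 1.0) -> List[RGB]:
    """Return n colors with hues spaced evenly around the HSL colorwheel."""
    palette: List[RGB] = []
    for i in range(n):
        # colorsys uses the (hue, lightness, saturation) argument order.
        r, g, b = colorsys.hls_to_rgb(i / n, lightness, saturation)
        palette.append((round(r * 255), round(g * 255), round(b * 255)))
    return palette


def colorize(
    masks: Sequence[BoolMask],
    palette: Sequence[RGB],
    background: RGB = (0, 0, 0),
) -> List[List[RGB]]:
    """Paint the i-th mask with the i-th palette color (assumes >= 1 mask).

    Later masks are drawn over earlier ones; the palette wraps around
    if there are more masks than colors.
    """
    height, width = len(masks[0]), len(masks[0][0])
    image = [[background] * width for _ in range(height)]
    for index, mask in enumerate(masks):
        color = palette[index % len(palette)]
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    image[y][x] = color
    return image
```

Letting the caller pass the palette keeps the Boolean -> RGB mapping explicit, so the same masks can be rendered with a dataset's own colors instead of generated ones.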
For high-resolution images with a lot of objects, this can become a memory bottleneck. I guess that's the reason why COCO uses RLE compression. Even if this is the case in torchvision, it is not the case for a lot of legacy and interop formats. I think it is very useful to support functions to convert between all of the formats as much as possible. Even for bounding boxes, there may be different ways of interpreting the boxes: https://ppwwyyxx.com/blog/2021/Where-are-Pixels/, so I think it's useful to have functions for conversion between xyxy, xywh, cxcyhalfwhalfh, etc., and maybe even accepting an argument specifying the coordinate frame (corners or pixel centers).
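The box conversions mentioned here could look roughly like the sketch below. The `inclusive` flag is a hypothetical stand-in for the "coordinate frame" argument: with `inclusive=False` the coordinates are treated as continuous corner positions (width = x2 - x1), and with `inclusive=True` they are treated as inclusive pixel indices (width = x2 - x1 + 1). All function names are illustrative assumptions, not an existing API.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def xyxy_to_xywh(box: Box, inclusive: bool = False) -> Box:
    """(x1, y1, x2, y2) -> (x, y, width, height)."""
    x1, y1, x2, y2 = box
    extra = 1.0 if inclusive else 0.0   # inclusive indices cover one more pixel
    return (x1, y1, x2 - x1 + extra, y2 - y1 + extra)


def xywh_to_xyxy(box: Box, inclusive: bool = False) -> Box:
    """(x, y, width, height) -> (x1, y1, x2, y2)."""
    x, y, w, h = box
    extra = 1.0 if inclusive else 0.0
    return (x, y, x + w - extra, y + h - extra)


def xyxy_to_cxcyhalfwhalfh(box: Box) -> Box:
    """(x1, y1, x2, y2) -> (center x, center y, half width, half height).

    Assumes continuous corner coordinates.
    """
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, (x2 - x1) / 2, (y2 - y1) / 2)
```

The one-pixel difference between the two interpretations is easy to miss, which is why stating the convention in the docs matters as much as the conversion code itself.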
Maybe. But even if it pollutes some special |
Can you list out what other mask utilities would be beneficial in torchvision? I see Maybe we can refer to Detectron2 masks? https://github.com/facebookresearch/detectron2/blob/main/detectron2/structures/masks.py |
This already exists in legacy datasets such as Pascal, and I imagine this is the same in many other datasets from that epoch. So this is a very valid format for conversion. The integer label maps -> RGB label maps direction is also well defined even outside a purely visualization context. This conversion is needed to prepare the original "submission" files and use the original evaluation routines. So it may be good to rename this function or have a generic conversion function that redirects to it.
Why not, but it should be super-clear in the docs what coordinate frame is used in the context of the problem explained in https://ppwwyyxx.com/blog/2021/Where-are-Pixels/
I brought this up only as a source of relevant existing places that do a lot of these conversions and may be a source of inspiration for real-world needs. Even if they were ported to transform.py, it would be good to refactor some of them and bring them over to more unified and generic
I think overall it is good, but maybe an alternative could be to also have public "free functions" if the user does not want to use the classes (given that historically in pytorch support for Tensor subclasses/subtypes isn't very developed) |
Great. I agree with you about conversion formats. So can if I understand correctly. Or are there any other such free functions that would be beneficial? cc @datumbox as he would understand the ideas better. |
One other util from detectron2 - paste_masks_in_images |
This is already present in torchvision in roi_heads.py https://github.com/pytorch/vision/blob/main/torchvision/models/detection/roi_heads.py#L401 |
If they are equivalent, it would be best if detectron2 migrated to the torchvision version, to avoid confusion between the two |
It's probably also worth promoting it to a higher-level namespace for more visibility and supportability |
Just checked it again. It seems that batching is not vectorized (though for the binary mask format it can be vectorized with the scatter_reduce amin/amax modes), but it would be useful, as extracting connected-components/superpixel stats about segments is useful (both from binary masks and from integer masks that hold a segment index). Most of the box ops there unnecessarily do not support multiple batch dimensions. It can mostly be fixed by replacing Also, at the ops level, it's super important for the docs to explain which format are |
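For the integer-mask case mentioned in this comment, here is a minimal single-pass sketch that computes one bounding box per segment index. It keeps a running min/max per label, which is the scalar analogue of a scatter-style reduction with amin/amax modes; it is written with plain Python lists and is not the vectorized tensor implementation discussed in the thread. The function name, the `ignore` parameter, and the inclusive-pixel-index box convention are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

LabelMap = List[List[int]]
Box = Tuple[int, int, int, int]   # (xmin, ymin, xmax, ymax), inclusive pixel indices


def label_map_to_boxes(label_map: LabelMap, ignore: int = 0) -> Dict[int, Box]:
    """Compute the tight bounding box of every label in one pass.

    `ignore` is assumed to be the background id and gets no box.
    """
    boxes: Dict[int, List[int]] = {}
    for y, row in enumerate(label_map):
        for x, label in enumerate(row):
            if label == ignore:
                continue
            box = boxes.get(label)
            if box is None:
                boxes[label] = [x, y, x, y]   # first pixel seen for this label
            else:
                box[0] = min(box[0], x)       # running "amin" over x
                box[1] = min(box[1], y)       # running "amin" over y
                box[2] = max(box[2], x)       # running "amax" over x
                box[3] = max(box[3], y)       # running "amax" over y
    return {label: (b[0], b[1], b[2], b[3]) for label, b in boxes.items()}
```

Because every label is reduced in the same pass, the cost does not grow with the number of segments, which is the property that makes the batched formulation attractive.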
🚀 The feature
masks
toboxes
#3960 - scatter_reduce now supports amin/amax, so can be done in batched regime
Motivation, pitch
In detection/segmentation these utils are very frequent
Alternatives
No response
Additional context
No response