[Enhance] Mask2Former Instance Segm Only #7571

PeterVennerstrom · 2022-03-29T18:09:53Z

Motivation

MaskFormer and Mask2Former currently support panoptic segmentation, but not instance-only segmentation. Minor changes enable training and evaluation using _base_/datasets/coco_instance.py.

Modification

detectors/maskformer.py:

returns a list of tuples, as is typical of instance-segmentation-only methods, when num_stuff_classes == 0
renames the panoptic-specific show_result method to _show_pan_result and overrides the base class show_result() only when num_stuff_classes > 0
makes the gt_semantic_seg argument of forward_train() optional, defaulting to None
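To make the "list of tuples" output concrete, here is a minimal sketch that assumes the per-class layout MMDet 2.x instance segmentation detectors commonly return: for each image, a tuple of per-class (k, 5) box arrays and per-class mask lists. The helper name to_instance_result and its argument layout are hypothetical, not the code in this PR.

```python
import numpy as np


def to_instance_result(labels, scores, bboxes, masks, num_classes):
    """Hypothetical helper: group one image's per-instance predictions into
    a (bbox_results, mask_results) tuple with one entry per class.

    labels: (N,) int class ids in [0, num_classes)
    scores: (N,) float confidence scores
    bboxes: (N, 4) float boxes as x1, y1, x2, y2
    masks:  (N, H, W) bool instance masks
    """
    # Append the score as a fifth column: one (k, 5) array per class.
    dets = np.concatenate([bboxes, scores[:, None]], axis=1).astype(np.float32)
    bbox_results = [dets[labels == c] for c in range(num_classes)]
    # One list of (H, W) masks per class, aligned with bbox_results.
    mask_results = [list(masks[labels == c]) for c in range(num_classes)]
    return bbox_results, mask_results
```

An instance-only model would then return one such tuple per image, e.g. a list built by calling the helper on each image's predictions.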

dense_heads/maskformer_head.py:

adds img_metas to the preprocess_gt() arguments
adjusts preprocess_gt() to expand gt_semantic_seg into a list of None(s) when it is None, so that it works with multi_apply()
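A minimal sketch of why gt_semantic_seg has to become a list of None(s): multi_apply maps a function over several per-image sequences, so a bare None cannot be passed in. The multi_apply below is a local re-implementation written in the spirit of MMDet's helper; preprocess_single is a hypothetical stand-in for the real per-image target builder.

```python
from functools import partial


def multi_apply(func, *args, **kwargs):
    """Apply func to each group of per-image arguments and transpose the
    results into one list per output, like MMDet's multi_apply helper."""
    pfunc = partial(func, **kwargs) if kwargs else func
    map_results = map(pfunc, *args)
    return tuple(map(list, zip(*map_results)))


def preprocess_single(gt_labels, gt_masks, gt_semantic_seg, img_meta):
    """Hypothetical per-image step: a real head would build padded
    class/mask targets here; this sketch reports what it received."""
    has_sem_seg = gt_semantic_seg is not None
    return gt_labels, gt_masks, has_sem_seg


def preprocess_gt(gt_labels_list, gt_masks_list, gt_semantic_segs, img_metas):
    # Instance-only training passes gt_semantic_segs=None. map() iterates
    # over every argument, so a bare None would raise a TypeError; expand
    # it to one None per image instead.
    if gt_semantic_segs is None:
        gt_semantic_segs = [None] * len(gt_labels_list)
    return multi_apply(preprocess_single, gt_labels_list, gt_masks_list,
                       gt_semantic_segs, img_metas)
```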

models/utils/panoptic_gt_processing.py:

accepts an additional img_metas argument
pads gt_masks with the pad shape taken from img_metas rather than from gt_semantic_seg
returns early when gt_semantic_seg is None
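A sketch of the per-image flow described above, assuming plain numpy arrays instead of MMDet's mask and tensor types. preprocess_gt_sketch is a hypothetical name; the panoptic branch that merges stuff classes from gt_semantic_seg is intentionally left out.

```python
import numpy as np


def preprocess_gt_sketch(gt_labels, gt_masks, gt_semantic_seg, img_meta):
    """Hypothetical per-image GT processing for instance-only training.

    gt_labels: (N,) int array of thing labels
    gt_masks:  (N, h, w) bool instance masks before batch padding
    gt_semantic_seg: (1, H, W) array, or None for coco_instance data
    img_meta:  dict whose 'pad_shape' is (H, W, C) after batch padding
    """
    # Take the padded size from img_meta, since gt_semantic_seg may be None.
    pad_h, pad_w = img_meta['pad_shape'][:2]
    n, h, w = gt_masks.shape
    padded = np.zeros((n, pad_h, pad_w), dtype=bool)
    padded[:, :h, :w] = gt_masks  # zero-pad on the bottom and right
    if gt_semantic_seg is None:
        # Early return: thing labels and masks are the whole target.
        return gt_labels, padded.astype(np.int64)
    # The panoptic branch (adding stuff classes) is not sketched here.
    raise NotImplementedError('panoptic branch omitted from this sketch')
```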

configs/mask2former/:

two coco_instance.py based configs included for mask2former

codecov · 2022-03-29T20:31:29Z

Codecov Report

Merging #7571 (5e653f7) into dev (c32894e) will increase coverage by 0.34%.
The diff coverage is 95.45%.

❗ Current head 5e653f7 differs from pull request most recent head 6ab15c2. Consider uploading reports for the commit 6ab15c2 to get more accurate results

@@            Coverage Diff             @@
##              dev    #7571      +/-   ##
==========================================
+ Coverage   64.51%   64.85%   +0.34%     
==========================================
  Files         360      351       -9     
  Lines       29233    28491     -742     
  Branches     4954     4817     -137     
==========================================
- Hits        18859    18478     -381     
+ Misses       9370     9038     -332     
+ Partials     1004      975      -29

Flag	Coverage Δ
unittests	`64.83% <95.45%> (+0.34%)`	⬆️


Impacted Files	Coverage Δ
mmdet/datasets/pipelines/loading.py	`57.83% <93.10%> (+9.23%)`	⬆️
mmdet/datasets/pipelines/__init__.py	`100.00% <100.00%> (ø)`
mmdet/models/dense_heads/maskformer_head.py	`96.83% <100.00%> (+0.04%)`	⬆️
mmdet/models/detectors/maskformer.py	`71.62% <100.00%> (+3.05%)`	⬆️
mmdet/models/utils/panoptic_gt_processing.py	`100.00% <100.00%> (+8.33%)`	⬆️
mmdet/models/detectors/yolox.py	`28.57% <0.00%> (-51.03%)`	⬇️
mmdet/core/bbox/assigners/sim_ota_assigner.py	`80.00% <0.00%> (-3.64%)`	⬇️
mmdet/models/roi_heads/mask_heads/maskiou_head.py	`87.35% <0.00%> (-2.30%)`	⬇️
mmdet/models/backbones/csp_darknet.py	`98.83% <0.00%> (-1.17%)`	⬇️
mmdet/models/dense_heads/yolox_head.py	`78.86% <0.00%> (-0.52%)`	⬇️
... and 24 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c32894e...6ab15c2.


chhluo · 2022-04-01T16:35:13Z

Do you have the computing resource to train mask2former for instance segmentation?


PeterVennerstrom · 2022-04-01T16:47:43Z

> Do you have the computing resource to train mask2former for instance segmentation?

Only on a 4-GPU machine with limited memory (batch size 1). A Mask2Former Swin-T model was trained for 50 epochs, which achieved:

The data pipeline included extra data augmentation and was missing:
dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)).

The original Facebook implementation achieved 45.0 AP.

chhluo · 2022-04-01T16:53:46Z

Okay, I will help train these two models in the next week or two.

chhluo · 2022-04-02T06:52:31Z

Please fix the lint problems.

chhluo · 2022-04-10T12:43:13Z

I ran mask2former_r50 twice; the results are 43.1 and 43.2 (target 43.7). I ran mask2former_swin_tiny once; the result is 44.7 (target 45.0). I suspect there may be differences between our code and the original code, such as in data loading and processing for training. We should check them.

PeterVennerstrom · 2022-04-10T20:02:21Z

Found a small difference in our filter empty annotations code:

dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2))

The original implementation uses 1e-5 for min height and width. 1e-5 is the default argument in the function from Detectron2.

filter_empty_instances is called by the Mask2Former code here without changing the default arguments.

Sorry I missed that. I'll build the configs with the training commands for both Facebook Mask2Former and this implementation to look for other differences.

PeterVennerstrom · 2022-04-10T21:08:07Z

They left a comment suggesting the filter_empty_instances call happen after augmentation. I think the augmentations can create empty instances, especially considering how extreme the ratio_range=(0.1, 2.0) resize is. Our FilterAnnotations is placed after the resize/random crop like theirs.

Detectron2's filter_empty_instances function also filters empty masks. MMDet's FilterAnnotations does not consider empty masks when filtering.

chhluo · 2022-04-11T13:31:26Z

> The original implementation uses 1e-5 for min height and width. 1e-5 is the default argument in the function from Detectron2.

I think 1e-2 and 1e-5 are almost the same for filtering bboxes in an image of size 300~600.

chhluo · 2022-04-11T13:47:24Z

> Detectron2's filter_empty_instances function also filters empty masks. MMDet's FilterAnnotations does not consider empty masks when filtering.

In MMDet, masks are also filtered out; see mmdetection/mmdet/datasets/pipelines/loading.py, line 601 in 3e26931:

keys = ('gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg')

A mask is filtered out if its bbox width or height is less than 1e-2.

In Det2, empty masks, i.e. masks with no true values, are filtered out. A mask has no true values if its bbox width or height is less than 1e-2.

So I think these two filter ops are almost the same.

PeterVennerstrom · 2022-04-11T15:26:34Z

Added a mask area test. Det2 includes instances if they meet either the bounding box size test or mask area test.

> I think 1e-2 and 1e-5 are almost same for filtering bbox in a image of size 300~600.

That makes sense. With image_size=(1024, 1024) and ratio_range=(.1, 2) the images can have a side as small as 103 before padding. The Det2 inclusion of masks with positive area is more permissive, but only for instances with a mask with positive area of 1 and a bbox below the 1e-5 threshold.

I am wondering if Det2 included the extra mask test because something unexpected happens with the mask interpolation at small image scales and there are instances which meet the mask threshold, but not the bbox threshold.

chhluo · 2022-04-13T17:42:49Z

mmdet/datasets/pipelines/loading.py

+            keep += (w > self.min_gt_bbox_wh[0]) & (h > self.min_gt_bbox_wh[1])
+        if self.by_mask:
+            gt_masks = results['gt_masks']
+            keep += gt_masks.areas >= self.min_gt_mask_area


After this line https://github.com/facebookresearch/Mask2Former/blob/c233619f7ea011cd565174a1b211bde1f43e38db/mask2former/data/dataset_mappers/coco_instance_new_baseline_dataset_mapper.py#L177 , an empty mask has the bbox (0, 0, 0, 0). If a mask has even one true value, its bbox width and height are at least one rather than a value less than one. So 1e-2 and 1e-5 have the same effect on filtering empty instances, and filtering by bbox, by mask, or by both is equivalent. After that line, bbox widths and heights are all integers, not floats like 10.2.
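The argument above can be checked with a small sketch, assuming boxes are recomputed from masks the way Detectron2's BitMasks.get_bounding_boxes does it (first occupied row/column, last occupied row/column plus one). boxes_from_masks and keep_by_box are hypothetical helpers; because every box side comes out as a whole number, thresholds of 1e-2 and 1e-5 keep and drop exactly the same instances.

```python
import numpy as np


def boxes_from_masks(masks):
    """Recompute x1, y1, x2, y2 boxes from (N, H, W) bool masks.

    An empty mask gets (0, 0, 0, 0); otherwise x2/y2 are the last occupied
    column/row plus one, so every side length is an integer >= 1.
    """
    boxes = np.zeros((masks.shape[0], 4), dtype=np.float32)
    x_any = masks.any(axis=1)  # (N, W): columns containing a true pixel
    y_any = masks.any(axis=2)  # (N, H): rows containing a true pixel
    for i in range(masks.shape[0]):
        xs = np.flatnonzero(x_any[i])
        ys = np.flatnonzero(y_any[i])
        if xs.size and ys.size:
            boxes[i] = [xs[0], ys[0], xs[-1] + 1, ys[-1] + 1]
    return boxes


def keep_by_box(boxes, min_wh):
    """Keep boxes whose width and height both exceed min_wh."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    return (w > min_wh) & (h > min_wh)
```

A single-pixel mask yields a 1 x 1 box, which passes either threshold, while an empty mask yields a 0 x 0 box, which fails both.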

That makes sense. Since both implementations recompute boxes from the instance masks, a mask with even one true value cannot have a box below the threshold.

I ran the latest version: 43.2 for mask2former_r50.

Please add a unit test.

Is it better to use keep = keep & (gt_masks.areas >= self.min_gt_mask_area) to align with det2?
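A simplified, hypothetical FilterAnnotations-style function on plain numpy arrays, combining the box and mask tests with a logical AND as suggested above and checking keep.any() rather than tests[0].any() before dropping a sample. It is a sketch for illustration, not MMDet's pipeline class, and its default thresholds are arbitrary.

```python
import numpy as np


def filter_annotations(results, min_gt_bbox_wh=(1.0, 1.0), min_gt_mask_area=1,
                       by_box=True, by_mask=False):
    """Hypothetical filter on a dict with gt_bboxes (N, 4), gt_labels (N,)
    and gt_masks (N, H, W) bool. An instance is kept only if it passes
    every enabled test (logical AND). Returns None if nothing is kept."""
    assert by_box or by_mask
    tests = []
    if by_box:
        b = results['gt_bboxes']
        w, h = b[:, 2] - b[:, 0], b[:, 3] - b[:, 1]
        tests.append((w > min_gt_bbox_wh[0]) & (h > min_gt_bbox_wh[1]))
    if by_mask:
        areas = results['gt_masks'].sum(axis=(1, 2))  # pixels per mask
        tests.append(areas >= min_gt_mask_area)
    keep = tests[0]
    for t in tests[1:]:
        keep = keep & t
    for key in ('gt_bboxes', 'gt_labels', 'gt_masks'):
        if key in results:
            results[key] = results[key][keep]
    # Check the combined keep (not tests[0]) so every enabled test counts.
    if not keep.any():
        return None
    return results
```

With AND, an instance whose box passes but whose mask is empty is dropped; with the OR-style `keep +=` from the diff above, it would be kept.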

mask_former_instance_dataset_mapper.py uses 128 as the image pad value.

The Coco instance seg configs point at coco_instance_new_baseline_dataset_mapper.py instead. They create a padding mask, but I need to verify the actual value used to pad the image.

Checked the padding values in the images at the resnet backbone's forward method for each implementation. Just looked for repeated values.

print(img[0, :, 1023, 1023])

Det2: [0.0741, 0.2052, 0.4265]
Mmdet: [0., 0., 0.]

Added a config value to set the pad values to (128, 128, 128) like Det2.

Reversed the order from Norm > Pad to Pad > Norm. Now getting the same padding values: [0.0741, 0.2052, 0.4265]
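The printed Det2 values are what a pad value of 128 becomes once it is normalized. A sketch of the arithmetic, assuming the common ImageNet RGB mean (123.675, 116.28, 103.53) and std (58.395, 57.12, 57.375) in 0 to 255 pixel units: padding after Norm leaves zeros, while padding with 128 before Norm gives (128 - mean) / std per channel. padded_pixel_value is a hypothetical helper.

```python
import numpy as np

# Assumption: standard ImageNet RGB statistics in 0-255 pixel units.
MEAN = np.array([123.675, 116.28, 103.53])
STD = np.array([58.395, 57.12, 57.375])


def padded_pixel_value(pad_val, order):
    """Value a padded pixel has at the backbone input, per RGB channel."""
    if order == 'norm_then_pad':
        # Padding is written into the already normalized image.
        return np.full(3, float(pad_val))
    if order == 'pad_then_norm':
        # The raw pad value is normalized along with the real pixels.
        return (pad_val - MEAN) / STD
    raise ValueError(f'unknown order: {order}')
```

Under these assumptions, padded_pixel_value(128, 'pad_then_norm') rounds to the Det2 values printed above, and padded_pixel_value(0, 'norm_then_pad') gives the MMDet zeros.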

> mask_former_instance_dataset_mapper.py uses 128 as the image pad value.
>
> The Coco instance seg configs point at coco_instance_new_baseline_dataset_mapper.py instead. They create a padding mask, but I need to verify the actual value used to pad the image.

Mask2Former actually used COCOInstanceNewBaselineDatasetMapper, not MaskFormerInstanceDatasetMapper

Yes, I first noticed the difference looking at MaskFormerInstanceDatasetMapper, but compared the implementation to Mmdet using COCOInstanceNewBaselineDatasetMapper.

There was a difference in padding between our implementation and theirs.


chhluo · 2022-04-20T12:28:54Z

mmdet/datasets/pipelines/loading.py

+        for key in keys:
+            if key in results:
+                results[key] = results[key][keep]
+        if not tests[0].any():


tests[0].any() -> keep.any()


chhluo · 2022-04-20T12:59:48Z

Please resolve the conflict; I will run the latest version in the next day or two.

chhluo · 2022-04-24T07:59:19Z

One piece of good news and one piece of bad news: mask2former_swin_tiny reached the target (45.0 mask AP), but mask2former_r50 got 43.0 mask AP and failed to reach the target (43.7 mask AP). I think doing the LR decay a little earlier may be helpful. Also, could you please add other config files (e.g. r101, swin-s) so we can verify the other models' performance?

PeterVennerstrom · 2022-04-24T22:12:31Z

Interesting swin-t achieved the target, but resnet50 did a bit worse than previous tests. It appears possible the different scores are due to noise or due to the padding value alignment.

Since resnet50 is not hitting the target I looked at the weights used, architecture and frozen batch normalization settings.

They use Torchvision resnet50 weights, converted to Det2 naming scheme. They link to resnet50-19c8e357.pth and the current Torchvision version installed with Torch 1.10 gives a file called resnet50-0676ba61.pth. Confirmed the keys and weights are identical in both versions. I couldn't find any architecture differences. I'm still looking at how they freeze batch norm, but it appears identical to Mmdet.
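Checking that two ResNet-50 checkpoints have identical keys and weights can be done with a small helper like the one below. state_dicts_equal is a hypothetical name; it assumes the values are numpy arrays or CPU tensors that np.asarray can convert.

```python
import numpy as np


def state_dicts_equal(a, b):
    """Return (True, []) when two state dicts have the same keys and
    bit-identical values; otherwise (False, sorted problem keys)."""
    problems = sorted(set(a) ^ set(b))  # keys present in only one dict
    for key in sorted(set(a) & set(b)):
        x, y = np.asarray(a[key]), np.asarray(b[key])
        if x.shape != y.shape or not np.array_equal(x, y):
            problems.append(key)
    return not problems, problems
```

For example, load each .pth file with torch.load(path, map_location='cpu') and pass the two dicts in; an empty problem list means the keys and weights match.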

I see additional configs for panoptic in the latest dev branch. I'll replicate those for instance seg tomorrow. Training another swin and a resnet might isolate the issue to something resnet related if that is the case.

chhluo · 2022-05-27T06:15:16Z

@PeterVennerstrom The unit test for Mask2Former instance segmentation failed because we removed the config file for instance segmentation; please update it. Sorry about that.


* Mask2Former/MaskFormer instance only training/eval
* obsolete config names
* if cond is None fix
* white space
* fix tests
* yapf formatting fix
* semantic_seg None docstring
* original config names
* pan/ins unit test
* show_result comment
* pan/ins head unit test
* redundant test
* inherit configs
* correct gpu #
* revert version
* BaseDetector.show_result comment
* revert more versions
* clarify comment
* clarify comment
* add FilterAnnotations to data pipeline
* more complete Returns docstring
* use pytest.mark.parametrize decorator
* fix docstring formatting
* lint
* Include instances passing mask area test
* Make FilterAnnotations generic for masks or bboxes
* Duplicate assertion
* Add pad config
* Less hard coded padding setting
* Clarify test arguments
* Additional inst_seg configs
* delete configs
* Include original dev branch configs
* Fix indent
* fix lint error from merge conflict
* Update .pre-commit-config.yaml
* Rename mask2former_r50_lsj_8x2_50e_coco.py to mask2former_r50_lsj_8x2_50e_coco-panoptic.py
* Update and rename mask2former_r101_lsj_8x2_50e_coco.py to mask2former_r101_lsj_8x2_50e_coco-panoptic.py
* Update and rename mask2former_swin-b-p4-w12-384-in21k_lsj_8x2_50e_coco.py to mask2former_swin-b-p4-w12-384-in21k_lsj_8x2_50e_coco-panoptic.py
* Update and rename mask2former_swin-b-p4-w12-384_lsj_8x2_50e_coco.py to mask2former_swin-b-p4-w12-384_lsj_8x2_50e_coco-panoptic.py
* Update and rename mask2former_swin-l-p4-w12-384-in21k_lsj_16x1_100e_coco.py to mask2former_swin-l-p4-w12-384-in21k_lsj_16x1_100e_coco-panoptic.py
* Update and rename mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco.py to mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco-panoptic.py
* Update and rename mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco.py to mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco-panoptic.py
* Create mask2former_r50_lsj_8x2_50e_coco.py
* Create mask2former_r101_lsj_8x2_50e_coco.py
* Create mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco.py
* Create mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco.py
* Update test_forward.py
* remove gt_sem_seg

Co-authored-by: Cedric Luo <[email protected]>

mm-assistant bot added the size/XS label Mar 29, 2022

mm-assistant bot assigned hhaAndroid Mar 29, 2022

RangiLyu assigned chhluo Mar 30, 2022

RangiLyu requested a review from chhluo March 30, 2022 11:48

chhluo reviewed Mar 31, 2022


chhluo changed the title from "MaskFormer Instance Segm Only" to "[Enhance] MaskFormer Instance Segm Only" Mar 31, 2022

chhluo reviewed Mar 31, 2022


configs/mask2former/mask2former_r50_lsj_8x2_50e_coco_ins.py (outdated, resolved)

mmdet/models/detectors/maskformer.py (resolved)

tests/test_models/test_dense_heads/test_mask2former_head.py (resolved)