[Feature] Support Segmenter #955

Merged Jan 26, 2022 (72 commits)

Commits
eaef5e8
segmenter: add model
rstrudel Oct 6, 2021
cc8e3f4
update
rstrudel Oct 6, 2021
d609009
readme: update
rstrudel Oct 6, 2021
785a9cb
config: update
rstrudel Oct 6, 2021
e0c0766
segmenter: update readme
rstrudel Oct 6, 2021
6bcec05
segmenter: update
rstrudel Oct 6, 2021
5342698
segmenter: update
rstrudel Oct 6, 2021
5570f10
segmenter: update
rstrudel Oct 6, 2021
48cb36b
configs: set checkpoint path to pretrain folder
rstrudel Oct 11, 2021
b93ad3a
segmenter: modify vit-s/lin, remove data config
rstrudel Oct 11, 2021
e8366e3
rreadme: update
rstrudel Oct 11, 2021
58f7bec
configs: transfer from _base_ to segmenter
rstrudel Oct 12, 2021
9deb2dc
configs: add 8x1 suffix
rstrudel Oct 12, 2021
6f5ecd5
configs: remove redundant lines
rstrudel Oct 12, 2021
4f5762f
configs: cleanup
rstrudel Oct 12, 2021
f5cb915
Merge branch 'open-mmlab:master' into dev_mmseg
rstrudel Nov 15, 2021
daf801e
Merge branch 'dev_mmseg' of github.com:rstrudel/mmsegmentation into s…
MengzhangLI Nov 25, 2021
28a29ce
first attempt
MengzhangLI Nov 25, 2021
57c294f
Merge branch 'master' of https://github.com/open-mmlab/mmsegmentation…
MengzhangLI Nov 25, 2021
a9b805f
Merge branch 'master' of https://github.com/open-mmlab/mmsegmentation…
MengzhangLI Nov 26, 2021
f7dcc4d
swipe CI error
MengzhangLI Nov 26, 2021
152ad13
Update mmseg/models/decode_heads/__init__.py
rstrudel Oct 11, 2021
646875f
segmenter_linear: use fcn backbone
rstrudel Nov 26, 2021
f08215e
segmenter_mask: update
rstrudel Nov 26, 2021
b654ffa
models: add segmenter vit
rstrudel Nov 26, 2021
91d1e0b
decoders: yapf+remove unused imports
rstrudel Nov 26, 2021
18d32ad
apply precommit
rstrudel Nov 26, 2021
1c26180
segmenter/linear_head: fix
rstrudel Nov 26, 2021
2bf5667
segmenter/linear_header: fix
rstrudel Nov 26, 2021
2438a9c
segmenter: fix mask transformer
rstrudel Nov 27, 2021
72d79f1
fix error
MengzhangLI Nov 27, 2021
2d0256d
segmenter/mask_head: use trunc_normal init
rstrudel Nov 27, 2021
f6c79f4
refactor segmenter head
MengzhangLI Dec 13, 2021
a501792
refactor segmenter head
MengzhangLI Dec 13, 2021
717781c
refactor segmenter head
MengzhangLI Dec 13, 2021
bd1ff37
Fetch upstream (#1)
rstrudel Dec 15, 2021
d425e02
resolove conflict
MengzhangLI Dec 16, 2021
f1ee73b
decode_head: switch from linear to fcn
rstrudel Dec 16, 2021
60f3694
fix init list formatting
rstrudel Dec 16, 2021
a9589fc
configs: remove variants, keep only vit-s on ade
rstrudel Dec 16, 2021
323178c
align inference metric of vit-s-mask
MengzhangLI Dec 20, 2021
b0a6920
configs: add vit t/b/l
rstrudel Dec 20, 2021
9ba1cdb
Update mmseg/models/decode_heads/segmenter_mask_head.py
rstrudel Dec 21, 2021
cb3b585
Update mmseg/models/decode_heads/segmenter_mask_head.py
rstrudel Dec 21, 2021
a443102
Update mmseg/models/decode_heads/segmenter_mask_head.py
rstrudel Dec 21, 2021
e43249d
Update mmseg/models/decode_heads/segmenter_mask_head.py
rstrudel Dec 21, 2021
3eec55c
Update mmseg/models/decode_heads/segmenter_mask_head.py
rstrudel Dec 21, 2021
1799bc3
model_converters: use torch instead of einops
rstrudel Dec 21, 2021
bf4f031
setup: remove einops
rstrudel Dec 21, 2021
83e3fce
segmenter_mask: fix missing imports
rstrudel Dec 21, 2021
da6fe6c
add necessary imported init funtion
MengzhangLI Dec 21, 2021
0ab294f
Merge branch 'dev_mmseg' of github.com:rstrudel/mmsegmentation into d…
MengzhangLI Dec 21, 2021
b55abe5
segmenter/seg-l: set resolution to 640
rstrudel Dec 29, 2021
77bd70e
segmenter/seg-l: fix test size
rstrudel Dec 29, 2021
d6166f9
fix vitjax2mmseg
MengzhangLI Jan 5, 2022
985d2cd
add README and unittest
MengzhangLI Jan 7, 2022
63b13cd
fix unittest
MengzhangLI Jan 7, 2022
bc6d5dd
add docstring
MengzhangLI Jan 11, 2022
a7a32ba
fix conflict error
MengzhangLI Jan 11, 2022
1e82fe0
refactor config and add pretrained link
MengzhangLI Jan 11, 2022
f9d7a8f
fix typo
MengzhangLI Jan 11, 2022
9be28df
add paper name in readme
MengzhangLI Jan 12, 2022
2f7b2fc
change segmenter config names
MengzhangLI Jan 15, 2022
d00afba
fix typo in readme
MengzhangLI Jan 17, 2022
c744a7f
fix typos in readme
MengzhangLI Jan 22, 2022
46d1970
fix segmenter typo
MengzhangLI Jan 24, 2022
cecbe20
fix segmenter typo
MengzhangLI Jan 24, 2022
1704a42
delete redundant comma in config files
MengzhangLI Jan 25, 2022
5b8decb
delete redundant comma in config files
MengzhangLI Jan 25, 2022
b2ee262
fix convert script
MengzhangLI Jan 26, 2022
59e7ac5
Merge branch 'master' of https://github.com/open-mmlab/mmsegmentation…
MengzhangLI Jan 26, 2022
b6e6af3
update lateset master version
MengzhangLI Jan 26, 2022
1 change: 1 addition & 0 deletions README.md
@@ -118,6 +118,7 @@ Supported methods:
- [x] [STDC (CVPR'2021)](configs/stdc)
- [x] [SETR (CVPR'2021)](configs/setr)
- [x] [DPT (ArXiv'2021)](configs/dpt)
- [x] [Segmenter (ICCV'2021)](configs/segmenter)
- [x] [SegFormer (NeurIPS'2021)](configs/segformer)

Supported datasets:
1 change: 1 addition & 0 deletions README_zh-CN.md
@@ -117,6 +117,7 @@ MMSegmentation 是一个基于 PyTorch 的语义分割开源工具箱。它是 O
- [x] [STDC (CVPR'2021)](configs/stdc)
- [x] [SETR (CVPR'2021)](configs/setr)
- [x] [DPT (ArXiv'2021)](configs/dpt)
- [x] [Segmenter (ICCV'2021)](configs/segmenter)
- [x] [SegFormer (NeurIPS'2021)](configs/segformer)

已支持的数据集:
35 changes: 35 additions & 0 deletions configs/_base_/models/segmenter_vit-b16_mask.py
@@ -0,0 +1,35 @@
# model settings
backbone_norm_cfg = dict(type='LN', eps=1e-6, requires_grad=True)
model = dict(
type='EncoderDecoder',
pretrained='pretrain/vit_base_p16_384.pth',
backbone=dict(
type='VisionTransformer',
img_size=(512, 512),
patch_size=16,
in_channels=3,
embed_dims=768,
num_layers=12,
num_heads=12,
drop_path_rate=0.1,
attn_drop_rate=0.0,
drop_rate=0.0,
final_norm=True,
norm_cfg=backbone_norm_cfg,
with_cls_token=True,
interpolate_mode='bicubic',
),
decode_head=dict(
type='SegmenterMaskTransformerHead',
in_channels=768,
channels=768,
num_classes=150,
num_layers=2,
num_heads=12,
embed_dims=768,
dropout_ratio=0.0,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
),
test_cfg=dict(mode='slide', crop_size=(512, 512), stride=(480, 480)),
)
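The `slide` test_cfg above tiles each test image with 512x512 windows placed every 480 pixels, so adjacent windows overlap by 32 pixels and the logits are averaged in the overlap. A minimal sketch of the window-count arithmetic (mirroring the grid formula used by mmseg's `EncoderDecoder.slide_inference`):

```python
# Windows of size `crop` are placed every `stride` pixels along one axis;
# overlap per pair of neighbours is crop - stride = 512 - 480 = 32 px.
def num_windows(size, crop=512, stride=480):
    # ceil((size - crop) / stride) + 1, clamped so a small image gets 1 window
    return max(size - crop + stride - 1, 0) // stride + 1

print(num_windows(512))   # 1 window: image equals the crop size
print(num_windows(2048))  # 5 windows along a 2048-px side
```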
73 changes: 73 additions & 0 deletions configs/segmenter/README.md
@@ -0,0 +1,73 @@
# Segmenter

[Segmenter: Transformer for Semantic Segmentation](https://arxiv.org/abs/2105.05633)

## Introduction

<!-- [ALGORITHM] -->

<a href="https://github.com/rstrudel/segmenter">Official Repo</a>

<a href="https://github.com/open-mmlab/mmsegmentation/blob/v0.21.0/mmseg/models/decode_heads/segmenter_mask_head.py#L15">Code Snippet</a>

## Abstract

<!-- [ABSTRACT] -->

Image segmentation is often ambiguous at the level of individual image patches and requires contextual information to reach label consensus. In this paper we introduce Segmenter, a transformer model for semantic segmentation. In contrast to convolution-based methods, our approach allows to model global context already at the first layer and throughout the network. We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation. To do so, we rely on the output embeddings corresponding to image patches and obtain class labels from these embeddings with a point-wise linear decoder or a mask transformer decoder. We leverage models pre-trained for image classification and show that we can fine-tune them on moderate sized datasets available for semantic segmentation. The linear decoder allows to obtain excellent results already, but the performance can be further improved by a mask transformer generating class masks. We conduct an extensive ablation study to show the impact of the different parameters, in particular the performance is better for large models and small patch sizes. Segmenter attains excellent results for semantic segmentation. It outperforms the state of the art on both ADE20K and Pascal Context datasets and is competitive on Cityscapes.

<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/24582831/148507554-87eb80bd-02c7-4c31-b102-c6141e231ec8.png" width="70%"/>
</div>

```bibtex
@article{strudel2021Segmenter,
title={Segmenter: Transformer for Semantic Segmentation},
author={Strudel, Robin and Garcia, Ricardo and Laptev, Ivan and Schmid, Cordelia},
journal={arXiv preprint arXiv:2105.05633},
year={2021}
}
```


## Usage

To use the pre-trained ViT models from [Segmenter](https://github.com/rstrudel/segmenter), the checkpoint keys must first be converted to MMSegmentation style.

We provide a script [`vitjax2mmseg.py`](../../tools/model_converters/vitjax2mmseg.py) in the tools directory to convert the keys of [ViT-AugReg](https://github.com/rwightman/pytorch-image-models/blob/f55c22bebf9d8afc449d317a723231ef72e0d662/timm/models/vision_transformer.py#L54-L106) models to MMSegmentation style.

```shell
python tools/model_converters/vitjax2mmseg.py ${PRETRAIN_PATH} ${STORE_PATH}
```

For example:

```shell
python tools/model_converters/vitjax2mmseg.py \
Ti_16-i21k-300ep-lr_0.001-aug_none-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_384.npz \
pretrain/vit_tiny_p16_384.pth
```

This script converts the model from `PRETRAIN_PATH` and stores the converted model in `STORE_PATH`.

In our default setting, the pretrained models and their corresponding [ViT-AugReg](https://github.com/rwightman/pytorch-image-models/blob/f55c22bebf9d8afc449d317a723231ef72e0d662/timm/models/vision_transformer.py#L54-L106) models are listed below:

| pretrained models | original models |
| ----------------- | --------------- |
| vit_tiny_p16_384.pth | [vit_tiny_patch16_384](https://storage.googleapis.com/vit_models/augreg/Ti_16-i21k-300ep-lr_0.001-aug_none-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_384.npz) |
| vit_small_p16_384.pth | [vit_small_patch16_384](https://storage.googleapis.com/vit_models/augreg/S_16-i21k-300ep-lr_0.001-aug_light1-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_384.npz) |
| vit_base_p16_384.pth | [vit_base_patch16_384](https://storage.googleapis.com/vit_models/augreg/B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_384.npz) |
| vit_large_p16_384.pth | [vit_large_patch16_384](https://storage.googleapis.com/vit_models/augreg/L_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.1-sd_0.1--imagenet2012-steps_20k-lr_0.01-res_384.npz) |
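After running the converter, it can be useful to sanity-check the output file. The sketch below only illustrates the flat `state_dict` layout that `torch.save` produces and that MMSegmentation loads; the two key names shown (`patch_embed.projection.weight`, `pos_embed`) are among those used by mmseg's `VisionTransformer`, the tensor shapes match ViT-Tiny at 384x384 input, and the `/tmp` path is hypothetical:

```python
import torch

# Illustrative sketch of a converted checkpoint: a flat state_dict saved
# with torch.save. The full key mapping lives in
# tools/model_converters/vitjax2mmseg.py; only two example keys are shown.
dummy = {
    # ViT-Tiny: embed_dims=192, 16x16 patches over a 3-channel input
    'patch_embed.projection.weight': torch.zeros(192, 3, 16, 16),
    # 384 / 16 = 24 -> 24 * 24 patch tokens + 1 cls token = 577 positions
    'pos_embed': torch.zeros(1, 577, 192),
}
torch.save(dummy, '/tmp/vit_tiny_p16_384_demo.pth')  # hypothetical path

loaded = torch.load('/tmp/vit_tiny_p16_384_demo.pth', map_location='cpu')
print(sorted(loaded))
```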

## Results and models

### ADE20K

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| ------ | -------- | --------- | ---------- | ------- | -------- | --- | --- | -------------- | ----- |
| Segmenter-Mask | ViT-T_16 | 512x512 | 160000 | 1.21 | 27.98 | 39.99 | 40.83 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segmenter/segmenter_vit-t_mask_8x1_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-t_mask_8x1_512x512_160k_ade20k/segmenter_vit-t_mask_8x1_512x512_160k_ade20k_20220105_151706-ffcf7509.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-t_mask_8x1_512x512_160k_ade20k/segmenter_vit-t_mask_8x1_512x512_160k_ade20k_20220105_151706.log.json) |
| Segmenter-Linear | ViT-S_16 | 512x512 | 160000 | 1.78 | 28.07 | 45.75 | 46.82 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segmenter/segmenter_vit-s_linear_8x1_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-s_linear_8x1_512x512_160k_ade20k/segmenter_vit-s_linear_8x1_512x512_160k_ade20k_20220105_151713-39658c46.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-s_linear_8x1_512x512_160k_ade20k/segmenter_vit-s_linear_8x1_512x512_160k_ade20k_20220105_151713.log.json) |
| Segmenter-Mask | ViT-S_16 | 512x512 | 160000 | 2.03 | 24.80 | 46.19 | 47.85 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segmenter/segmenter_vit-s_mask_8x1_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-s_mask_8x1_512x512_160k_ade20k/segmenter_vit-s_mask_8x1_512x512_160k_ade20k_20220105_151706-511bb103.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-s_mask_8x1_512x512_160k_ade20k/segmenter_vit-s_mask_8x1_512x512_160k_ade20k_20220105_151706.log.json) |
| Segmenter-Mask | ViT-B_16 | 512x512 | 160000 | 4.20 | 13.20 | 49.60 | 51.07 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segmenter/segmenter_vit-b_mask_8x1_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-b_mask_8x1_512x512_160k_ade20k/segmenter_vit-b_mask_8x1_512x512_160k_ade20k_20220105_151706-bc533b08.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-b_mask_8x1_512x512_160k_ade20k/segmenter_vit-b_mask_8x1_512x512_160k_ade20k_20220105_151706.log.json) |
| Segmenter-Mask | ViT-L_16 | 640x640 | 160000 | 16.56 | 2.62 | 52.16 | 53.65 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/segmenter/segmenter_vit-l_mask_8x1_512x512_160k_ade20k.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-l_mask_8x1_512x512_160k_ade20k/segmenter_vit-l_mask_8x1_512x512_160k_ade20k_20220105_162750-7ef345be.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-l_mask_8x1_512x512_160k_ade20k/segmenter_vit-l_mask_8x1_512x512_160k_ade20k_20220105_162750.log.json) |
125 changes: 125 additions & 0 deletions configs/segmenter/segmenter.yml
@@ -0,0 +1,125 @@
Collections:
- Name: segmenter
Metadata:
Training Data:
- ADE20K
Paper:
URL: https://arxiv.org/abs/2105.05633
Title: 'Segmenter: Transformer for Semantic Segmentation'
README: configs/segmenter/README.md
Code:
URL: https://github.com/open-mmlab/mmsegmentation/blob/v0.21.0/mmseg/models/decode_heads/segmenter_mask_head.py#L15
Version: v0.21.0
Converted From:
Code: https://github.com/rstrudel/segmenter
Models:
- Name: segmenter_vit-t_mask_8x1_512x512_160k_ade20k
In Collection: segmenter
Metadata:
backbone: ViT-T_16
crop size: (512,512)
lr schd: 160000
inference time (ms/im):
- value: 35.74
hardware: V100
backend: PyTorch
batch size: 1
mode: FP32
resolution: (512,512)
Training Memory (GB): 1.21
Results:
- Task: Semantic Segmentation
Dataset: ADE20K
Metrics:
mIoU: 39.99
mIoU(ms+flip): 40.83
Config: configs/segmenter/segmenter_vit-t_mask_8x1_512x512_160k_ade20k.py
Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-t_mask_8x1_512x512_160k_ade20k/segmenter_vit-t_mask_8x1_512x512_160k_ade20k_20220105_151706-ffcf7509.pth
- Name: segmenter_vit-s_linear_8x1_512x512_160k_ade20k
In Collection: segmenter
Metadata:
backbone: ViT-S_16
crop size: (512,512)
lr schd: 160000
inference time (ms/im):
- value: 35.63
hardware: V100
backend: PyTorch
batch size: 1
mode: FP32
resolution: (512,512)
Training Memory (GB): 1.78
Results:
- Task: Semantic Segmentation
Dataset: ADE20K
Metrics:
mIoU: 45.75
mIoU(ms+flip): 46.82
Config: configs/segmenter/segmenter_vit-s_linear_8x1_512x512_160k_ade20k.py
Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-s_linear_8x1_512x512_160k_ade20k/segmenter_vit-s_linear_8x1_512x512_160k_ade20k_20220105_151713-39658c46.pth
- Name: segmenter_vit-s_mask_8x1_512x512_160k_ade20k
In Collection: segmenter
Metadata:
backbone: ViT-S_16
crop size: (512,512)
lr schd: 160000
inference time (ms/im):
- value: 40.32
hardware: V100
backend: PyTorch
batch size: 1
mode: FP32
resolution: (512,512)
Training Memory (GB): 2.03
Results:
- Task: Semantic Segmentation
Dataset: ADE20K
Metrics:
mIoU: 46.19
mIoU(ms+flip): 47.85
Config: configs/segmenter/segmenter_vit-s_mask_8x1_512x512_160k_ade20k.py
Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-s_mask_8x1_512x512_160k_ade20k/segmenter_vit-s_mask_8x1_512x512_160k_ade20k_20220105_151706-511bb103.pth
- Name: segmenter_vit-b_mask_8x1_512x512_160k_ade20k
In Collection: segmenter
Metadata:
backbone: ViT-B_16
crop size: (512,512)
lr schd: 160000
inference time (ms/im):
- value: 75.76
hardware: V100
backend: PyTorch
batch size: 1
mode: FP32
resolution: (512,512)
Training Memory (GB): 4.2
Results:
- Task: Semantic Segmentation
Dataset: ADE20K
Metrics:
mIoU: 49.6
mIoU(ms+flip): 51.07
Config: configs/segmenter/segmenter_vit-b_mask_8x1_512x512_160k_ade20k.py
Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-b_mask_8x1_512x512_160k_ade20k/segmenter_vit-b_mask_8x1_512x512_160k_ade20k_20220105_151706-bc533b08.pth
- Name: segmenter_vit-l_mask_8x1_512x512_160k_ade20k
In Collection: segmenter
Metadata:
backbone: ViT-L_16
crop size: (640,640)
lr schd: 160000
inference time (ms/im):
- value: 381.68
hardware: V100
backend: PyTorch
batch size: 1
mode: FP32
resolution: (640,640)
Training Memory (GB): 16.56
Results:
- Task: Semantic Segmentation
Dataset: ADE20K
Metrics:
mIoU: 52.16
mIoU(ms+flip): 53.65
Config: configs/segmenter/segmenter_vit-l_mask_8x1_512x512_160k_ade20k.py
Weights: https://download.openmmlab.com/mmsegmentation/v0.5/segmenter/segmenter_vit-l_mask_8x1_512x512_160k_ade20k/segmenter_vit-l_mask_8x1_512x512_160k_ade20k_20220105_162750-7ef345be.pth
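The yml metadata above reports latency as `inference time (ms/im)` while the README table reports throughput in fps; the two are related by `ms_per_im = 1000 / fps`, e.g. 27.98 fps for ViT-T corresponds to 35.74 ms/im:

```python
# Convert README throughput (fps) to the yml's latency metric (ms/im).
def ms_per_im(fps):
    return 1000.0 / fps

print(round(ms_per_im(27.98), 2))  # 35.74 ms/im for Segmenter-Mask ViT-T_16
print(round(ms_per_im(2.62), 2))   # 381.68 ms/im for Segmenter-Mask ViT-L_16
```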
43 changes: 43 additions & 0 deletions configs/segmenter/segmenter_vit-b_mask_8x1_512x512_160k_ade20k.py
@@ -0,0 +1,43 @@
_base_ = [
'../_base_/models/segmenter_vit-b16_mask.py',
'../_base_/datasets/ade20k.py', '../_base_/default_runtime.py',
'../_base_/schedules/schedule_160k.py'
]
optimizer = dict(lr=0.001, weight_decay=0.0)

img_norm_cfg = dict(
mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_rgb=True)
crop_size = (512, 512)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', reduce_zero_label=True),
dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
dict(type='RandomFlip', prob=0.5),
dict(type='PhotoMetricDistortion'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(2048, 512),
# img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]
data = dict(
# num_gpus: 8 -> batch_size: 8
samples_per_gpu=1,
train=dict(pipeline=train_pipeline),
val=dict(pipeline=test_pipeline),
test=dict(pipeline=test_pipeline))
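The `img_norm_cfg` above (mean and std both 127.5 per channel) rescales 8-bit pixel values into roughly [-1, 1], the range the ViT-AugReg pretraining used, rather than the ImageNet mean/std used by most other MMSegmentation configs. A one-line sketch of what the `Normalize` transform computes per pixel:

```python
# Per-pixel effect of Normalize with mean=std=127.5 on 8-bit input.
def normalize(pixel, mean=127.5, std=127.5):
    return (pixel - mean) / std

print(normalize(0), normalize(127.5), normalize(255))  # -1.0 0.0 1.0
```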
60 changes: 60 additions & 0 deletions configs/segmenter/segmenter_vit-l_mask_8x1_512x512_160k_ade20k.py
@@ -0,0 +1,60 @@
_base_ = [
'../_base_/models/segmenter_vit-b16_mask.py',
'../_base_/datasets/ade20k.py', '../_base_/default_runtime.py',
'../_base_/schedules/schedule_160k.py'
]

model = dict(
pretrained='pretrain/vit_large_p16_384.pth',
backbone=dict(
type='VisionTransformer',
img_size=(640, 640),
embed_dims=1024,
num_layers=24,
num_heads=16),
decode_head=dict(
type='SegmenterMaskTransformerHead',
in_channels=1024,
channels=1024,
num_heads=16,
embed_dims=1024),
test_cfg=dict(mode='slide', crop_size=(640, 640), stride=(608, 608)))

optimizer = dict(lr=0.001, weight_decay=0.0)

img_norm_cfg = dict(
mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_rgb=True)
crop_size = (640, 640)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', reduce_zero_label=True),
dict(type='Resize', img_scale=(2048, 640), ratio_range=(0.5, 2.0)),
dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
dict(type='RandomFlip', prob=0.5),
dict(type='PhotoMetricDistortion'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(2048, 640),
# img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]
data = dict(
# num_gpus: 8 -> batch_size: 8
samples_per_gpu=1,
train=dict(pipeline=train_pipeline),
val=dict(pipeline=test_pipeline),
test=dict(pipeline=test_pipeline))
@@ -0,0 +1,14 @@
_base_ = './segmenter_vit-s_mask_8x1_512x512_160k_ade20k.py'

model = dict(
decode_head=dict(
_delete_=True,
type='FCNHead',
in_channels=384,
channels=384,
num_convs=0,
dropout_ratio=0.0,
concat_input=False,
num_classes=150,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)))
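With `num_convs=0` and `concat_input=False`, the `FCNHead` above reduces to a single 1x1 convolution (the head's `conv_seg` classifier), which is exactly Segmenter's point-wise linear decoder: 384 ViT-S channels mapped to 150 ADE20K classes, then upsampled. A minimal PyTorch sketch of that reduction (`LinearDecoderSketch` is an illustrative name, not an mmseg class):

```python
import torch
from torch import nn

class LinearDecoderSketch(nn.Module):
    """Point-wise linear decoder: what FCNHead becomes with num_convs=0."""

    def __init__(self, in_channels=384, num_classes=150):
        super().__init__()
        # Equivalent to FCNHead's conv_seg classifier layer.
        self.conv_seg = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, feats, out_size):
        logits = self.conv_seg(feats)
        # Upsample patch-level logits back to the input resolution.
        return nn.functional.interpolate(
            logits, size=out_size, mode='bilinear', align_corners=False)

feats = torch.randn(1, 384, 32, 32)  # ViT-S tokens for a 512x512 crop (512/16)
out = LinearDecoderSketch()(feats, (512, 512))
print(out.shape)  # torch.Size([1, 150, 512, 512])
```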