This repo is the official implementation of "DeMT" as well as its follow-ups. It currently includes code and models for multi-task learning of dense prediction tasks (semantic segmentation, depth estimation, surface normal estimation, boundary detection, human part segmentation, and saliency detection).
News

- 02/10/2023: We will release the code of DeMT at the end of February.
- 02/10/2023: Merged code.
- 02/10/2023: Released a series of models. Please look into the data scaling paper for more details.
- 02/07/2023: The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023) will be held in Washington, DC, USA, from February 7 to 14, 2023.
- 02/01/2023: DeMT was accepted by AAAI 2023.
DeMT (the name stands for Deformable Mixer Transformer for Multi-Task Learning of Dense Prediction) is initially described in the arXiv paper. It is based on a simple and effective encoder-decoder architecture, i.e., a deformable mixer encoder and a task-aware transformer decoder. First, the deformable mixer encoder contains two types of operators: a channel-aware mixing operator, leveraged to allow communication among different channels (i.e., efficient channel location mixing), and a spatial-aware deformable operator with deformable convolution, applied to efficiently sample more informative spatial locations (i.e., deformed features). Second, the task-aware transformer decoder consists of a task interaction block and a task query block. The former is applied to capture task-interaction features via self-attention. The latter leverages the deformed features and the task-interacted features to generate the corresponding task-specific features through a query-based transformer for the corresponding task predictions.
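To make the two building blocks above concrete, here is a minimal, self-contained PyTorch sketch of a deformable mixer block and a task-aware decoder. It is an illustration of the idea only, not the repository's implementation: the module names (`DeformableMixer`, `TaskAwareDecoder`), the use of `torchvision.ops.DeformConv2d`, the application of self-attention directly to spatial tokens, and all hyperparameters are our own simplifications; see ./src for the actual code.

```python
# Illustrative sketch only (not the repo's API): a deformable mixer block and a
# task-aware transformer decoder, reduced to their core operations.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableMixer(nn.Module):
    """Deformable mixer encoder block: channel-aware mixing + spatial-aware deformable conv."""

    def __init__(self, dim, kernel_size=3):
        super().__init__()
        # Channel-aware mixing: a 1x1 convolution mixes information across channels.
        self.channel_mix = nn.Conv2d(dim, dim, kernel_size=1)
        # Spatial-aware deformable operator: offsets are predicted from the features,
        # then a deformable convolution samples the deformed (more informative) locations.
        self.offset = nn.Conv2d(dim, 2 * kernel_size * kernel_size, kernel_size,
                                padding=kernel_size // 2)
        self.deform = DeformConv2d(dim, dim, kernel_size, padding=kernel_size // 2)
        self.norm = nn.BatchNorm2d(dim)
        self.act = nn.GELU()

    def forward(self, x):
        x = self.act(self.channel_mix(x))
        offsets = self.offset(x)
        return self.act(self.norm(self.deform(x, offsets)))  # deformed features


class TaskAwareDecoder(nn.Module):
    """Task-aware transformer decoder: task interaction (self-attention) + task query (cross-attention)."""

    def __init__(self, dim, num_tasks, num_heads=4):
        super().__init__()
        self.task_queries = nn.Parameter(torch.randn(num_tasks, dim))
        self.interact = nn.MultiheadAttention(dim, num_heads)    # task interaction block
        self.query_attn = nn.MultiheadAttention(dim, num_heads)  # task query block

    def forward(self, deformed_feats):
        b, c, h, w = deformed_feats.shape
        tokens = deformed_feats.flatten(2).permute(2, 0, 1)          # (H*W, B, C), sequence-first
        interacted, _ = self.interact(tokens, tokens, tokens)        # self-attention over tokens
        queries = self.task_queries.unsqueeze(1).expand(-1, b, -1)   # (num_tasks, B, C)
        task_feats, _ = self.query_attn(queries, interacted, interacted)  # query-based cross-attention
        return task_feats.permute(1, 0, 2)                           # (B, num_tasks, C): one feature per task


# Toy usage: a 64-channel feature map decoded into two task-specific features.
feats = torch.randn(2, 64, 32, 32)
encoder = DeformableMixer(64)
decoder = TaskAwareDecoder(64, num_tasks=2)
print(decoder(encoder(feats)).shape)  # torch.Size([2, 2, 64])
```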
DeMT achieves strong performance on PASCAL-Context (75.33 mIoU for semantic segmentation and 63.11 mIoU for human part segmentation on test) and on NYUD-v2 semantic segmentation (54.34 mIoU on test), surpassing previous models by a large margin.
DeMT on NYUD-v2 dataset
model | backbone | #params | FLOPs | SemSeg (mIoU) ↑ | Depth (RMSE) ↓ | Normal (mErr) ↓ | Boundary (odsF) ↑ | model checkpoint | log |
---|---|---|---|---|---|---|---|---|---|
DeMT | HRNet-18 | 4.76M | 22.07G | 39.18 | 0.5922 | 20.21 | 76.4 | Google Drive | log |
DeMT | Swin-T | 32.07M | 100.70G | 46.36 | 0.5871 | 20.60 | 76.9 | Google Drive | log |
DeMT(xd=2) | Swin-T | 36.6M | - | 47.45 | 0.5563 | 19.90 | 77.0 | Google Drive | log |
DeMT | Swin-S | 53.03M | 121.05G | 51.50 | 0.5474 | 20.02 | 78.1 | Google Drive | log |
DeMT | Swin-B | 90.9M | 153.65G | 54.34 | 0.5209 | 19.21 | 78.5 | Google Drive | log |
DeMT | Swin-L | 201.64M | - | 56.94 | 0.5007 | 19.14 | 78.8 | Google Drive | log |
DeMT on PASCAL-Context dataset
model | backbone | SemSeg (mIoU) ↑ | PartSeg (mIoU) ↑ | Sal (maxF) ↑ | Normal (mErr) ↓ | Boundary (odsF) ↑ |
---|---|---|---|---|---|---|
DeMT | HRNet-18 | 59.23 | 57.93 | 83.93 | 14.02 | 69.80 |
DeMT | Swin-T | 69.71 | 57.18 | 82.63 | 14.56 | 71.20 |
DeMT | Swin-S | 72.01 | 58.96 | 83.20 | 14.57 | 72.10 |
DeMT | Swin-B | 75.33 | 63.11 | 83.42 | 14.54 | 73.20 |
Citation

@inproceedings{xyy2023DeMT,
  title={DeMT: Deformable Mixer Transformer for Multi-Task Learning of Dense Prediction},
  author={Xu, Yangyang and Yang, Yibo and Zhang, Lefei},
  booktitle={Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI)},
  year={2023}
}
Install
conda install pytorch==1.7.0 torchvision==0.8.1 cudatoolkit=10.1 -c pytorch
conda install pytorch-lightning==1.1.8 -c conda-forge
conda install opencv==4.4.0 -c conda-forge
conda install scikit-image==0.17.2
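After installing, a quick sanity check of the environment can help catch version mismatches. This is a hypothetical check script, not part of the repo:

```python
# Hypothetical sanity check of the pinned environment (not part of the repo).
import cv2
import pytorch_lightning as pl
import skimage
import torch
import torchvision

print("torch:", torch.__version__, "| torchvision:", torchvision.__version__)
print("pytorch-lightning:", pl.__version__, "| opencv:", cv2.__version__, "| scikit-image:", skimage.__version__)
print("CUDA available:", torch.cuda.is_available())
```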
Data Preparation
wget https://data.vision.ee.ethz.ch/brdavid/atrc/NYUDv2.tar.gz
wget https://data.vision.ee.ethz.ch/brdavid/atrc/PASCALContext.tar.gz
tar xfvz ./NYUDv2.tar.gz
tar xfvz ./PASCALContext.tar.gz
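The training command below expects --datamodule.data_dir to point at the directory holding the extracted data. A small hypothetical check, assuming the archives unpack into NYUDv2/ and PASCALContext/ folders (matching the tarball names):

```python
# Hypothetical check that the extracted datasets sit under the directory you will
# pass as $DATA_DIR / --datamodule.data_dir. The folder names NYUDv2 and
# PASCALContext are assumed from the tarball names.
from pathlib import Path

data_dir = Path(".")
for name in ("NYUDv2", "PASCALContext"):
    status = "found" if (data_dir / name).is_dir() else "missing"
    print(f"{name}: {status} under {data_dir.resolve()}")
```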
Train
To train the DeMT model, with $DATA_DIR pointing to the directory containing the extracted datasets:
python ./src/main.py --cfg ./config/t-nyud/swin/siwn_t_DeMT.yaml --datamodule.data_dir $DATA_DIR --trainer.gpus 8
Evaluation
- When training is finished, the boundary predictions are saved in the following directory: ./logger/NYUD_xxx/version_x/edge_preds/.
- The evaluation of boundary detection uses the MATLAB-based SEISM repository to obtain the optimal-dataset-scale F-measure (odsF) scores.
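As a convenience before running SEISM, a hypothetical snippet to gather the saved boundary predictions (the exact run directory and file format depend on your experiment; PNG is assumed here):

```python
# Hypothetical helper: count the saved boundary predictions before handing them to SEISM.
# Replace "NYUD_xxx" / "version_x" with your actual run directory.
from pathlib import Path

pred_dir = Path("./logger/NYUD_xxx/version_x/edge_preds")
preds = sorted(pred_dir.glob("*.png"))  # assumes predictions are written as PNG images
print(f"{len(preds)} boundary prediction files ready for SEISM (odsF) evaluation")
```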