This is the official implementation of NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition, by Hao Liu, Xinghua Jiang, Xin Li, Zhimin Bao, Deqiang Jiang and Bo Ren.
03/02/2022: NomMer was accepted to CVPR 2022.
We propose a novel ViT architecture, termed NomMer, which can dynamically Nominate the synergistic global-local context in vision transforMer.
Image Classification on ImageNet-1K
Model | Pretrain | Resolution | Top-1 acc. (%) | #Params | FLOPs |
---|---|---|---|---|---|
NomMer-T | IN-1K | 224 | 82.6 | 22M | 5.4G |
NomMer-S | IN-1K | 224 | 83.7 | 42M | 10.1G |
NomMer-B | IN-1K | 224 | 84.5 | 73M | 17.6G |
To evaluate a pre-trained NomMer on the ImageNet validation set, run:
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345 main.py --eval \
--cfg <config-file> --batch-size <batch-size-per-gpu> --resume <checkpoint> --data-path <imagenet-path>
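If you want to script evaluation over several checkpoints, the command above can be assembled programmatically. The following is a minimal sketch, not part of this repository: the helper name `build_eval_command` and the example paths are hypothetical, and only the flags shown in the command above are used.

```python
import shlex


def build_eval_command(num_gpus, cfg, batch_size, checkpoint, data_path,
                       master_port=12345):
    """Assemble the evaluation command as an argument list.

    Hypothetical helper: it only mirrors the flags shown in this README
    and does not inspect main.py or the config file.
    """
    return [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node", str(num_gpus),
        "--master_port", str(master_port),
        "main.py", "--eval",
        "--cfg", cfg,
        "--batch-size", str(batch_size),  # per-GPU batch size
        "--resume", checkpoint,
        "--data-path", data_path,
    ]


if __name__ == "__main__":
    # All values below are placeholders; substitute your own paths.
    cmd = build_eval_command(
        num_gpus=1,
        cfg="path/to/config.yaml",
        batch_size=64,
        checkpoint="path/to/checkpoint.pth",
        data_path="path/to/imagenet",
    )
    # Print a shell-safe string; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Returning a list rather than a string keeps paths with spaces intact if the command is later passed to `subprocess.run`.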
To train NomMer on ImageNet from scratch, run:
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345 main.py \
--cfg <config-file> --data-path <imagenet-path> --batch-size <batch-size-per-gpu> --output <output-directory>
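Because `--batch-size` is the per-GPU batch size, the number of samples processed per optimization step scales with the number of launched processes. The sketch below illustrates the arithmetic, assuming one process per GPU and no gradient accumulation; the function name and the example values (8 GPUs, 128 per GPU) are illustrative and not taken from this repository's configs.

```python
def global_batch_size(num_gpus: int, batch_size_per_gpu: int) -> int:
    """Return the effective batch size per optimization step.

    Assumes one process per GPU and no gradient accumulation.
    """
    if num_gpus < 1 or batch_size_per_gpu < 1:
        raise ValueError("num_gpus and batch_size_per_gpu must be positive")
    return num_gpus * batch_size_per_gpu


if __name__ == "__main__":
    # Illustrative values only: 8 GPUs with 128 images each.
    print(global_batch_size(8, 128))  # prints 1024
```

Keep this in mind when changing the GPU count: the same `--batch-size` value yields a different global batch size, which may call for a different learning rate.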
If you find NomMer useful in your research, please consider citing:
@InProceedings{Liu_2022_CVPR,
author = {Liu, Hao and Jiang, Xinghua and Li, Xin and Bao, Zhimin and Jiang, Deqiang and Ren, Bo},
title = {NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {12073-12082}
}
Our codebase is built on Swin-Transformer. We thank the authors for their nicely organized code!