pytorch · jdsgomes · Aug 10, 2022 · Jul 7, 2022
diff --git a/docs/source/models/swin_transformer.rst b/docs/source/models/swin_transformer.rst
@@ -3,16 +3,18 @@ SwinTransformer
 
 .. currentmodule:: torchvision.models
 
-The SwinTransformer model is based on the `Swin Transformer: Hierarchical Vision 
+The SwinTransformer models are based on the `Swin Transformer: Hierarchical Vision
 Transformer using Shifted Windows <https://arxiv.org/abs/2103.14030>`__
 paper.
+SwinTransformer V2 models are based on the `Swin Transformer V2: Scaling Up Capacity
+and Resolution <https://openaccess.thecvf.com/content/CVPR2022/papers/Liu_Swin_Transformer_V2_Scaling_Up_Capacity_and_Resolution_CVPR_2022_paper.pdf>`__
+paper.
 
 
 Model builders
 --------------
 
-The following model builders can be used to instantiate an SwinTransformer model. 
-`swin_t` can be instantiated with pre-trained weights and all others without. 
+The following model builders can be used to instantiate a SwinTransformer model (original and V2), with or without pre-trained weights.
 All the model builders internally rely on the ``torchvision.models.swin_transformer.SwinTransformer`` 
 base class. Please refer to the `source code
 <https://github.com/pytorch/vision/blob/main/torchvision/models/swin_transformer.py>`_ for
@@ -25,3 +27,6 @@ more details about this class.
     swin_t
     swin_s
     swin_b
+    swin_v2_t
+    swin_v2_s
+    swin_v2_b
diff --git a/references/classification/README.md b/references/classification/README.md
@@ -236,6 +236,17 @@ Note that `--val-resize-size` was optimized in a post-training step, see their `
 
 
 
+### SwinTransformer V2
+```
+torchrun --nproc_per_node=8 train.py \
+--model $MODEL --epochs 300 --batch-size 128 --opt adamw --lr 0.001 --weight-decay 0.05 --norm-weight-decay 0.0 --bias-weight-decay 0.0 --transformer-embedding-decay 0.0 --lr-scheduler cosineannealinglr --lr-min 0.00001 --lr-warmup-method linear --lr-warmup-epochs 20 --lr-warmup-decay 0.01 --amp --label-smoothing 0.1 --mixup-alpha 0.8 --clip-grad-norm 5.0 --cutmix-alpha 1.0 --random-erase 0.25 --interpolation bicubic --auto-augment ta_wide --model-ema --ra-sampler --ra-reps 4 --val-resize-size 256 --val-crop-size 256 --train-crop-size 256
+```
+Here `$MODEL` is one of `swin_v2_t`, `swin_v2_s` or `swin_v2_b`.
+Note that `--val-resize-size` was optimized in a post-training step; see each model's `Weights` entry for the exact value.
+
+
+
+
 ### ShuffleNet V2
 ```
 torchrun --nproc_per_node=8 train.py \

diff --git a/test/expect/ModelTester.test_swin_v2_b_expect.pkl b/test/expect/ModelTester.test_swin_v2_b_expect.pkl
diff --git a/test/expect/ModelTester.test_swin_v2_s_expect.pkl b/test/expect/ModelTester.test_swin_v2_s_expect.pkl
diff --git a/test/expect/ModelTester.test_swin_v2_t_expect.pkl b/test/expect/ModelTester.test_swin_v2_t_expect.pkl
diff --git a/test/test_models.py b/test/test_models.py
@@ -332,6 +332,9 @@ def _check_input_backprop(model, inputs):
     "swin_t",
     "swin_s",
     "swin_b",
+    "swin_v2_t",
+    "swin_v2_s",
+    "swin_v2_b",
 ]
 for m in slow_models:
     _model_params[m] = {"input_shape": (1, 3, 64, 64)}