We release the code and trained models of our paper *Gate-Shift-Fuse for Video Action Recognition*. If you find our work useful for your research, please cite:
@article{gsf,
  title={{Gate-Shift-Fuse for Video Action Recognition}},
  author={Sudhakaran, Swathikiran and Escalera, Sergio and Lanz, Oswald},
  journal={{IEEE Transactions on Pattern Analysis and Machine Intelligence}},
  year={2023},
  doi={10.1109/TPAMI.2023.3268134}
}
- Python 3.6+ (PyTorch 1.7 requires at least Python 3.6)
- PyTorch 1.7+
Please follow the instructions in the GSM repo for data preparation.
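For orientation, below is a minimal sketch of reading the resulting annotation lists. The `<frame_folder> <num_frames> <class_index>` line format is an assumption based on the TSN/TRN/GSM lineage this codebase builds on; the authoritative layout is whatever the GSM repo's data preparation scripts actually produce.

```python
from collections import namedtuple

# ASSUMPTION: TSN-style list files with one video per line, formatted as
# "<frame_folder> <num_frames> <class_index>". Verify against the files
# generated by the GSM repo's data preparation scripts.
VideoRecord = namedtuple("VideoRecord", ["path", "num_frames", "label"])

def load_annotations(list_file):
    records = []
    with open(list_file) as f:
        for line in f:
            # rsplit keeps frame folder paths with spaces intact
            path, num_frames, label = line.strip().rsplit(" ", 2)
            records.append(VideoRecord(path, int(num_frames), int(label)))
    return records
```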
To train GSF on Something-Something-v1 with a BNInception backbone (8 segments, mixed precision via --with_amp), run:
python main.py --dataset something-v1 --split val --arch bninception --num_segments 8 --consensus_type avg \
--batch-size 32 --iter_size 1 --dropout 0.5 --lr 0.01 --warmup 10 --epochs 60 \
--eval-freq 5 --gd 20 -j 16 \
--with_amp --gsf --gsf_ch_ratio 100
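The --gsf flag swaps GSF modules into the backbone; our reading of --gsf_ch_ratio 100 is that it sets the percentage of channels routed through them, but treat that as an assumption. As a rough illustration of the shift primitive GSF builds on, here is a minimal temporal channel-shift sketch in the TSM/GSM style; the learned spatial gating and channel fusion that define GSF proper are omitted.

```python
import torch

def temporal_shift(x, num_segments, shift_div=8):
    # x: (N*T, C, H, W) features of N videos with T = num_segments frames.
    # Reshape to (N, T, C, H, W) so channels can be mixed along time.
    nt, c, h, w = x.size()
    n = nt // num_segments
    x = x.view(n, num_segments, c, h, w)
    fold = c // shift_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # first group: pull features from t+1
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # second group: pull features from t-1
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # remaining channels: unchanged
    return out.view(nt, c, h, w)
```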
To evaluate a trained checkpoint with a single clip and a single crop, run:
python test_models.py something-v1 CHECKPOINT_FILE \
--arch bninception --crop_fusion_type avg --test_segments 8 \
--input_size 0 --test_crops 1 --num_clips 1 \
--with_amp -j 8 --save_scores --gsf --gsf_ch_ratio 100
To evaluate using 2 clips and 3 crops, change --test_crops 1 to --test_crops 3 and --num_clips 1 to --num_clips 2.
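The avg in --consensus_type and --crop_fusion_type denotes average consensus: per-view predictions are averaged into one video-level score. Below is a minimal sketch of that fusion step, assuming softmax scores are averaged; whether test_models.py averages logits or softmax scores is not verified here.

```python
import torch
import torch.nn.functional as F

def average_consensus(view_logits):
    # view_logits: (num_views, num_classes), one row per
    # segment/crop/clip view of the same video.
    scores = F.softmax(view_logits, dim=1)  # per-view class probabilities
    return scores.mean(dim=0)               # averaged video-level score

# Example: 3 crops x 2 clips = 6 views over the 174 SS-v1 classes.
video_score = average_consensus(torch.randn(6, 174))
print(video_score.argmax().item())
```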
Backbone | No. of frames | SS-v1 Top-1 Accuracy (%) |
---|---|---|
BNInception | 16 | 50.63 |
InceptionV3 | 16 | 53.13 |
ResNet50 | 16 | 51.54 |
All pretrained weights can be downloaded from here.
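To inspect a downloaded checkpoint outside of test_models.py, a minimal sketch; the assumption that the file holds a dict with a state_dict entry follows TSN-style training loops and should be checked against the actual output.

```python
import torch

# CHECKPOINT_FILE is the same path passed to test_models.py above.
ckpt = torch.load("CHECKPOINT_FILE", map_location="cpu")

# ASSUMPTION: a dict with entries such as "state_dict"; listing the
# keys shows what the checkpoint actually contains.
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
else:
    print(type(ckpt))
```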
This implementation is built upon the TRN-pytorch codebase, which is in turn based on TSN-pytorch. We thank Yuanjun Xiong and Bolei Zhou for releasing the TSN-pytorch and TRN-pytorch repos.