Official PyTorch implementation of the ECCV 2022 paper: Efficient Video Transformers with Spatial-Temporal Token Selection.

wdrink/STTS


Official PyTorch implementation of STTS, from the following paper:

Efficient Video Transformers with Spatial-Temporal Token Selection, ECCV 2022.

Junke Wang*, Xitong Yang*, Hengduo Li, Li Liu, Zuxuan Wu, Yu-Gang Jiang.

Fudan University, University of Maryland, BirenTech Research


We present STTS, a token selection framework that dynamically selects a few informative tokens in both temporal and spatial dimensions conditioned on input video samples.
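To make the idea of ratio-based selection concrete, here is a minimal sketch in plain Python: given one importance score per token, it keeps the top fraction of tokens and preserves their original order. This is an illustration under stated assumptions, not the repository's implementation. The function name `select_tokens`, the hand-written scores, and the rounding rule (ceiling) are all hypothetical, and the actual models operate on tensors with scores that depend on the input video.

```python
import math


def select_tokens(tokens, scores, keep_ratio):
    """Keep the highest-scoring fraction of tokens, in their original order.

    Hypothetical helper for illustration only; scores are supplied by the
    caller rather than predicted by a model.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    # Assumption: round the number of kept tokens up, and keep at least one.
    k = max(1, math.ceil(len(tokens) * keep_ratio))

    # Indices of the k largest scores (ties resolved by earlier position).
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]

    # Restore temporal/spatial order so downstream layers see a valid sequence.
    kept = sorted(top)
    return [tokens[i] for i in kept], kept


# Eight frame-level tokens with made-up scores; keep half of them.
frames = [f"f{i}" for i in range(8)]
frame_scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6]
kept_frames, kept_idx = select_tokens(frames, frame_scores, keep_ratio=0.5)
print(kept_frames)  # ['f1', 'f3', 'f5', 'f7']
```

The same helper could be applied a second time to the patches inside the kept frames, which mirrors selecting first along the temporal and then along the spatial dimension.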

Model Zoo

MViT with STTS on Kinetics-400

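| name | acc@1 (%) | FLOPs (G) | model |
| --- | --- | --- | --- |
| MViT-T0_0.9-S4_0.9 | 78.1 | 56.4 | model |
| MViT-T0_0.8-S4_0.9 | 77.9 | 47.2 | model |
| MViT-T0_0.6-S4_0.9 | 77.5 | 38.1 | model |
| MViT-T0_0.5-S4_0.7 | 76.6 | 23.3 | model |
| MViT-T0_0.4-S4_0.6 | 75.6 | 12.1 | model |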

VideoSwin with STTS on Kinetics-400

| name | acc@1 (%) | FLOPs (G) | model |
| --- | --- | --- | --- |
| VideoSwin-T0_0.9 | 81.9 | 252.5 | model |
| VideoSwin-T0_0.8 | 81.6 | 223.4 | model |
| VideoSwin-T0_0.6 | 81.4 | 181.4 | model |
| VideoSwin-T0_0.5 | 81.1 | 121.6 | model |
| VideoSwin-T0_0.4 | 80.7 | 91.4 | model |
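The accuracy/compute trade-off in the two tables can be read off with a little arithmetic. The sketch below copies the reported numbers and compares each row against the first row of its table, with model names written in the same style as the config files. Note that the first row is itself a token-selection model, so these figures are relative savings within each table, not savings over an unpruned backbone. The helper name `tradeoff` is hypothetical.

```python
# (name, acc@1, FLOPs) rows copied from the Model Zoo tables.
MVIT = [
    ("MViT-T0_0.9-S4_0.9", 78.1, 56.4),
    ("MViT-T0_0.8-S4_0.9", 77.9, 47.2),
    ("MViT-T0_0.6-S4_0.9", 77.5, 38.1),
    ("MViT-T0_0.5-S4_0.7", 76.6, 23.3),
    ("MViT-T0_0.4-S4_0.6", 75.6, 12.1),
]
VIDEOSWIN = [
    ("VideoSwin-T0_0.9", 81.9, 252.5),
    ("VideoSwin-T0_0.8", 81.6, 223.4),
    ("VideoSwin-T0_0.6", 81.4, 181.4),
    ("VideoSwin-T0_0.5", 81.1, 121.6),
    ("VideoSwin-T0_0.4", 80.7, 91.4),
]


def tradeoff(rows):
    """Return (name, FLOPs saved in %, acc@1 drop in points) vs. the first row."""
    _, base_acc, base_flops = rows[0]
    result = []
    for name, acc, flops in rows[1:]:
        saved = round((1 - flops / base_flops) * 100, 1)
        drop = round(base_acc - acc, 1)
        result.append((name, saved, drop))
    return result


for table in (MVIT, VIDEOSWIN):
    for name, saved, drop in tradeoff(table):
        print(f"{name}: {saved}% fewer FLOPs, {drop} points lower acc@1")
```

For instance, the most aggressive MViT setting uses about 78.5% fewer FLOPs than the first row at a cost of 2.5 points of top-1 accuracy.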

Installation

Please check MViT and VideoSwin for installation instructions and data preparation.

Training and Evaluation

MViT

To train or evaluate with MViT as the backbone, run:

cd MViT

python tools/run_net.py --cfg path_to_your_config

For example, to evaluate MViT-T0_0.6-S4_0.9, run:

python tools/run_net.py --cfg configs/Kinetics/t0_0.6_s4_0.9.yaml
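The config file name in the example encodes the selection settings of the model. As a hedged sketch, the helper below splits such a name into its parts. The interpretation is an assumption inferred from the names in this README, not something the repository documents here: the letter is taken to mean temporal (`t`) or spatial (`s`) selection, the integer the position in the backbone where selection is applied, and the decimal the fraction of tokens kept. `parse_selection_tag` is a hypothetical name.

```python
import re
from pathlib import PurePosixPath

# One selection step looks like "t0_0.6" or "s4_0.9" (assumed naming scheme).
_STEP = re.compile(r"([ts])(\d+)_(\d+(?:\.\d+)?)")


def parse_selection_tag(config_path):
    """Split a config file name into (kind, position, keep_ratio) tuples."""
    stem = PurePosixPath(config_path).stem  # drops the directory and extension
    steps = [(kind, int(pos), float(ratio)) for kind, pos, ratio in _STEP.findall(stem)]
    if not steps:
        raise ValueError(f"no selection settings found in {config_path!r}")
    return steps


print(parse_selection_tag("configs/Kinetics/t0_0.6_s4_0.9.yaml"))
# [('t', 0, 0.6), ('s', 4, 0.9)]
```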

VideoSwin

For training, run:

cd VideoSwin

bash tools/dist_train.sh path_to_your_config $NUM_GPUS --checkpoint path_to_your_checkpoint --validate --test-last

For evaluation, run:

bash tools/dist_test.sh path_to_your_config path_to_your_checkpoint $NUM_GPUS --eval top_k_accuracy

For example, to evaluate VideoSwin-T0_0.9 on a single node with 8 GPUs, run:

cd VideoSwin

bash tools/dist_test.sh configs/Kinetics/t0_0.875.py ./checkpoints/t0_0.875.pth 8 --eval top_k_accuracy
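When evaluating several checkpoints, it can help to build the command programmatically. The sketch below assembles the same `dist_test.sh` invocation as the example above, using only the arguments shown in this README. The helper name `videoswin_eval_command` is hypothetical, and the assumption that every config and checkpoint follows the `configs/Kinetics/<tag>.py` and `./checkpoints/<tag>.pth` pattern is inferred from that single example, so adjust the paths to your own layout.

```python
import shlex


def videoswin_eval_command(tag, num_gpus,
                           config_dir="configs/Kinetics",
                           ckpt_dir="./checkpoints"):
    """Build the evaluation command as an argument list (does not run it)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "bash", "tools/dist_test.sh",
        f"{config_dir}/{tag}.py",   # assumed config location
        f"{ckpt_dir}/{tag}.pth",    # assumed checkpoint location
        str(num_gpus),
        "--eval", "top_k_accuracy",
    ]


cmd = videoswin_eval_command("t0_0.875", 8)
print(shlex.join(cmd))
```

To actually launch the evaluation, the list can be passed to `subprocess.run(cmd, cwd="VideoSwin", check=True)` from the repository root.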

License

This project is released under the MIT license. Please see the LICENSE file for more information.

Citation

If you find this repository helpful, please consider citing:

@inproceedings{wang2021efficient,
  title={Efficient video transformers with spatial-temporal token selection},
  author={Wang, Junke and Yang, Xitong and Li, Hengduo and Liu, Li and Wu, Zuxuan and Jiang, Yu-Gang},
  booktitle={ECCV},
  year={2022}
}
