# Video Transformer

| No. | Model Name | Title | Links | Pub. | Organization | Release Time |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | TimeSformer | Is Space-Time Attention All You Need for Video Understanding? | paper, code | arXiv | Facebook AI | 24 Feb 2021 |
| 2 | Video Transformer | Video Transformer Network | paper | arXiv | Theator | 1 Feb 2021 |
| 3 | ViViT | ViViT: A Video Vision Transformer | paper | arXiv | Google AI | 29 Mar 2021 |
| 4 | VideoGPT | VideoGPT: Video Generation using VQ-VAE and Transformers | paper, code | arXiv | UC Berkeley | 20 Apr 2021 |
| 5 | VIMPAC | VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning | paper, code | arXiv | UNC | 21 Jun 2021 |
| 6 | - | Self-supervised Video Representation Learning by Context and Motion Decoupling | paper | CVPR 2021 | Alibaba | 2 Apr 2021 |
| 7 | VideoLightFormer | VideoLightFormer: Lightweight Action Recognition using Transformers | paper | arXiv | University of Sheffield | 1 Jul 2021 |
| 8 | Video Swin Transformer | Video Swin Transformer | paper, code | arXiv | MSRA | 24 Jun 2021 |
| 9 | ST Swin | Long-Short Temporal Contrastive Learning of Video Transformers | paper | arXiv | Facebook AI | 17 Jun 2021 |
| 10 | X-ViT | Space-time Mixing Attention for Video Transformer | paper | arXiv | Samsung AI Cambridge | 11 Jun 2021 |
| 11 | OCVT | Generative Video Transformer: Can Objects be the Words? | paper | ICML 2021 | Rutgers University | 20 Jul 2021 |
| 12 | - | An Image is Worth 16x16 Words, What is a Video Worth? | paper, code | arXiv | Alibaba | 27 May 2021 |
| 13 | SCT | Shifted Chunk Transformer for Spatio-Temporal Representational Learning | paper | arXiv | Kuaishou Technology | 26 Aug 2021 |
| 14 | - | Evaluating Transformers for Lightweight Action Recognition | paper | arXiv | University of Sheffield | 18 Nov 2021 |
| 15 | DualFormer | DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition | paper | arXiv | Sea AI Lab | 9 Dec 2021 |
| 16 | BEVT | BEVT: BERT Pretraining of Video Transformers | paper | arXiv | Shanghai Key Lab of Intelligent Information Processing | 2 Dec 2021 |
| 17 | - | Efficient Video Transformers with Spatial-Temporal Token Selection | paper | arXiv | Shanghai Key Lab of Intelligent Information Processing | 23 Nov 2021 |
| 18 | - | Lite Vision Transformer with Enhanced Self-Attention | paper, code | arXiv | Johns Hopkins University | 20 Dec 2021 |
| 19 | MViT | Multiscale Vision Transformers | paper, code | ICCV 2021 | Facebook | 22 Apr 2021 |
| 20 | Uniformer | Uniformer: Unified Transformer For Efficient Spatiotemporal Representation Learning | paper, code | arXiv | Chinese Academy of Sciences | 12 Jan 2022 |
| 21 | MaskFeat | Masked Feature Prediction for Self-Supervised Visual Pre-Training | paper | arXiv | Facebook AI | 16 Dec 2021 |
| 22 | MTV | Multiview Transformers for Video Recognition | paper | arXiv | Google | 20 Jan 2022 |
| 23 | MeMViT | MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition | paper | arXiv | Facebook AI Research | 20 Jan 2022 |