chaoshengt/Transformer-in-Vision

Transformer-in-Vision

A collection of recent Transformer-based computer vision (CV) works. Comments and contributions are welcome!

This list is continuously updated.

Resources

Surveys:

  • (arXiv 2020.09) Efficient Transformers: A Survey, PDF

  • (arXiv 2021.01) Transformers in Vision: A Survey, PDF

Recent Papers

  • (ICLR'21) UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers, [Paper], [Code]

  • (ICLR'21) Deformable DETR: Deformable Transformers for End-to-End Object Detection, [Paper], [Code]

  • (ICLR'21) LambdaNetworks: Modeling Long-Range Interactions Without Attention, [Paper], [Code]

  • (ICLR'21) Support-set Bottlenecks for Video-text Representation Learning, [Paper]

  • (ICLR'21) Colorization Transformer, [Paper], [Code]

  • (ECCV'20) Multi-modal Transformer for Video Retrieval, [Paper]

  • (ECCV'20) Connecting Vision and Language with Localized Narratives, [Paper]

  • (ECCV'20) DETR: End-to-End Object Detection with Transformers, [Paper], [Code]

  • (CVPR'20) Multi-Modality Cross Attention Network for Image and Sentence Matching, [Paper], [Page]

  • (CVPR'20) Learning Texture Transformer Network for Image Super-Resolution, [Paper], [Code]

  • (CVPR'20) Speech2Action: Cross-modal Supervision for Action Recognition, [Paper]

  • (ICPR'20) Transformer Encoder Reasoning Network, [Paper], [Code]

  • (EMNLP'19) Effective Use of Transformer Networks for Entity Tracking, [Paper], [Code]

  • (arXiv 2021.02) TransGAN: Two Transformers Can Make One Strong GAN, [Paper], [Code]

  • (arXiv 2021.02) End-to-End Audio-Visual Speech Recognition with Conformers, [Paper]

  • (arXiv 2021.02) Is Space-Time Attention All You Need for Video Understanding? [Paper]

  • (arXiv 2021.02) Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling, [Paper], [Code]

  • (arXiv 2021.02) Video Transformer Network, [Paper]

  • (arXiv 2021.02) Training Vision Transformers for Image Retrieval, [Paper]

  • (arXiv 2021.02) Relaxed Transformer Decoders for Direct Action Proposal Generation, [Paper], [Code]

  • (arXiv 2021.02) TransReID: Transformer-based Object Re-Identification, [Paper]

  • (arXiv 2021.02) Improving Visual Reasoning by Exploiting The Knowledge in Texts, [Paper]

  • (arXiv 2021.01) Fast Convergence of DETR with Spatially Modulated Co-Attention, [Paper]

  • (arXiv 2021.01) Dual-Level Collaborative Transformer for Image Captioning, [Paper]

  • (arXiv 2021.01) SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation, [Paper]

  • (arXiv 2021.01) CPTR: Full Transformer Network for Image Captioning, [Paper]

  • (arXiv 2021.01) Trans2Seg: Transparent Object Segmentation with Transformer, [Paper], [Code]

  • (arXiv 2021.01) Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network, [Paper], [Code]

  • (arXiv 2021.01) Trear: Transformer-based RGB-D Egocentric Action Recognition, [Paper]

  • (arXiv 2021.01) Learn to Dance with AIST++: Music Conditioned 3D Dance Generation, [Paper], [Page]

  • (arXiv 2021.01) Spherical Transformer: Adapting Spherical Signal to CNNs, [Paper]

  • (arXiv 2021.01) Are We There Yet? Learning to Localize in Embodied Instruction Following, [Paper]

  • (arXiv 2021.01) VinVL: Making Visual Representations Matter in Vision-Language Models, [Paper]

  • (arXiv 2021.01) Bottleneck Transformers for Visual Recognition, [Paper]

  • (arXiv 2021.01) Investigating the Vision Transformer Model for Image Retrieval Tasks, [Paper]

  • (arXiv 2021.01) Addressing Some Limitations of Transformers with Feedback Memory, [Paper]

  • (arXiv 2021.01) Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet, [Paper], [Code]

  • (arXiv 2021.01) TrackFormer: Multi-Object Tracking with Transformers, [Paper]

  • (arXiv 2021.01) VisualSparta: Sparse Transformer Fragment-level Matching for Large-scale Text-to-Image Search, [Paper]

  • (arXiv 2021.01) Line Segment Detection Using Transformers without Edges, [Paper]

  • (arXiv 2021.01) Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers, [Paper]

  • (arXiv 2020.12) Accurate Word Representations with Universal Visual Guidance, [Paper]

  • (arXiv 2020.12) DETR for Pedestrian Detection, [Paper]

  • (arXiv 2020.12) Transformer Interpretability Beyond Attention Visualization, [Paper], [Code]

  • (arXiv 2020.12) PCT: Point Cloud Transformer, [Paper]

  • (arXiv 2020.12) TransPose: Towards Explainable Human Pose Estimation by Transformer, [Paper]

  • (arXiv 2020.12) Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers, [Paper], [Code]

  • (arXiv 2020.12) Transformer Guided Geometry Model for Flow-Based Unsupervised Visual Odometry, [Paper]

  • (arXiv 2020.12) Transformer for Image Quality Assessment, [Paper], [Code]

  • (arXiv 2020.12) TransTrack: Multiple-Object Tracking with Transformer, [Paper], [Code]

  • (arXiv 2020.12) 3D Object Detection with Pointformer, [Paper]

  • (arXiv 2020.12) Training data-efficient image transformers & distillation through attention, [Paper]

  • (arXiv 2020.12) Toward Transformer-Based Object Detection, [Paper]

  • (arXiv 2020.12) SceneFormer: Indoor Scene Generation with Transformers, [Paper]

  • (arXiv 2020.12) Point Transformer, [Paper]

  • (arXiv 2020.12) End-to-End Human Pose and Mesh Reconstruction with Transformers, [Paper]

  • (arXiv 2020.12) Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting, [Paper]

  • (arXiv 2020.12) Pre-Trained Image Processing Transformer, [Paper]

  • (arXiv 2020.12) Taming Transformers for High-Resolution Image Synthesis, [Paper], [Code]

  • (arXiv 2020.11) End-to-end Lane Shape Prediction with Transformers, [Paper], [Code]

  • (arXiv 2020.11) UP-DETR: Unsupervised Pre-training for Object Detection with Transformers, [Paper]

  • (arXiv 2020.11) End-to-End Video Instance Segmentation with Transformers, [Paper]

  • (arXiv 2020.11) Rethinking Transformer-based Set Prediction for Object Detection, [Paper]

  • (arXiv 2020.11) General Multi-label Image Classification with Transformers, [Paper](https://arxiv.org/pdf/2011.14027)

  • (arXiv 2020.11) End-to-End Object Detection with Adaptive Clustering Transformer, [Paper]

  • (arXiv 2020.10) An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, [Paper], [Code]

  • (arXiv 2020.04) Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks, [Paper], [Code]

  • (arXiv 2020.07) Feature Pyramid Transformer, [Paper], [Code]

  • (arXiv 2020.06) Visual Transformers: Token-based Image Representation and Processing for Computer Vision, [Paper]

  • (arXiv 2019.08) LXMERT: Learning Cross-Modality Encoder Representations from Transformers, [Paper], [Code]

TODO

  • Vision-language (V-L) representation learning
