Commit

Update README.md
DirtyHarryLYL authored Feb 11, 2021
1 parent a63852b commit 026d3ca
Showing 1 changed file with 98 additions and 1 deletion.
99 changes: 98 additions & 1 deletion README.md
@@ -1,2 +1,99 @@
# Transformer-in-Vision
Some recent Transformer-based CV works
Some recent Transformer-based CV works. Comments and contributions are welcome!

## Resources
- Attention is all you need, [Paper](https://arxiv.org/pdf/1706.03762.pdf)

- OpenAI CLIP [Page](https://openai.com/blog/clip/), [Paper](https://cdn.openai.com/papers/Learning_Transferable_Visual_Models_From_Natural_Language_Supervision.pdf), [Code](https://github.com/openai/CLIP)

- [huggingface/transformers](https://github.com/huggingface/transformers)

- [Kyubyong/transformer](https://github.com/Kyubyong/transformer), TF

- [jadore801120/attention-is-all-you-need-pytorch](https://github.com/jadore801120/attention-is-all-you-need-pytorch), Torch

- [krasserm/fairseq-image-captioning](https://github.com/krasserm/fairseq-image-captioning)

- [PyTorch Transformers Tutorials](https://github.com/abhimishra91/transformers-tutorials)

- [ictnlp/awesome-transformer](https://github.com/ictnlp/awesome-transformer)

- [basicv8vc/awesome-transformer](https://github.com/basicv8vc/awesome-transformer)

- [dk-liang/Awesome-Visual-Transformer](https://github.com/dk-liang/Awesome-Visual-Transformer)

## Survey
- (arXiv 2020.09) Efficient Transformers: A Survey, [PDF](https://arxiv.org/pdf/2009.06732.pdf)

- (arXiv 2021.01) Transformers in Vision: A Survey, [PDF](https://arxiv.org/pdf/2101.01169.pdf)

## Recent Papers
- (ICLR'21) UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers, [Paper](https://arxiv.org/pdf/2101.08001.pdf), [Code](https://github.com/hhhusiyi-monash/UPDeT)

- (ICLR'21) Deformable DETR: Deformable Transformers for End-to-End Object Detection, [Paper](https://arxiv.org/pdf/2010.04159), [Code](https://github.com/fundamentalvision/Deformable-DETR)

- (ICLR'21) Support-set Bottlenecks for Video-text Representation Learning, [Paper](https://arxiv.org/pdf/2010.02824.pdf)

- (ICLR'21) Colorization Transformer, [Paper](https://arxiv.org/pdf/2102.04432.pdf), [Code](https://github.com/google-research/google-research/tree/master/coltran)

- (ECCV'20) Multi-modal Transformer for Video Retrieval, [Paper](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123490205.pdf)

- (ECCV'20) Connecting Vision and Language with Localized Narratives, [Paper](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123500630.pdf)

- (ECCV'20) DETR: End-to-End Object Detection with Transformers, [Paper](https://arxiv.org/pdf/2005.12872), [Code](https://github.com/facebookresearch/detr)

- (CVPR'20) Multi-Modality Cross Attention Network for Image and Sentence Matching, [Paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Wei_Multi-Modality_Cross_Attention_Network_for_Image_and_Sentence_Matching_CVPR_2020_paper.pdf)

- (CVPR'20) Learning Texture Transformer Network for Image Super-Resolution, [Paper](https://arxiv.org/pdf/2006.04139), [Code](https://github.com/researchmm/TTSR)

- (CVPR'20) Speech2Action: Cross-modal Supervision for Action Recognition, [Paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Nagrani_Speech2Action_Cross-Modal_Supervision_for_Action_Recognition_CVPR_2020_paper.pdf), [Page](https://www.robots.ox.ac.uk/~vgg/research/speech2action/)

- (ICPR'20) Transformer Encoder Reasoning Network, [Paper](https://arxiv.org/pdf/2004.09144.pdf), [Code](https://github.com/mesnico/TERN)

- (EMNLP'19) Effective Use of Transformer Networks for Entity Tracking, [Paper](https://arxiv.org/pdf/1909.02635), [Code](https://github.com/aditya2211/transformer-entity-tracking)

- (arXiv 2021.02) Video Transformer Network, [Paper](https://arxiv.org/pdf/2102.00719.pdf)

- (arXiv 2021.02) Training Vision Transformers for Image Retrieval, [Paper](https://arxiv.org/pdf/2102.05644.pdf)

- (arXiv 2021.02) Relaxed Transformer Decoders for Direct Action Proposal Generation, [Paper](https://arxiv.org/pdf/2102.01894.pdf), [Code](https://github.com/MCG-NJU/RTD-Action)

- (arXiv 2021.02) TransReID: Transformer-based Object Re-Identification, [Paper](https://arxiv.org/pdf/2102.04378.pdf)

- (arXiv 2021.02) Improving Visual Reasoning by Exploiting The Knowledge in Texts, [Paper](https://arxiv.org/pdf/2102.04760.pdf)

- (arXiv 2021.01) Fast Convergence of DETR with Spatially Modulated Co-Attention, [Paper](https://arxiv.org/pdf/2101.07448.pdf)

- (arXiv 2021.01) Dual-Level Collaborative Transformer for Image Captioning, [Paper](https://arxiv.org/pdf/2101.06462.pdf)

- (arXiv 2021.01) SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation, [Paper](https://arxiv.org/pdf/2101.08833.pdf)

- (arXiv 2021.01) CPTR: Full Transformer Network for Image Captioning, [Paper](https://arxiv.org/pdf/2101.10804.pdf)

- (arXiv 2021.01) Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network, [Paper](https://arxiv.org/pdf/2101.11562.pdf), [Code](https://github.com/YehLi/TDEN)

- (arXiv 2021.01) Trear: Transformer-based RGB-D Egocentric Action Recognition, [Paper](https://arxiv.org/pdf/2101.03904.pdf)

- (arXiv 2021.01) Learn to Dance with AIST++: Music Conditioned 3D Dance Generation, [Paper](https://arxiv.org/pdf/2101.08779), [Page](https://google.github.io/aichoreographer/)

- (arXiv 2021.01) Spherical Transformer: Adapting Spherical Signal to CNNs, [Paper](https://arxiv.org/pdf/2101.03848.pdf)

- (arXiv 2021.01) Are We There Yet? Learning to Localize in Embodied Instruction Following, [Paper](https://arxiv.org/pdf/2101.03431.pdf)

- (arXiv 2021.01) VinVL: Making Visual Representations Matter in Vision-Language Models, [Paper](https://arxiv.org/pdf/2101.00529.pdf)

- (arXiv 2021.01) Bottleneck Transformers for Visual Recognition, [Paper](https://arxiv.org/pdf/2101.11605.pdf)

- (arXiv 2021.01) Addressing Some Limitations of Transformers with Feedback Memory, [Paper](https://arxiv.org/pdf/2002.09402.pdf)

- (arXiv 2021.01) Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet, [Paper](https://arxiv.org/pdf/2101.11986.pdf), [Code](https://github.com/yitu-opensource/T2T-ViT)

- (arXiv 2021.01) Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers, [Paper](https://arxiv.org/pdf/2102.00529.pdf)

- (arXiv 2020.12) Accurate Word Representations with Universal Visual Guidance, [Paper](https://arxiv.org/pdf/2012.15086.pdf)

- (arXiv 2020.12) Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting, [Paper](https://arxiv.org/pdf/2012.07436.pdf)

- (arXiv 2020.12) Taming Transformers for High-Resolution Image Synthesis, [Paper](https://arxiv.org/pdf/2012.09841.pdf), [Code](https://github.com/CompVis/taming-transformers)

- (arXiv 2020.07) Oscar: Object-Semantics Aligned Pre-training for Vision-and-Language Tasks, [Paper](https://arxiv.org/pdf/2004.06165.pdf), [Code](https://github.com/microsoft/Oscar)
