> Forked from [DirtyHarryLYL/Transformer-in-Vision](https://github.com/DirtyHarryLYL/Transformer-in-Vision)
# Transformer-in-Vision

Some recent Transformer-based CV works. Welcome to comment or contribute!
## Resources

- Attention Is All You Need, [Paper](https://arxiv.org/pdf/1706.03762.pdf)

- OpenAI CLIP, [Page](https://openai.com/blog/clip/), [Paper](https://cdn.openai.com/papers/Learning_Transferable_Visual_Models_From_Natural_Language_Supervision.pdf), [Code](https://github.com/openai/CLIP)

- [huggingface/transformers](https://github.com/huggingface/transformers)

- [Kyubyong/transformer](https://github.com/Kyubyong/transformer), TensorFlow

- [jadore801120/attention-is-all-you-need-pytorch](https://github.com/jadore801120/attention-is-all-you-need-pytorch), PyTorch

- [krasserm/fairseq-image-captioning](https://github.com/krasserm/fairseq-image-captioning)

- [PyTorch Transformers Tutorials](https://github.com/abhimishra91/transformers-tutorials)

- [ictnlp/awesome-transformer](https://github.com/ictnlp/awesome-transformer)

- [basicv8vc/awesome-transformer](https://github.com/basicv8vc/awesome-transformer)

- [dk-liang/Awesome-Visual-Transformer](https://github.com/dk-liang/Awesome-Visual-Transformer)
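Nearly every paper collected below builds on the scaled dot-product attention introduced in "Attention Is All You Need". As a quick reference, here is a minimal NumPy sketch of that operation (the function name and toy shapes are illustrative, not taken from any of the listed codebases):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (n_q, n_k) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the keys
    return weights @ V                             # weighted sum of value vectors

# Toy example: 2 queries attending over 3 key/value pairs of dimension 4.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4) -- one output vector per query
```

Multi-head attention in the papers below simply runs several such attentions in parallel on learned linear projections of Q, K, and V and concatenates the results.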
## Surveys

- (arXiv 2020.09) Efficient Transformers: A Survey, [PDF](https://arxiv.org/pdf/2009.06732.pdf)

- (arXiv 2021.01) Transformers in Vision: A Survey, [PDF](https://arxiv.org/pdf/2101.01169.pdf)
## Recent Papers

- (ICLR'21) UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers, [Paper](https://arxiv.org/pdf/2101.08001.pdf), [Code](https://github.com/hhhusiyi-monash/UPDeT)

- (ICLR'21) Deformable DETR: Deformable Transformers for End-to-End Object Detection, [Paper](https://arxiv.org/pdf/2010.04159), [Code](https://github.com/fundamentalvision/Deformable-DETR)

- (ICLR'21) Support-set Bottlenecks for Video-text Representation Learning, [Paper](https://arxiv.org/pdf/2010.02824.pdf)

- (ICLR'21) Colorization Transformer, [Paper](https://arxiv.org/pdf/2102.04432.pdf), [Code](https://github.com/google-research/google-research/tree/master/coltran)
- (ECCV'20) Multi-modal Transformer for Video Retrieval, [Paper](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123490205.pdf)

- (ECCV'20) Connecting Vision and Language with Localized Narratives, [Paper](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123500630.pdf)

- (ECCV'20) DETR: End-to-End Object Detection with Transformers, [Paper](https://arxiv.org/pdf/2005.12872), [Code](https://github.com/facebookresearch/detr)

- (CVPR'20) Multi-Modality Cross Attention Network for Image and Sentence Matching, [Paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Wei_Multi-Modality_Cross_Attention_Network_for_Image_and_Sentence_Matching_CVPR_2020_paper.pdf)

- (CVPR'20) Learning Texture Transformer Network for Image Super-Resolution, [Paper](https://arxiv.org/pdf/2006.04139), [Code](https://github.com/researchmm/TTSR)

- (CVPR'20) Speech2Action: Cross-modal Supervision for Action Recognition, [Paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Nagrani_Speech2Action_Cross-Modal_Supervision_for_Action_Recognition_CVPR_2020_paper.pdf), [Page](https://www.robots.ox.ac.uk/~vgg/research/speech2action/)
- (ICPR'20) Transformer Encoder Reasoning Network, [Paper](https://arxiv.org/pdf/2004.09144.pdf), [Code](https://github.com/mesnico/TERN)

- (EMNLP'19) Effective Use of Transformer Networks for Entity Tracking, [Paper](https://arxiv.org/pdf/1909.02635), [Code](https://github.com/aditya2211/transformer-entity-tracking)
- (arXiv 2021.02) Video Transformer Network, [Paper](https://arxiv.org/pdf/2102.00719.pdf)

- (arXiv 2021.02) Training Vision Transformers for Image Retrieval, [Paper](https://arxiv.org/pdf/2102.05644.pdf)

- (arXiv 2021.02) Relaxed Transformer Decoders for Direct Action Proposal Generation, [Paper](https://arxiv.org/pdf/2102.01894.pdf), [Code](https://github.com/MCG-NJU/RTD-Action)

- (arXiv 2021.02) TransReID: Transformer-based Object Re-Identification, [Paper](https://arxiv.org/pdf/2102.04378.pdf)

- (arXiv 2021.02) Improving Visual Reasoning by Exploiting The Knowledge in Texts, [Paper](https://arxiv.org/pdf/2102.04760.pdf)

- (arXiv 2021.01) Fast Convergence of DETR with Spatially Modulated Co-Attention, [Paper](https://arxiv.org/pdf/2101.07448.pdf)

- (arXiv 2021.01) Dual-Level Collaborative Transformer for Image Captioning, [Paper](https://arxiv.org/pdf/2101.06462.pdf)

- (arXiv 2021.01) SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation, [Paper](https://arxiv.org/pdf/2101.08833.pdf)

- (arXiv 2021.01) CPTR: Full Transformer Network for Image Captioning, [Paper](https://arxiv.org/pdf/2101.10804.pdf)
- (arXiv 2021.01) Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network, [Paper](https://arxiv.org/pdf/2101.11562.pdf), [Code](https://github.com/YehLi/TDEN)

- (arXiv 2021.01) Trear: Transformer-based RGB-D Egocentric Action Recognition, [Paper](https://arxiv.org/pdf/2101.03904.pdf)

- (arXiv 2021.01) Learn to Dance with AIST++: Music Conditioned 3D Dance Generation, [Paper](https://arxiv.org/pdf/2101.08779), [Page](https://google.github.io/aichoreographer/)

- (arXiv 2021.01) Spherical Transformer: Adapting Spherical Signal to CNNs, [Paper](https://arxiv.org/pdf/2101.03848.pdf)

- (arXiv 2021.01) Are We There Yet? Learning to Localize in Embodied Instruction Following, [Paper](https://arxiv.org/pdf/2101.03431.pdf)

- (arXiv 2021.01) VinVL: Making Visual Representations Matter in Vision-Language Models, [Paper](https://arxiv.org/pdf/2101.00529.pdf)

- (arXiv 2021.01) Bottleneck Transformers for Visual Recognition, [Paper](https://arxiv.org/pdf/2101.11605.pdf)

- (arXiv 2021.01) Addressing Some Limitations of Transformers with Feedback Memory, [Paper](https://arxiv.org/pdf/2002.09402.pdf)

- (arXiv 2021.01) Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet, [Paper](https://arxiv.org/pdf/2101.11986.pdf), [Code](https://github.com/yitu-opensource/T2T-ViT)

- (arXiv 2021.01) Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers, [Paper](https://arxiv.org/pdf/2102.00529.pdf)

- (arXiv 2020.12) Accurate Word Representations with Universal Visual Guidance, [Paper](https://arxiv.org/pdf/2012.15086.pdf)

- (arXiv 2020.12) Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting, [Paper](https://arxiv.org/pdf/2012.07436.pdf)

- (arXiv 2020.12) Taming Transformers for High-Resolution Image Synthesis, [Paper](https://arxiv.org/pdf/2012.09841.pdf), [Code](https://github.com/CompVis/taming-transformers)

- (arXiv 2020.07) Oscar: Object-Semantics Aligned Pre-training for Vision-and-Language Tasks, [Paper](https://arxiv.org/pdf/2004.06165.pdf), [Code](https://github.com/microsoft/Oscar)