> Forked from [DirtyHarryLYL/Transformer-in-Vision](https://github.com/DirtyHarryLYL/Transformer-in-Vision)
# Transformer-in-Vision

Some recent Transformer-based CV works. Welcome to comment or contribute!
## Resources

- Attention Is All You Need, [Paper](https://arxiv.org/pdf/1706.03762.pdf)

- OpenAI CLIP, [Page](https://openai.com/blog/clip/), [Paper](https://cdn.openai.com/papers/Learning_Transferable_Visual_Models_From_Natural_Language_Supervision.pdf), [Code](https://github.com/openai/CLIP)

- [huggingface/transformers](https://github.com/huggingface/transformers)

- [Kyubyong/transformer](https://github.com/Kyubyong/transformer), TensorFlow

- [jadore801120/attention-is-all-you-need-pytorch](https://github.com/jadore801120/attention-is-all-you-need-pytorch), PyTorch

- [krasserm/fairseq-image-captioning](https://github.com/krasserm/fairseq-image-captioning)

- [PyTorch Transformers Tutorials](https://github.com/abhimishra91/transformers-tutorials)

- [ictnlp/awesome-transformer](https://github.com/ictnlp/awesome-transformer)

- [basicv8vc/awesome-transformer](https://github.com/basicv8vc/awesome-transformer)

- [dk-liang/Awesome-Visual-Transformer](https://github.com/dk-liang/Awesome-Visual-Transformer)
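Nearly every paper collected below builds on the scaled dot-product attention introduced in "Attention Is All You Need". As a quick reference, here is a minimal NumPy sketch of that operation (the function name and toy shapes are illustrative, not taken from any of the listed codebases):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (n_q, n_k) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the keys
    return weights @ V                             # weighted sum of value vectors

# Toy example: 2 queries attending over 3 key/value pairs of dimension 4.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4) -- one output vector per query
```

Multi-head attention in the papers below simply runs several such attentions in parallel on learned linear projections of Q, K, and V and concatenates the results.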
## Surveys

- (arXiv 2020.09) Efficient Transformers: A Survey, [PDF](https://arxiv.org/pdf/2009.06732.pdf)

- (arXiv 2021.01) Transformers in Vision: A Survey, [PDF](https://arxiv.org/pdf/2101.01169.pdf)
## Recent Papers

- (ICLR'21) UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers, [Paper](https://arxiv.org/pdf/2101.08001.pdf), [Code](https://github.com/hhhusiyi-monash/UPDeT)

- (ICLR'21) Deformable DETR: Deformable Transformers for End-to-End Object Detection, [Paper](https://arxiv.org/pdf/2010.04159), [Code](https://github.com/fundamentalvision/Deformable-DETR)

- (ICLR'21) Support-set Bottlenecks for Video-text Representation Learning, [Paper](https://arxiv.org/pdf/2010.02824.pdf)

- (ICLR'21) Colorization Transformer, [Paper](https://arxiv.org/pdf/2102.04432.pdf), [Code](https://github.com/google-research/google-research/tree/master/coltran)
- (ECCV'20) Multi-modal Transformer for Video Retrieval, [Paper](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123490205.pdf)

- (ECCV'20) Connecting Vision and Language with Localized Narratives, [Paper](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123500630.pdf)

- (ECCV'20) DETR: End-to-End Object Detection with Transformers, [Paper](https://arxiv.org/pdf/2005.12872), [Code](https://github.com/facebookresearch/detr)

- (CVPR'20) Multi-Modality Cross Attention Network for Image and Sentence Matching, [Paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Wei_Multi-Modality_Cross_Attention_Network_for_Image_and_Sentence_Matching_CVPR_2020_paper.pdf)

- (CVPR'20) Learning Texture Transformer Network for Image Super-Resolution, [Paper](https://arxiv.org/pdf/2006.04139), [Code](https://github.com/researchmm/TTSR)

- (CVPR'20) Speech2Action: Cross-modal Supervision for Action Recognition, [Paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Nagrani_Speech2Action_Cross-Modal_Supervision_for_Action_Recognition_CVPR_2020_paper.pdf), [Page](https://www.robots.ox.ac.uk/~vgg/research/speech2action/)
- (ICPR'20) Transformer Encoder Reasoning Network, [Paper](https://arxiv.org/pdf/2004.09144.pdf), [Code](https://github.com/mesnico/TERN)

- (EMNLP'19) Effective Use of Transformer Networks for Entity Tracking, [Paper](https://arxiv.org/pdf/1909.02635), [Code](https://github.com/aditya2211/transformer-entity-tracking)
- (arXiv 2021.02) Video Transformer Network, [Paper](https://arxiv.org/pdf/2102.00719.pdf)

- (arXiv 2021.02) Training Vision Transformers for Image Retrieval, [Paper](https://arxiv.org/pdf/2102.05644.pdf)

- (arXiv 2021.02) Relaxed Transformer Decoders for Direct Action Proposal Generation, [Paper](https://arxiv.org/pdf/2102.01894.pdf), [Code](https://github.com/MCG-NJU/RTD-Action)

- (arXiv 2021.02) TransReID: Transformer-based Object Re-Identification, [Paper](https://arxiv.org/pdf/2102.04378.pdf)

- (arXiv 2021.02) Improving Visual Reasoning by Exploiting The Knowledge in Texts, [Paper](https://arxiv.org/pdf/2102.04760.pdf)

- (arXiv 2021.01) Fast Convergence of DETR with Spatially Modulated Co-Attention, [Paper](https://arxiv.org/pdf/2101.07448.pdf)

- (arXiv 2021.01) Dual-Level Collaborative Transformer for Image Captioning, [Paper](https://arxiv.org/pdf/2101.06462.pdf)

- (arXiv 2021.01) SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation, [Paper](https://arxiv.org/pdf/2101.08833.pdf)

- (arXiv 2021.01) CPTR: Full Transformer Network for Image Captioning, [Paper](https://arxiv.org/pdf/2101.10804.pdf)
- (arXiv 2021.01) Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network, [Paper](https://arxiv.org/pdf/2101.11562.pdf), [Code](https://github.com/YehLi/TDEN)

- (arXiv 2021.01) Trear: Transformer-based RGB-D Egocentric Action Recognition, [Paper](https://arxiv.org/pdf/2101.03904.pdf)

- (arXiv 2021.01) Learn to Dance with AIST++: Music Conditioned 3D Dance Generation, [Paper](https://arxiv.org/pdf/2101.08779), [Page](https://google.github.io/aichoreographer/)

- (arXiv 2021.01) Spherical Transformer: Adapting Spherical Signal to CNNs, [Paper](https://arxiv.org/pdf/2101.03848.pdf)

- (arXiv 2021.01) Are We There Yet? Learning to Localize in Embodied Instruction Following, [Paper](https://arxiv.org/pdf/2101.03431.pdf)

- (arXiv 2021.01) VinVL: Making Visual Representations Matter in Vision-Language Models, [Paper](https://arxiv.org/pdf/2101.00529.pdf)

- (arXiv 2021.01) Bottleneck Transformers for Visual Recognition, [Paper](https://arxiv.org/pdf/2101.11605.pdf)

- (arXiv 2021.01) Addressing Some Limitations of Transformers with Feedback Memory, [Paper](https://arxiv.org/pdf/2002.09402.pdf)

- (arXiv 2021.01) Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet, [Paper](https://arxiv.org/pdf/2101.11986.pdf), [Code](https://github.com/yitu-opensource/T2T-ViT)

- (arXiv 2021.01) Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers, [Paper](https://arxiv.org/pdf/2102.00529.pdf)

- (arXiv 2020.12) Accurate Word Representations with Universal Visual Guidance, [Paper](https://arxiv.org/pdf/2012.15086.pdf)

- (arXiv 2020.12) Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting, [Paper](https://arxiv.org/pdf/2012.07436.pdf)

- (arXiv 2020.12) Taming Transformers for High-Resolution Image Synthesis, [Paper](https://arxiv.org/pdf/2012.09841.pdf), [Code](https://github.com/CompVis/taming-transformers)

- (arXiv 2020.07) Oscar: Object-Semantics Aligned Pre-training for Vision-and-Language Tasks, [Paper](https://arxiv.org/pdf/2004.06165.pdf), [Code](https://github.com/microsoft/Oscar)