| No. | Model | Title | Paper / Code | Venue | Affiliation | Date |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ViT | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | paper code | ICLR 2021 | Google Brain | 22 Oct 2020 |
| 2 | LeViT | LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | paper | arXiv | Facebook | 2 Apr 2021 |
| 3 | Swin Transformer | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | paper code | arXiv | MSRA | 25 Mar 2021 |
| 4 | DeiT | Training data-efficient image transformers & distillation through attention | paper code | arXiv | Facebook AI | 15 Jan 2021 |
| 5 | PVT | Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions | paper code | arXiv | Nanjing University of Science and Technology | 24 Feb 2021 |
| 6 | TNT | Transformer in Transformer | paper code | arXiv | Noah's Ark Lab | 27 Feb 2021 |
| 7 | PiT | Rethinking Spatial Dimensions of Vision Transformers | paper code | arXiv | NAVER AI Lab | 30 Mar 2021 |
| 8 | T2T-ViT | Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet | paper code | arXiv | NUS | 22 Mar 2021 |
| 9 | CPVT | Conditional Positional Encodings for Vision Transformers | paper code | arXiv | Meituan Inc | 18 Mar 2021 |
| 10 | ViL | Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding | paper | arXiv | Microsoft Corporation | 29 Mar 2021 |
| 11 | CoaT | Co-Scale Conv-Attentional Image Transformer | paper code | arXiv | University of California San Diego | 13 Apr 2021 |
| 12 | pruning | Visual Transformer Pruning | paper | arXiv | Zhejiang University | 17 Apr 2021 |
| 13 | M2TR | M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection | paper | arXiv | Fudan University | 21 Apr 2021 |
| 14 | Visformer | Visformer: The Vision-friendly Transformer | paper code | arXiv | Beihang University | 26 Apr 2021 |
| 15 | ConTNet | ConTNet: Why not use convolution and transformer at the same time? | paper code | arXiv | ByteDance AI Lab | 27 Apr 2021 |
| 16 | Twins-SVT | Twins: Revisiting the Design of Spatial Attention in Vision Transformers | paper code | arXiv | Meituan Inc | 28 Apr 2021 |
| 17 | CoAtNet | CoAtNet: Marrying Convolution and Attention for All Data Sizes | paper | arXiv | Google Brain | 9 Jun 2021 |
| 18 | Focal Transformer | Focal Self-attention for Local-Global Interactions in Vision Transformers | paper | arXiv | Microsoft Research at Redmond | 1 Jul 2021 |
| 19 | BEIT | BEIT: BERT Pre-Training of Image Transformers | paper | arXiv | Microsoft | 15 Jun 2021 |
| 20 | ViT-G | Scaling Vision Transformers | paper | arXiv | Google Brain | 8 Jun 2021 |
| 21 | - | Efficient Training of Visual Transformers with Small-Size Datasets | paper | arXiv | TFBK | 7 Jun 2021 |
| 22 | PS-ViT | Vision Transformer with Progressive Sampling | paper code | arXiv | Centre for Perceptual and Interactive Intelligence | 3 Aug 2021 |
| 23 | MAE | Masked Autoencoders Are Scalable Vision Learners | paper | arXiv | Facebook FAIR | 11 Nov 2021 |
| 24 | Evo-ViT | Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer | paper | AAAI 2022 | Chinese Academy of Sciences | 6 Dec 2021 |
| 25 | ATS | ATS: Adaptive Token Sampling For Efficient Vision Transformers | paper | arXiv | Microsoft | 30 Nov 2021 |
| 26 | AdaViT | AdaViT: Adaptive Vision Transformers for Efficient Image Recognition | paper | arXiv | Fudan University | 30 Nov 2021 |
| 27 | PeCo | PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers | paper code | arXiv | University of Science and Technology of China | 24 Nov 2021 |
| 28 | DAT | Vision Transformer with Deformable Attention | paper code | arXiv | Tsinghua University | 3 Jan 2022 |