This repo contains the supported code and configuration files to reproduce semantic segmentation results of Swin Transformer. It is based on mmsegmentation.
19/07/2021 Trained Swin-B with UPerNet on the Cityscapes dataset.
Backbone | Pretrain | Lr Schd | mIoU | #params | config | model |
---|---|---|---|---|---|---|
Swin-B | ImageNet-22K | 40K | 79.87 | 121M | config | drive |
Demo Swin-B: colab
Inference demo: colab
05/11/2021 Models for MoBY are released
04/12/2021 Initial commits
Results on ADE20K (supervised pre-training):
Backbone | Method | Crop Size | Lr Schd | mIoU | mIoU (ms+flip) | #params | FLOPs | config | log | model |
---|---|---|---|---|---|---|---|---|---|---|
Swin-T | UPerNet | 512x512 | 160K | 44.51 | 45.81 | 60M | 945G | config | github/baidu | github/baidu |
Swin-S | UPerNet | 512x512 | 160K | 47.64 | 49.47 | 81M | 1038G | config | github/baidu | github/baidu |
Swin-B | UPerNet | 512x512 | 160K | 48.13 | 49.72 | 121M | 1188G | config | github/baidu | github/baidu |
Notes:
- Pre-trained models can be downloaded from Swin Transformer for ImageNet Classification.
- Access code for `baidu` is `swin`.
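The mIoU columns report the mean, over classes, of intersection-over-union (IoU = TP / (TP + FP + FN)), shown as a percentage. The sketch below is a minimal pure-Python illustration of that definition, not the mmsegmentation implementation. It assumes flattened integer label lists, an ignore label of 255 for unlabeled pixels (an assumption), and that classes absent from both ground truth and prediction are skipped rather than counted as zero.

```python
from typing import List, Sequence

def confusion_matrix(gt: Sequence[int], pred: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count (ground-truth, predicted) label pairs; rows are ground truth."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:  # unlabeled pixels do not count toward any class
            continue
        cm[g][p] += 1
    return cm

def mean_iou(cm: List[List[int]]) -> float:
    """Average IoU = TP / (TP + FP + FN) over classes that appear at all."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                 # class-c pixels predicted as another class
        fp = sum(row[c] for row in cm) - tp  # other pixels predicted as class c
        if tp + fp + fn > 0:                 # skip classes absent from both gt and prediction
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)

# Toy example: 6 pixels, 3 classes, the last pixel is unlabeled (255).
gt = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(gt, pred, num_classes=3)
print(round(100 * mean_iou(cm), 2))  # 72.22
```

For a dataset-level score, the per-image confusion matrices would be summed element-wise before calling `mean_iou`, so that large and small images contribute pixels rather than equal weight.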
Results on ADE20K (self-supervised pre-training with MoBY):
Backbone | Method | Crop Size | Lr Schd | mIoU | mIoU (ms+flip) | #params | FLOPs | config | log | model |
---|---|---|---|---|---|---|---|---|---|---|
Swin-T | UPerNet | 512x512 | 160K | 44.06 | 45.58 | 60M | 945G | config | github/baidu | github/baidu |
Notes:
- The learning rate may need further tuning to obtain the best results.
- MoBY pre-trained models can be downloaded from MoBY with Swin Transformer.
Please refer to get_started.md for installation and dataset preparation.
```bash
# single-GPU testing
python tools/test.py <CONFIG_FILE> <SEG_CHECKPOINT_FILE> --eval mIoU

# multi-GPU testing
tools/dist_test.sh <CONFIG_FILE> <SEG_CHECKPOINT_FILE> <GPU_NUM> --eval mIoU

# multi-GPU, multi-scale testing
tools/dist_test.sh <CONFIG_FILE> <SEG_CHECKPOINT_FILE> <GPU_NUM> --aug-test --eval mIoU
```
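The `--aug-test` option corresponds to the "mIoU (ms+flip)" columns: the image is evaluated at several scales, with and without a horizontal flip, and the per-pixel class scores are averaged before taking the argmax. The sketch below illustrates that idea in plain Python and is not mmsegmentation's code. `predict(image, scale)` is a hypothetical callback assumed to run the model on the image resized by `scale` and return scores already resized back to the original H×W; the default scale list is also an assumption.

```python
from typing import Callable, List, Sequence

Scores = List[List[List[float]]]  # [H][W][C] per-pixel class scores

def hflip(grid: list) -> list:
    """Mirror a row-major 2-D grid (image or score map) left to right."""
    return [list(reversed(row)) for row in grid]

def aug_test(image: list,
             predict: Callable[[list, float], Scores],
             scales: Sequence[float] = (0.5, 0.75, 1.0, 1.25, 1.5, 1.75),
             flip: bool = True) -> List[List[int]]:
    """Average per-pixel scores over scales (and flips), then take the argmax."""
    views: List[Scores] = []
    for scale in scales:
        views.append(predict(image, scale))
        if flip:
            # Predict on the mirrored image, then mirror the scores back into place.
            views.append(hflip(predict(hflip(image), scale)))
    height, width, num_classes = len(views[0]), len(views[0][0]), len(views[0][0][0])
    labels = []
    for i in range(height):
        row = []
        for j in range(width):
            avg = [sum(v[i][j][c] for v in views) / len(views) for c in range(num_classes)]
            row.append(max(range(num_classes), key=lambda c: avg[c]))
        labels.append(row)
    return labels

# Hypothetical toy model: class 0 score = pixel value, class 1 score = 1 - value,
# plus an artificial bias toward class 1 in the left-most column.
def toy_predict(image: list, scale: float) -> Scores:
    return [[[v, 1 - v + (0.3 if j == 0 else 0.0)] for j, v in enumerate(row)]
            for row in image]

print(aug_test([[0.6, 0.6]], toy_predict, flip=False))  # [[1, 0]]
print(aug_test([[0.6, 0.6]], toy_predict))              # [[0, 0]]
```

In the toy run, flipping averages away the model's left-column bias, which is the kind of effect that can make the ms+flip numbers higher than single-scale ones.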
To train with pre-trained models, run:
```bash
# single-GPU training
python tools/train.py <CONFIG_FILE> --options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

# multi-GPU training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]
```
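Each `--options key=value` argument overrides one entry of the nested Python config, with dots addressing nested dictionaries (`model.backbone.use_checkpoint` sets `use_checkpoint` inside `model['backbone']`). The sketch below shows that merge behaviour in plain Python; `parse_option` and `apply_options` are hypothetical helpers, not mmcv's API, and the value parsing via `ast.literal_eval` is an assumption that only approximates how the real command-line parser converts strings.

```python
import ast
from typing import Any, Dict, Iterable, Tuple

def parse_option(arg: str) -> Tuple[str, Any]:
    """Split 'a.b.c=value' and turn value into a Python literal when possible."""
    key, _, raw = arg.partition("=")
    try:
        value = ast.literal_eval(raw)  # "True" -> True, "2" -> 2, "6e-5" -> 6e-05
    except (ValueError, SyntaxError):
        value = raw                    # anything else (e.g. a checkpoint path) stays a string
    return key, value

def apply_options(cfg: Dict[str, Any], options: Iterable[str]) -> Dict[str, Any]:
    """Write each dotted key into the nested config dict, creating levels as needed."""
    for arg in options:
        key, value = parse_option(arg)
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return cfg

# Illustrative config fragment and placeholder checkpoint name.
cfg = {"model": {"pretrained": None, "backbone": {"use_checkpoint": False}}}
apply_options(cfg, ["model.pretrained=swin_tiny_patch4_window7_224.pth",
                    "model.backbone.use_checkpoint=True"])
print(cfg)
```

Only the addressed leaves change; sibling keys already in the config are left untouched, which is why a single `--options` flag can switch on `use_checkpoint` without restating the rest of the backbone settings.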
For example, to train a UPerNet model with a Swin-T backbone on 8 GPUs, run:
```bash
tools/dist_train.sh configs/swin/upernet_swin_tiny_patch4_window7_512x512_160k_ade20k.py 8 --options model.pretrained=<PRETRAIN_MODEL>
```
Notes:
- `use_checkpoint` enables gradient checkpointing to save GPU memory. Please refer to this page for more details.
- The default learning rate and training schedule are for 8 GPUs and 2 images per GPU.
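The default setup of 8 GPUs and 2 images per GPU means 16 images per iteration. When training with a different number of GPUs, one common heuristic (not something this README prescribes, and not established here for this optimizer) is to scale the learning rate linearly with the total batch size. The sketch below does that arithmetic and also converts an iteration schedule into passes over the training set; the ADE20K training-split size of 20,210 images is an assumption stated in the code.

```python
# Reference setup from the note above: 8 GPUs x 2 images per GPU.
REF_GPUS = 8
REF_IMGS_PER_GPU = 2
REF_BATCH = REF_GPUS * REF_IMGS_PER_GPU  # 16 images per iteration

def total_batch(gpus: int, imgs_per_gpu: int) -> int:
    return gpus * imgs_per_gpu

def linearly_scaled_lr(ref_lr: float, gpus: int, imgs_per_gpu: int) -> float:
    """Heuristic: keep lr / batch size constant relative to the 16-image reference."""
    return ref_lr * total_batch(gpus, imgs_per_gpu) / REF_BATCH

def epochs_seen(iterations: int, gpus: int, imgs_per_gpu: int, train_size: int) -> float:
    """How many passes over the training set a schedule of `iterations` amounts to."""
    return iterations * total_batch(gpus, imgs_per_gpu) / train_size

ADE20K_TRAIN_IMAGES = 20_210  # assumed size of the ADE20K training split

print(linearly_scaled_lr(1.0, gpus=4, imgs_per_gpu=2))  # 0.5
print(round(epochs_seen(160_000, REF_GPUS, REF_IMGS_PER_GPU, ADE20K_TRAIN_IMAGES), 1))  # 126.7
print(round(epochs_seen(160_000, 4, 2, ADE20K_TRAIN_IMAGES), 1))  # 63.3
```

The last line shows why halving the GPU count without changing the 160K schedule also halves the amount of data seen; keeping results comparable would require either more iterations or more images per GPU.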
```bibtex
@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}
```
Image Classification: See Swin Transformer for Image Classification.
Object Detection: See Swin Transformer for Object Detection.
Self-Supervised Learning: See MoBY with Swin Transformer.