By Kaige Li, Qichuan Geng, Maoxian Wan, Xiaochun Cao, Senior Member, IEEE, and Zhong Zhou. This repository is an official implementation of the paper "Context and Spatial Feature Calibration for Real-Time Semantic Segmentation", which is under review. The full code will be released after review.
We have open-sourced the original network files; the overall project still needs to be refactored, and we will do this in a future update.
Comparison of inference speed and accuracy for real-time models on the Cityscapes test set.
- Towards Real-Time Applications: CSFCN can be directly applied to real-time applications such as autonomous vehicles and medical imaging.
- A Novel and Efficient Decoder: a Context and Spatial Decoder is introduced to address pixel-context mismatch and spatial feature misalignment via pooling-based and sampling-based attention mechanisms, respectively (a minimal sketch of the pooling-based idea follows this list).
- More Accurate and Faster: CSFCN achieves 78.7% mIoU at 70.0 FPS on the Cityscapes test set and 77.8% mIoU at 179.2 FPS on the CamVid test set.
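For intuition, here is a minimal, generic PyTorch sketch of the pooling-based attention idea: each pixel attends to a compact set of pooled context vectors rather than to every other pixel. The module name, pooling grid, and layer layout are illustrative assumptions, not the paper's exact calibration modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoolingAttention(nn.Module):
    """Generic pooling-based attention: every pixel attends to a compact set
    of pooled (region-level) context vectors instead of all other pixels,
    cutting attention cost from O((HW)^2) to O(HW * S) for S pooled regions.
    A simplified stand-in, not the exact module from the paper."""

    def __init__(self, channels, pool_sizes=(1, 2, 4, 8)):
        super().__init__()
        self.pool_sizes = pool_sizes
        self.query = nn.Conv2d(channels, channels, 1)
        self.key = nn.Conv1d(channels, channels, 1)
        self.value = nn.Conv1d(channels, channels, 1)
        self.scale = channels ** -0.5

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)          # B x HW x C
        # Pool the feature map at several grid sizes (1+4+16+64 = 85 tokens).
        ctx = torch.cat([F.adaptive_avg_pool2d(x, s).flatten(2)
                         for s in self.pool_sizes], dim=2)    # B x C x S
        k, v = self.key(ctx), self.value(ctx)                 # B x C x S
        attn = torch.softmax((q @ k) * self.scale, dim=-1)    # B x HW x S
        out = (attn @ v.transpose(1, 2)).transpose(1, 2)      # B x C x HW
        return x + out.reshape(b, c, h, w)                    # residual connection

print(PoolingAttention(128)(torch.randn(2, 128, 64, 128)).shape)  # (2, 128, 64, 128)
```

Restricting keys and values to pooled regions is what keeps such attention real-time friendly: the context token count stays fixed regardless of input resolution.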
- Our paper is undergoing a second peer review. In the meantime, we have prepared a preprint and will post a link to it soon. (Aug/06/2023)
- The overview, training logs, and some codes for CSFCN are available here. (Aug/08/2023)
- We have reproduced the network files of SFANet and MGSeg for comparison. (Aug/16/2023)
- We verified our reproductions, which achieve performance comparable to that reported in the original papers. (Aug/19/2023)
- We have added instructions on using TensorRT to accelerate network inference. (Sep/26/2023)
- We found that the sub-pixel convolution design from "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network" (CVPR 2016) can further improve performance, so we recommend using it when building networks (see the sketch after this list). (Sep/27/2023)
- We have open-sourced the PixelShuffle-based CSFCN and CSFCN-Tiny. Please note that some hyperparameters of these modules still need to be standardized; we are working on this. (Nov/29/2023)
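For reference, here is a minimal sketch of a sub-pixel convolution (PixelShuffle) head in PyTorch. The class name, channel counts, and upscale factor are illustrative assumptions, not the exact configuration of the released models.

```python
import torch
import torch.nn as nn

class SubPixelHead(nn.Module):
    """Sub-pixel convolution head (Shi et al., CVPR 2016): a conv expands the
    channels by r^2, then PixelShuffle rearranges them into an r-times larger
    map, replacing bilinear upsampling + conv. Hypothetical configuration."""

    def __init__(self, in_channels, num_classes, upscale=8):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, num_classes * upscale ** 2, 3, padding=1)
        self.shuffle = nn.PixelShuffle(upscale)  # (B, C*r^2, H, W) -> (B, C, H*r, W*r)

    def forward(self, x):
        return self.shuffle(self.proj(x))

# e.g. features at 1/8 resolution -> full-resolution class logits
head = SubPixelHead(in_channels=128, num_classes=19, upscale=8)
print(head(torch.randn(1, 128, 128, 256)).shape)  # torch.Size([1, 19, 1024, 2048])
```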
A demo of the segmentation performance of our proposed CSFCNs: Predictions of CSFCN-100 (left), CSFCN-75 (middle), and CSFCN-50 (right).
Cityscapes Stuttgart demo video #0
Cityscapes Stuttgart demo video #1
Cityscapes Stuttgart demo video #2
An overview of the basic architecture of our proposed Context and Spatial Feature Calibration Network (CSFCN).
🔔 We append 50, 75 and 100 after the network name to represent the input sizes of 512 × 1024, 768 × 1536 and 1024 × 2048, respectively.
Model (Cityscapes) | Val mIoU (%) | Test mIoU (%) | FPS (GTX 1080 Ti) | FPS (RTX 2080 Super Max-Q) |
---|---|---|---|---|
CSFCN-50 | 74.0 | 73.8 | 229.1 | 114.2 |
CSFCN-75 | 77.3 | 77.2 | 122.2 | 65.9 |
CSFCN-100 | 79.0 | 78.7 | 70.0 | 38.1 |
🔔 Our Cityscapes-pretrained CSFCN (CSFCN-P in the table below) obtains 81.0% mIoU on the CamVid test set.
Model (CamVid) | Val mIoU (%) | Test mIoU (%) | FPS (GTX 1080 Ti) | FPS (RTX 2080 Super Max-Q) |
---|---|---|---|---|
CSFCN | - | 77.8 | 179.2 | 101.8 |
CSFCN-P | - | 81.0 | 179.2 | 101.8 |
😺 The inference time is measured under PyTorch 1.7.1, CUDA 11.0, and cuDNN 7.6.5 on a single NVIDIA GTX 1080 Ti GPU with a batch size of 1.
😺 Our method still maintains real-time performance on the mobile RTX 2080 Super Max-Q.
For this project, we used Python 3.8.5. We recommend setting up a new virtual environment:
```bash
python -m venv ~/venv/CSFCN
source ~/venv/CSFCN/bin/activate
```
In that environment, the requirements can be installed with:
```bash
pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
```
- Download the Cityscapes and CamVid datasets and unzip them into the `data/cityscapes` and `data/camvid` directories (the assumed layout is sketched below).
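This sketch assumes the standard Cityscapes folder structure; the CamVid subfolder names are our assumption and may differ from the released configs:

```
data
├── cityscapes
│   ├── leftImg8bit
│   │   ├── train
│   │   ├── val
│   │   └── test
│   └── gtFine
│       ├── train
│       ├── val
│       └── test
└── camvid
    ├── train
    ├── val
    └── test
```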
- Download the ImageNet pretrained models and put them into the `pretrained_models/imagenet/` directory.
- For example, train CSFCN on Cityscapes with a batch size of 12 on one GPU (e.g., an RTX 3090):
```bash
python train.py --cfg configs/CSFCN_cityscapes.yaml
```
- Or train CSFCN on Cityscapes using the train and val sets simultaneously with a batch size of 12 on one GPU:
```bash
python trainval.py --cfg configs/CSFCN_cityscapes_trainval.yaml
```
- Download the finetuned models for Cityscapes and CamVid and put them into the `pretrained_models/cityscapes/` and `pretrained_models/camvid/` directories, respectively.
- For example, evaluate CSFCN on the Cityscapes val set:
```bash
python tools/eval.py --cfg configs/CSFCN_cityscapes.yaml \
                     TEST.MODEL_FILE pretrained_models/cityscapes/CSFCN_best_model.pt
```
- Or evaluate CSFCN on the CamVid test set:
```bash
python tools/eval.py --cfg configs/CSFCN_camvid.yaml \
                     TEST.MODEL_FILE pretrained_models/camvid/CSFCN_camvid_best_model.pt
```
- Generate the submission results of CSFCN on the Cityscapes test set:
```bash
python tools/submit.py --cfg configs/CSFCN_cityscapes_trainval.yaml \
                       TEST.MODEL_FILE pretrained_models/cityscapes/CSFCN_trainval_best_model.pt
```
- If you have successfully installed TensorRT, it will automatically be used for the following latency tests (see function here).
- Or you can implement TensorRT inference following the guidance of torch2trt (recommended; a minimal conversion sketch follows this list). 🔥
- Otherwise, PyTorch will be used for the latency tests (see function here).
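With torch2trt, conversion is a one-liner once the model and a sample input are on the GPU. A minimal sketch, using a stand-in model in place of a loaded CSFCN (the fp16 flag is optional):

```python
import torch
import torch.nn as nn
from torch2trt import torch2trt

# Stand-in model; replace with a CSFCN instance loaded from a checkpoint.
model = nn.Sequential(nn.Conv2d(3, 19, 3, padding=1)).cuda().eval()

# torch2trt traces the model with a sample input of the target resolution.
x = torch.randn(1, 3, 1024, 2048).cuda()
model_trt = torch2trt(model, [x], fp16_mode=True)

with torch.no_grad():
    print(model_trt(x).shape)  # TensorRT-accelerated forward pass
```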
- Measure the inference speed of CSFCN-100 for Cityscapes:
```bash
python models/speed/CSFCN_speed.py --c 19 --r 1024 2048
```
- Measure the inference speed of CSFCN for CamVid (a sketch of the underlying timing loop follows these commands):
```bash
python models/speed/CSFCN_speed.py --c 11 --r 720 960
```
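These commands wrap a standard GPU timing loop. Below is a minimal sketch of the protocol described in the results section (batch size 1, warm-up, then synchronized timing); the function name and iteration counts are our own choices:

```python
import time
import torch
import torch.nn as nn

@torch.no_grad()
def measure_fps(model, input_size=(1, 3, 1024, 2048), warmup=50, iters=200):
    """Batch-size-1 latency protocol: warm up, then time GPU-synchronized
    forward passes and report frames per second."""
    model = model.cuda().eval()
    x = torch.randn(*input_size).cuda()
    for _ in range(warmup):              # warm up cuDNN autotuning and caches
        model(x)
    torch.cuda.synchronize()             # make sure warm-up kernels finished
    start = time.time()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()             # wait for all timed kernels to finish
    return iters / (time.time() - start)

# Stand-in model; replace with CSFCN to reproduce the numbers reported above.
print(f"{measure_fps(nn.Conv2d(3, 19, 3, padding=1)):.1f} FPS")
```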
- Put all your images in `samples/` and then run the command below using the Cityscapes-pretrained CSFCN for images in .png format:
```bash
python tools/custom.py --p '../pretrained_models/cityscapes/CSFCN_best_model.pth' --t '*.png'
```
You should end up seeing images that look like the following:
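For reference, a rough sketch of what such a script does; the stand-in model, ImageNet normalization constants, and output naming are assumptions here, and tools/custom.py remains the authoritative version:

```python
import glob
import numpy as np
import torch
import torch.nn as nn
from PIL import Image

# ImageNet normalization statistics (an assumption; check the released configs).
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
model = nn.Conv2d(3, 19, 3, padding=1).cuda().eval()  # replace with loaded CSFCN

for path in glob.glob('samples/*.png'):
    img = torch.from_numpy(np.array(Image.open(path).convert('RGB'))).float()
    x = ((img.permute(2, 0, 1) / 255.0 - mean) / std).unsqueeze(0).cuda()
    with torch.no_grad():
        pred = model(x).argmax(1)[0].byte().cpu().numpy()  # per-pixel class ids
    Image.fromarray(pred).save(path.replace('.png', '_pred.png'))
```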
- Refactor and clean code
- Organize all code and upload it
- Release complete config, network and training files
For fair comparison, we have reproduced some related methods.
In the future, we will open source the codes of more real-time semantic segmentation methods (e.g., MSFNet).
This project is based on the following open-source projects. We thank their authors for making the source code publicly available.