This repository is built upon mmrotate 1.0.0.
The DOTA dataset can be downloaded from here.
- cuda 12.1
- torch=2.1.0
- torchvision=0.16.0
- mmcv-full=2.1.0
- mmdet=3.2.0
- mmrotate=1.0.0rc1
- clip=1.0
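A minimal environment setup sketch, assuming conda and the official PyTorch cu121 wheels (Python version is an assumption; adapt to the versions listed above):

```shell
# create an isolated environment (python version is an assumption)
conda create -n tcm python=3.9 -y
conda activate tcm

# install PyTorch 2.1.0 + torchvision 0.16.0 built against CUDA 12.1
pip install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu121
```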
The code is based on mmrotate & CLIP. Please first install mmcv-full and mmdet following the official guidelines (mmrotate), then install CLIP.
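A minimal install sketch via OpenMMLab's `mim` (package sources and version pins are assumptions taken from the list above; follow the official guides if anything conflicts):

```shell
pip install -U openmim
mim install "mmcv==2.1.0"         # the mmcv 2.x series replaces the old mmcv-full package
mim install "mmdet==3.2.0"
mim install "mmrotate==1.0.0rc1"  # or install mmrotate from source per its official guide
pip install git+https://github.com/openai/CLIP.git
```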
- Please follow the mmrotate official guidelines to prepare the datasets accordingly (see the split sketch after this list).
- Configure the dataset path in `CLIP/config_TCM/TCM_dota.py`.
- Download the pre-trained CLIP model (`RN50.pt`) and save it to the `pretrained` folder (a download sketch follows below the list).
- Configure the pre-trained CLIP model path in the config file as:
```python
# model settings
ckpt_path = '/xxx/RN50.pt'
```
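For the dataset step above, a preparation sketch using mmrotate's standard DOTA split tool (the script and split-config paths are assumptions based on stock mmrotate; afterwards, point the dataset path in `CLIP/config_TCM/TCM_dota.py` at the split output directory):

```shell
# split DOTA train/val images into patches (single-scale)
python tools/data/dota/split/img_split.py \
    --base-json tools/data/dota/split/split_configs/ss_trainval.json

# split the test set the same way
python tools/data/dota/split/img_split.py \
    --base-json tools/data/dota/split/split_configs/ss_test.json
```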
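For the CLIP weights step, one way to fetch `RN50.pt` is through the CLIP package's own downloader (`download_root` is a parameter of `clip.load`; the target folder here is chosen to match the `pretrained` folder above):

```shell
# downloads RN50.pt into ./pretrained/ via the CLIP package
python -c "import clip; clip.load('RN50', device='cpu', download_root='pretrained')"
```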
To fine-tune the TCM model from the pretrained RN50.pt, set `ckpt_path` as above, then run:
```shell
python ./tools/train.py CLIP/config_TCM/rotated-fcosTCM-le90_r50_fpn_1x_dota.py --work-dir ./work_dirs/r-fcos-tcm
```
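For multi-GPU training, mmrotate's standard distributed launcher should also work (a sketch; the GPU count and work dir are assumptions):

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh \
    CLIP/config_TCM/rotated-fcosTCM-le90_r50_fpn_1x_dota.py 4 \
    --work-dir ./work_dirs/r-fcos-tcm
```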
To evaluate performance with a trained checkpoint, run:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_test.sh work_dirs/r-fcos-tcm/rotated-fcosTCM-le90_r50_fpn_1x_dota.py work_dirs/r-fcos-tcm/epoch_12.pth 4
```
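For a single-GPU run, the non-distributed test entry point can be used instead (a sketch using the standard `tools/test.py` config/checkpoint arguments of mmdet 3.x):

```shell
python ./tools/test.py \
    work_dirs/r-fcos-tcm/rotated-fcosTCM-le90_r50_fpn_1x_dota.py \
    work_dirs/r-fcos-tcm/epoch_12.pth
```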
| Method | Data | AP50 (single scale) | Model |
| --- | --- | --- | --- |
| TCM-rotated-FCOS | DOTA | 75.1% | config \| log \| weights |
| TCM-rotated-ATSS | DOTA | 76.1% | config \| log \| weights |
| TCM-rotated-RetinaNet | DOTA | 70.99% | config \| log \| weights |
If you find this project helpful for your research, please consider citing the papers:
```bibtex
@inproceedings{Yu2023TurningAC,
  title={Turning a CLIP Model into a Scene Text Detector},
  author={Wenwen Yu and Yuliang Liu and Wei Hua and Deqiang Jiang and Bo Ren and Xiang Bai},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2023}
}

@article{Yu2024TurningAC,
  title={Turning a CLIP Model into a Scene Text Spotter},
  author={Wenwen Yu and Yuliang Liu and Xingkui Zhu and Haoyu Cao and Xing Sun and Xiang Bai},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024}
}
```
This project is under the CC-BY-NC 4.0 license.
This project is partially based on MMRotate, CLIP, and DenseCLIP. Thanks for their great work.