Segment and Caption Anything

This repository contains the official implementation of "Segment and Caption Anything".

tl;dr

Despite the absence of semantic labels in its training data, SAM implicitly encodes high-level semantics sufficient for captioning.
SCA (b) is a lightweight augmentation of SAM (a) that adds the ability to generate regional captions.
On top of the SAM architecture, we add a fixed pre-trained language model and an optimizable lightweight hybrid feature mixture whose training is cheap and scalable.
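As a rough illustration of the split described above, the sketch below marks which parts of the pipeline would be optimized and which stay fixed. It is not the repository's code: the `Component` class, the component names, and the helper functions are hypothetical, and treating SAM itself as fixed is an assumption made for this example (the text above only states that the language model is fixed and the feature mixture is optimizable).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    """One block of the pipeline; `trainable` marks whether it is optimized."""
    name: str
    trainable: bool


def build_pipeline() -> List[Component]:
    # Hypothetical component names, chosen for illustration only.
    return [
        # Assumption: SAM is kept fixed in this sketch.
        Component("sam_backbone", trainable=False),
        # The lightweight, optimizable feature mixture added on top of SAM.
        Component("feature_mixer", trainable=True),
        # The fixed pre-trained language model that produces the caption.
        Component("language_model", trainable=False),
    ]


def trainable_names(pipeline: List[Component]) -> List[str]:
    """Return the names of the components an optimizer would update."""
    return [c.name for c in pipeline if c.trainable]


pipeline = build_pipeline()
print(trainable_names(pipeline))  # prints ['feature_mixer']
```

Under these assumptions only the small mixing module receives gradient updates, which is one way to read "cheap and scalable" training. See docs/USAGE.md for the actual training setup.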

News

[01/31/2024] Update the paper and the supplementary material. Release code v0.0.2: bump transformers to 4.36.2; support mistral series, phi-2, zephyr; add experiments on SAM+Image Captioner+V-CoT; and more.
[12/05/2023] Release paper, code v0.0.1, and project page!

Environment Preparation

Please check docs/ENV.md.

Model Zoo

Please check docs/MODEL_ZOO.md.

Gradio Demo

Please check docs/DEMO.md.

Running Training and Inference

Please check docs/USAGE.md.

Experiments and Evaluation

Please check docs/EVAL.md.

License

The trained weights are licensed under the Apache 2.0 license.

Acknowledgement

We deeply appreciate these wonderful open-source projects: transformers, accelerate, deepspeed, detectron2, hydra, timm, gradio.

Citation

If you find this repository useful, please consider giving a star ⭐ and citation 🦖:

@inproceedings{huang2024segment,
  title={Segment and caption anything},
  author={Huang, Xiaoke and Wang, Jianfeng and Tang, Yansong and Zhang, Zheng and Hu, Han and Lu, Jiwen and Wang, Lijuan and Liu, Zicheng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={13405--13417},
  year={2024}
}
