Please make sure ImageNet is downloaded to the $DATASET/imagenet directory. Then, you can create the training dataset variants of ImageNet-Captions, LAIONet, YFCC-15M, and CC-12M by following the instructions in the data_preparation folder. TSV files containing image paths will be stored under $DATASET/imagenet-captions, and the corresponding class frequencies will be stored under the freqs folder.
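As a quick sanity check, you can peek at the generated files. The TSV file name below is only a placeholder; the actual names and column layout depend on which variant you built with the data_preparation scripts.
# Inspect one of the generated TSV files (the file name here is a placeholder).
head -n 3 $DATASET/imagenet-captions/imagenet_captions.tsv
wc -l $DATASET/imagenet-captions/*.tsv
# Class frequency files live under freqs; their exact format depends on the variant.
ls freqs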
Evaluation is done on the ImageNet validation set, which is expected to be stored under $DATASET/imagenet/val. Optionally, we also support evaluating on ImageNetV2 and ImageNet-100. Example commands for downloading these datasets are provided below.
export DATASET=../datasets
# Download ImageNetV2
mkdir $DATASET/imagenetv2 && cd $DATASET/imagenetv2
wget https://huggingface.co/datasets/vaishaal/ImageNetV2/resolve/main/imagenetv2-matched-frequency.tar.gz
tar -xvf imagenetv2-matched-frequency.tar.gz
rm imagenetv2-matched-frequency.tar.gz
# Download ImageNet-100
mkdir $DATASET/imagenet100 && cd $DATASET/imagenet100
git clone https://github.com/danielchyeh/ImageNet-100-Pytorch.git && cd ImageNet-100-Pytorch
python generate_IN100.py --source_folder $DATASET/imagenet --target_folder $DATASET/imagenet100
rm -r $DATASET/imagenet100/ImageNet-100-Pytorch
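As a rough check that the download and conversion succeeded, you can count the class folders. The extracted ImageNetV2 folder name below is an assumption based on the upstream archive and may differ on your machine.
# ImageNetV2: the archive extracts to a folder with 1000 class subdirectories.
ls $DATASET/imagenetv2/imagenetv2-matched-frequency-format-val | wc -l
# ImageNet-100: inspect the generated layout (100 selected classes).
ls $DATASET/imagenet100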
Our CLIP experiments are conducted using a customized version of open_clip, which is included as a submodule. If it was not cloned together with this repository, run git submodule update --init --recursive in the root directory to fetch it. Then, install it from source with the following commands:
cd open_clip
make install
make install-training
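To confirm that the install picked up the submodule, a simple import check usually suffices (assuming the customized package still exposes __version__ like upstream open_clip):
python -c "import open_clip; print(open_clip.__version__)"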
We provide example scripts to replicate our experiments in the scripts folder. These cover our investigations into caption descriptiveness (Sec. 3.2), data distribution, diversity, and imbalance level (Sec. 3.4), data scale (Sec. 3.5), and open-world concepts (Sec. 3.6), as well as the explorations of few-shot and open-world recognition (Sec. 4.1). You may run them directly or modify them to suit your needs. Checkpoints and intermediate evaluation results are saved to the logs folder by default.
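For orientation, these scripts ultimately launch open_clip training on the generated TSV files. The command below is only a hedged sketch of such a launch using stock open_clip options; the module path, file name, and hyperparameters are placeholders and may differ from what our customized version and the provided scripts actually use.
# Hedged sketch of a TSV-based training launch; prefer the provided scripts for exact settings.
# (In newer open_clip versions the module is open_clip_train.main instead of training.main.)
python -m training.main \
  --train-data $DATASET/imagenet-captions/imagenet_captions.tsv \
  --dataset-type csv \
  --csv-img-key filepath \
  --csv-caption-key title \
  --imagenet-val $DATASET/imagenet/val \
  --model ViT-B-32 \
  --batch-size 256 \
  --epochs 32 \
  --logs ./logs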
The metrics are already computed and saved during training. We also provide example scripts for re-evaluating trained checkpoints, e.g., for additional evaluation datasets or prompts. The following is an example:
export TEMPLATE_TYPE=openai # open_clip default option
bash scripts/zero_shot/run_openclip_zs.sh $PATH_TO_RUN_FOLDER $TEMPLATE_TYPE
The results will be saved to the same directory as the checkpoint file.
In addition, we support evaluating all pre-trained CLIP models provided by open_clip. Simply run bash scripts/zero_shot/run_all.sh or python run_pretrained_openclip.py to evaluate them; see run_pretrained_openclip.py for details. The results will be saved to logs_pretrained by default.
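To see which (architecture, pretrained-tag) pairs the installed open_clip ships, you can query its registry directly:
# List all pre-trained (architecture, tag) pairs known to the installed open_clip.
python -c "import open_clip; [print(m, t) for m, t in open_clip.list_pretrained()]"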