Supervised Learning Experiments

Getting started

Data preparation

Please make sure ImageNet is downloaded to the $DATASET/imagenet directory. You can then create the training dataset variants of ImageNet-Captions, LAIONet, YFCC-15M, and CC-12M by following the instructions in the data_preparation folder. TSV files containing image paths will be stored under $DATASET/imagenet-captions, and the corresponding class frequencies will be stored under the freqs folder.
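For reference, the resulting data layout should look roughly like the tree below (the dataset folder names follow the commands in this section; the files inside each folder depend on which variants you build, so treat this as an illustrative sketch):

$DATASET/
├── imagenet/
│   ├── train/
│   └── val/                 # used for evaluation
├── imagenet-captions/       # TSV files listing training image paths
├── imagenetv2/              # optional, see download commands below
└── imagenet100/             # optional, see download commands below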

Evaluation is done on the ImageNet validation set, which is expected to be stored under $DATASET/imagenet/val. Optionally, we also support evaluating on ImageNetV2 and ImageNet-100. Example commands to download these datasets are provided below.

# use an absolute path so the cd commands below do not break relative references
export DATASET=$(realpath ../datasets)

# Download ImageNetV2
mkdir $DATASET/imagenetv2 && cd $DATASET/imagenetv2
# use resolve/ (not blob/) so wget fetches the archive rather than an HTML page
wget https://huggingface.co/datasets/vaishaal/ImageNetV2/resolve/main/imagenetv2-matched-frequency.tar.gz
tar -xvf imagenetv2-matched-frequency.tar.gz
rm imagenetv2-matched-frequency.tar.gz

# Download ImageNet-100
mkdir $DATASET/imagenet100 && cd $DATASET/imagenet100
git clone https://github.com/danielchyeh/ImageNet-100-Pytorch.git && cd ImageNet-100-Pytorch
python generate_IN100.py --source_folder $DATASET/imagenet --target_folder $DATASET/imagenet100
cd .. && rm -r ImageNet-100-Pytorch  # step out of the clone before deleting it

Environment setup

No special setup is required here. Just make sure PyTorch (>=2.0, with CUDA support) is installed and that your machine has at least 4 GPUs. A minimal sanity check is sketched below.
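For example (the install command is illustrative; see pytorch.org for the build matching your CUDA version):

# Install PyTorch with CUDA support
pip install "torch>=2.0" torchvision

# Sanity check: CUDA is available and at least 4 GPUs are visible
python -c "import torch; assert torch.cuda.is_available(); print(torch.cuda.device_count())"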

Pre-trained heads

We provide pre-extracted class embeddings for the 1K ImageNet classes with different prompts and text encoders; check the heads folder for details. You may also extract your own class embeddings using dump_clip_txt_features.py. Depending on the text encoder you use, you may need to install the corresponding library, e.g., clip, open_clip, or transformers, as sketched below.
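For example (install only the library matching your chosen encoder; the script's arguments are not reproduced here, so consult its --help):

# Pick the library matching your text encoder:
pip install git+https://github.com/openai/CLIP.git   # clip
pip install open_clip_torch                          # open_clip
pip install transformers                             # transformers

# List the script's actual arguments before extracting embeddings:
python dump_clip_txt_features.py --help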

Running

Training

We provide example scripts to replicate our experiments in the scripts folder. These cover the investigations of vocabulary size (Sec. 3.3), data distribution (Sec. 3.4 & 3.5), and open-world concepts (Sec. 3.6), as well as the explorations of few-shot and open-world recognition (Sec. 4.1). You may run them directly or modify them to suit your needs; an example invocation is sketched below. Checkpoints and intermediate evaluation results are saved to the output folder by default.
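For example (the script name below is hypothetical; substitute one of the actual scripts in the scripts folder):

# Launch a training script; checkpoints and intermediate evaluation
# results land in the output folder by default.
bash scripts/train_example.sh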

Evaluation

The metrics are already computed and saved during training. If you want to re-evaluate a trained model, you may run the following command:

bash scripts/eval.sh $PATH_TO_CHECKPOINT

The results will be saved to the same directory as the checkpoint file.