Can OOD Object Detectors Learn from Foundation Models?

The University of Hong Kong 
† corresponding author

European Conference on Computer Vision (ECCV) 2024

  • We would like to say YES to the title. We introduce SyncOOD to access open-world knowledge encapsulated within off-the-shelf foundation models by synthesizing meaningful OOD data.
  • SyncOOD provides an automatic, transparent, controllable, and low-cost pipeline for synthesizing scene-level images containing novel objects with annotation boxes via image editing.
  • The synthetic OOD samples are filtered and employed to augment the training of a lightweight, plug-and-play OOD detector, effectively optimizing the in-distribution (ID) / out-of-distribution (OOD) decision boundaries with minimal data usage.
  • Explore more in the paper: Can OOD Object Detectors Learn from Foundation Models? in ECCV 2024.

Quick Guide

This repository contains the code of SyncOOD in two parts:

  • Synthesize Novel Samples: an automatic pipeline for synthesizing annotated scene-level images containing novel (OOD) objects.
  • Train an OOD Detector: training and evaluating a lightweight, plug-and-play OOD detector with the synthetic samples.

Abstract

Out-of-distribution (OOD) object detection is a challenging task due to the absence of open-set OOD data. Inspired by recent advancements in text-to-image generative models, such as Stable Diffusion, we study the potential of generative models trained on large-scale open-set data to synthesize OOD samples, thereby enhancing OOD object detection. We introduce SyncOOD, a simple data curation method that capitalizes on the capabilities of large foundation models to automatically extract meaningful OOD data from text-to-image generative models. This offers the model access to open-world knowledge encapsulated within off-the-shelf foundation models. The synthetic OOD samples are then employed to augment the training of a lightweight, plug-and-play OOD detector, thus effectively optimizing the in-distribution (ID)/OOD decision boundaries. Extensive experiments across multiple benchmarks demonstrate that SyncOOD significantly outperforms existing methods, establishing new state-of-the-art performance with minimal synthetic data usage.

Key Contributions

  • We investigate and unlock the potential of text-to-image generative models trained on large-scale open-set data for synthesizing OOD objects in object detection tasks.
  • We introduce an automated data curation process for obtaining controllable, annotated scene-level synthetic OOD images for OOD object detection, which utilizes LLMs for novel concept discovery and visual foundation models for data annotation and filtering.
  • We discover that maintaining ID/OOD image context consistency and obtaining more accurate OOD annotation bounding boxes are crucial for synthesized data to be effective in OOD object detection.
  • Comprehensive experiments on multiple benchmarks demonstrate the effectiveness of our method, as we significantly outperform existing state-of-the-art approaches while using minimal synthetic data.

Citation

If you find this work useful, please consider citing:

@inproceedings{liu2025can,
  title={Can OOD Object Detectors Learn from Foundation Models?},
  author={Liu, Jiahui and Wen, Xin and Zhao, Shizhen and Chen, Yingxian and Qi, Xiaojuan},
  booktitle={European Conference on Computer Vision},
  pages={213--231},
  year={2025},
  organization={Springer}
}

Acknowledgements


Train an OOD Detector

We utilize synthetic out-of-distribution (OOD) samples and original in-distribution (ID) samples to train a lightweight, plug-and-play OOD detector in a very efficient way, achieving state-of-the-art OOD object detection.
We mainly conduct the experiments on Ubuntu 20.04 with GeForce RTX 3090 GPUs.

1. Environment Setup

We mainly use Conda for installation and provide the environment files requirements.txt and requirements.yml; you can choose either file to set up the environment (we use a similar environment to Du et al. (ICLR 2022) and Wilson et al. (ICCV 2023)).
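For example, a minimal setup could look like the following sketch (the environment name syncood is an assumption; pick whichever of the two requirement files you prefer):

# create and activate a Conda environment from the provided yml file
conda env create -f requirements.yml -n syncood
conda activate syncood
# or install the pip requirements into an existing environment
pip install -r requirements.txt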

2. Datasets

Original Data

Here you should prepare:

  • Two ID datasets (PASCAL-VOC, BDD-100K).
  • Two OOD datasets (MS-COCO, OpenImages).

Download all the datasets (following the Dataset Preparation of the benchmark) into your own pre-defined data root path DATASET_DIR. Your dataset structure should follow:

└── DATASET_DIR
	└── VOC_0712_converted
		|
		├── JPEGImages
		├── voc0712_train_all.json
		└── val_coco_format.json
	└── COCO
		|
		├── annotations
			├── xxx.json (the original json files)
			├── instances_val2017_ood_wrt_bdd_rm_overlap.json
			└── instances_val2017_ood_rm_overlap.json
		├── train2017
		└── val2017
	└── bdd100k
		|
		├── images
		├── val_bdd_converted.json
		└── train_bdd_converted.json
	└── OpenImages
		|
		├── coco_classes
		└── ood_classes_rm_overlap

Synthetic Data

Here you should prepare two synthetic OOD datasets for training:

  • SyncOOD_VOC: edited and processed from the above original dataset PASCAL-VOC.
  • SyncOOD_BDD: edited and processed from the above original dataset BDD-100K.

You can download our processed demo_datasets from our DataPage,
or synthesize and prepare your own synthetic OOD data with the pipeline of Synthesize Novel Samples.
Your dataset structure should be updated as:

└── DATASET_DIR
	└── VOC_0712_converted
	└── COCO
	└── bdd100k
	└── OpenImages
 	└── SyncOOD_VOC
		|
		├── images
		└── info_raw.json
	└── SyncOOD_BDD
		|
		├── images
		└── info_raw.json

Data Pre-processing

Now pre-process the synthetic data information with your pre-defined data root path DATASET_DIR. Make sure you are in the data tools directory (from the repository root SyncOOD/):

cd ./tools

Run the script with DATASET_DIR:

python align_ood_info.py --dataroot DATASET_DIR

When finishing, your dataset structure should be updated as:

└── DATASET_DIR
	└── VOC_0712_converted
	└── COCO
	└── bdd100k
	└── OpenImages
 	└── SyncOOD_VOC
		|
		├── images
		├── info_raw.json
		└── info.json
	└── SyncOOD_BDD
		|
		├── images
		├── info_raw.json
		└── info.json

3. Base Object Detectors

Detector Checkpoints

We train a plug-and-play OOD detector on top of off-the-shelf base object detectors (Faster R-CNN and VOS).
You can follow the VOS repository to train your own base object detectors,
or download our well-trained base_detectors checkpoints from our DataPage.

Save all the checkpoints in a detector root path of your choice, DETECTOR_DIR, following the structure:

└── DETECTOR_DIR
	└── frcnn_voc.pth
	└── vos_voc.pth
	└── frcnn_bdd.pth
	└── vos_bdd.pth

Detector Configs

Make sure you are in the base detector configs directory (from the repository root SyncOOD/):

cd ./detection/configs

Decide which base object detector you would like to use (Faster R-CNN or VOS), then point its config at the corresponding checkpoint (see the sketch after this list):

  • For VOC as ID dataset:
    Modify the path of WEIGHTS on line 4 of VOC-Detection/faster-rcnn/vanilla.yaml to
    DETECTOR_DIR/frcnn_voc.pth for Faster R-CNN or DETECTOR_DIR/vos_voc.pth for VOS.

  • For BDD as ID dataset:
    Modify the path of WEIGHTS on line 4 of BDD-Detection/faster-rcnn/vanilla.yaml to
    DETECTOR_DIR/frcnn_bdd.pth for Faster R-CNN or DETECTOR_DIR/vos_bdd.pth for VOS.
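For illustration, the edited entry might look like the following sketch (nesting WEIGHTS under a MODEL key is an assumption based on common Detectron2-style configs; keep whatever structure the file already has):

MODEL:
  WEIGHTS: "DETECTOR_DIR/frcnn_voc.pth"  # hypothetical VOC + Faster R-CNN choice; use vos_voc.pth, frcnn_bdd.pth, or vos_bdd.pth otherwise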

4. Feature Extraction

Feature extraction may consume a lot of disk space and memory, especially on the BDD-100K dataset.
If you are using our provided base_detectors checkpoints, you can download our extracted_features from our DataPage into the dataset structure above and skip this step,
or follow the instructions below to extract your own features:

Make sure you are in the feature extraction directory (from the repository root SyncOOD/):

cd ./OOD_OBJ_DET

First, extract ID features from the original ID samples (an example invocation is sketched after the list below):

sh feature_extraction_id.sh
  • Set CUDA_VISIBLE_DEVICES with a GPU ID number (e.g. CUDA_VISIBLE_DEVICES=0);
  • Set --tdset with the ID dataset (--tdset VOC or --tdset BDD);
  • Set --dataset-dir as --dataset-dir DATASET_DIR with your pre-defined data root path DATASET_DIR.
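Assuming the script forwards these flags to its underlying Python entry point (an assumption; you may instead need to edit the variables inside feature_extraction_id.sh), an example invocation with VOC as the ID dataset could look like:

# hypothetical example: VOC as ID dataset, data root at DATASET_DIR
CUDA_VISIBLE_DEVICES=0 sh feature_extraction_id.sh --tdset VOC --dataset-dir DATASET_DIR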

Then, extract OOD features from the synthetic OOD samples:

sh feature_extraction_ood.sh
  • Set CUDA_VISIBLE_DEVICES with a GPU ID number (e.g. CUDA_VISIBLE_DEVICES=0);
  • Set --tdset with the related ID dataset (--tdset VOC or --tdset BDD);
  • Set --dataset-dir as --dataset-dir DATASET_DIR with your pre-defined data root path DATASET_DIR.

Finally your updated dataset structure should be:

└── DATASET_DIR
	└── VOC_0712_converted
	└── COCO
	└── bdd100k
	└── OpenImages
 	└── SyncOOD_VOC
	└── SyncOOD_BDD
	└── VOC_features
		|
		├── VOC-RCNN-RN50-id.hdf5
		└── VOC-RCNN-RN50-ood.hdf5
	└── BDD_features
		|
		├── BDD-RCNN-RN50-id.hdf5
		└── BDD-RCNN-RN50-ood.hdf5

5. Training the OOD detector

Make sure you are in the OOD detector training directory (from the repository root SyncOOD/):

cd ./OOD_OBJ_DET

Then train an OOD detector:

sh train.sh
  • Set CUDA_VISIBLE_DEVICES with a GPU ID number (e.g. CUDA_VISIBLE_DEVICES=0);
  • Set --tdset with the ID dataset (--tdset VOC or --tdset BDD);
  • Set --dataset-dir as --dataset-dir DATASET_DIR with your pre-defined data root path DATASET_DIR.

The obtained OOD detector checkpoints are saved together with the extracted features, so the current data structure is:

└── DATASET_DIR
	└── VOC_0712_converted
	└── COCO
	└── bdd100k
	└── OpenImages
 	└── SyncOOD_VOC
	└── SyncOOD_BDD
	└── VOC_features
		|
		├── VOC-RCNN-RN50-id.hdf5
		├── VOC-RCNN-RN50-ood.hdf5
		└── VOC-RCNN-RN50-mlp.pth
	└── BDD_features
		|
		├── BDD-RCNN-RN50-id.hdf5
		├── BDD-RCNN-RN50-ood.hdf5
		└── BDD-RCNN-RN50-mlp.pth

6. Evaluation

Make sure you are in the evaluation directory (from the repository root SyncOOD/), then evaluate the obtained OOD detectors (an example invocation is sketched after the list below):

sh evaluation.sh
  • Set CUDA_VISIBLE_DEVICES with a GPU ID number (e.g. CUDA_VISIBLE_DEVICES=0);
  • Set --tdset with the ID dataset (--tdset VOC or --tdset BDD);
  • Set --dataset-dir as --dataset-dir DATASET_DIR with your pre-defined data root path DATASET_DIR;
  • Set --mlp-path as --mlp-path DATASET_DIR/xxx_features/xxx-RCNN-RN50-mlp.pth with your pre-defined data root path DATASET_DIR and replace xxx with your ID dataset (VOC or BDD).
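As above, assuming evaluation.sh forwards these flags (an assumption; otherwise edit the script accordingly), an example evaluation for VOC as the ID dataset could look like:

# hypothetical example: evaluate the VOC OOD detector trained above
CUDA_VISIBLE_DEVICES=0 sh evaluation.sh --tdset VOC --dataset-dir DATASET_DIR --mlp-path DATASET_DIR/VOC_features/VOC-RCNN-RN50-mlp.pth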

Finally, you will get FPR95, AUROC, and AUPR of your OOD detector on the two OOD datasets.


Synthesize Novel Samples

We aim to develop an automatic, transparent, controllable, and low-cost pipeline for synthesizing scene-level images containing novel objects, with COCO-format annotations, to help 1) train OOD detectors and 2) explore more general open-world tasks (coming soon).
