Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection

Results and Models

LVIS

We train the model on the LVIS dataset using only base-category annotations and validate it on LVIS v1 val, which contains both base and novel categories. The text prompts used for LVIS are provided by DetPro and are the same as those in ViLD.

| Model | mask AP_r / AP_c / AP_f / AP | bbox AP_r / AP_c / AP_f / AP | Config | Text Prompt | Download |
| --- | --- | --- | --- | --- | --- |
| DK-DETR | 20.5 / 29.0 / 35.3 / 30.0 | 22.4 / 31.9 / 40.1 / 33.5 | config | Google Drive | Google Drive \| BaiduYun |
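
As a rough illustration of what "only base-category annotations" means, the sketch below filters an LVIS-style annotation dictionary down to base categories. It assumes the ViLD-style split, in which rare categories (`frequency == "r"`) are treated as novel and common and frequent categories as base. The function name `keep_base_annotations` and the toy data are hypothetical and are not taken from this repository.

```python
def keep_base_annotations(lvis, novel_frequency="r"):
    """Return a shallow copy of an LVIS-style dict without novel-category annotations.

    Assumption: novel categories are those whose "frequency" field equals
    `novel_frequency` ("r" = rare), following the ViLD-style split.
    """
    base_ids = {
        cat["id"] for cat in lvis["categories"]
        if cat["frequency"] != novel_frequency
    }
    filtered = dict(lvis)
    # Keep every category entry, but drop the annotations of novel categories.
    filtered["annotations"] = [
        ann for ann in lvis["annotations"] if ann["category_id"] in base_ids
    ]
    return filtered


# Toy example (made-up data, not real LVIS content).
toy = {
    "categories": [
        {"id": 1, "name": "cat_a", "frequency": "f"},
        {"id": 2, "name": "cat_b", "frequency": "c"},
        {"id": 3, "name": "cat_c", "frequency": "r"},
    ],
    "annotations": [
        {"id": 10, "category_id": 1},
        {"id": 11, "category_id": 3},
        {"id": 12, "category_id": 2},
    ],
}
base_only = keep_base_annotations(toy)
kept_ids = [ann["id"] for ann in base_only["annotations"]]
```

Validation still uses the full category list, so novel categories are scored (as AP_r under this assumption) even though the model never saw their annotations during training.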

Generalization Ability

To demonstrate the generalization ability of the open-vocabulary object detection model, we directly evaluate the LVIS-trained model on COCO, Objects365 and Pascal VOC datasets.

| Model | Dataset | AP | AP⁵⁰ | AP⁷⁵ | Config | Text Prompt | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DK-DETR | COCO | 39.3 | 54.5 | 42.8 | config | Google Drive | Google Drive \| BaiduYun |
| DK-DETR | Objects365 | 13.0 | 17.9 | 13.9 | config | Google Drive | Google Drive \| BaiduYun |
| DK-DETR | Pascal VOC | - | 71.1 | 61.3 | config | Google Drive | Google Drive \| BaiduYun |
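
Each dataset above has its own text prompt file. One common way open-vocabulary detectors transfer without retraining is to swap the class text embeddings and score each region embedding by cosine similarity. The sketch below shows that general idea only. It is an assumption about the mechanism, not the code used in this repository, and the names `cosine`, `classify` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify(region_embedding, text_embeddings):
    """Return the best-matching class name and all per-class scores."""
    scores = {
        name: cosine(region_embedding, emb)
        for name, emb in text_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy 2-D embeddings standing in for per-dataset text prompt embeddings.
target_dataset_prompts = {"dog": [1.0, 0.0], "car": [0.0, 1.0]}
region = [0.9, 0.1]  # hypothetical region embedding from the detector
label, scores = classify(region, target_dataset_prompts)
```

Under this assumption, evaluating on a new dataset only requires replacing `target_dataset_prompts` with embeddings for that dataset's category names.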

Citation

@inproceedings{li2023distilling,
  title={Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection},
  author={Li, Liangqi and Miao, Jiaxu and Shi, Dahu and Tan, Wenming and Ren, Ye and Yang, Yi and Pu, Shiliang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={6501--6510},
  year={2023}
}
