We train the model on the LVIS dataset using only base-category annotations and validate it on LVIS v1 val,
which contains both base and novel categories. The text prompts for LVIS, provided by DetPro, are the same as those used in ViLD.

| Model | mask APr / APc / APf / AP | bbox APr / APc / APf / AP | Config | Text Prompt | Download |
|---|---|---|---|---|---|
| DK-DETR | 20.5 / 29.0 / 35.3 / 30.0 | 22.4 / 31.9 / 40.1 / 33.5 | config | Google Drive | Google Drive / BaiduYun |

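
The base/novel split described above can be sketched as follows. This is a minimal illustration, not the repository's actual data pipeline: it assumes the common ViLD-style protocol in which LVIS categories marked rare (`"frequency": "r"`) are treated as novel and the common and frequent ones as base, which this README does not spell out. The helper names `split_lvis_categories` and `keep_base_annotations` are hypothetical.

```python
def split_lvis_categories(categories):
    """Partition LVIS category records into (base_ids, novel_ids).

    Assumption: rare categories ("r") are novel; common ("c") and
    frequent ("f") categories are base, as in the ViLD-style setting.
    """
    base_ids, novel_ids = [], []
    for cat in categories:
        if cat["frequency"] == "r":
            novel_ids.append(cat["id"])
        else:
            base_ids.append(cat["id"])
    return sorted(base_ids), sorted(novel_ids)


def keep_base_annotations(annotations, base_ids):
    """Drop every annotation whose category is not a base category."""
    base = set(base_ids)
    return [ann for ann in annotations if ann["category_id"] in base]


# Toy records that mimic the shape of the LVIS "categories" and
# "annotations" lists; real files contain many more fields.
toy_categories = [
    {"id": 1, "name": "cat_a", "frequency": "f"},
    {"id": 2, "name": "cat_b", "frequency": "r"},
    {"id": 3, "name": "cat_c", "frequency": "c"},
]
toy_annotations = [
    {"id": 10, "category_id": 1},
    {"id": 11, "category_id": 2},
    {"id": 12, "category_id": 3},
]

base_ids, novel_ids = split_lvis_categories(toy_categories)
train_annotations = keep_base_annotations(toy_annotations, base_ids)
```

Training would then use only `train_annotations`, while validation keeps all categories so that APr reflects performance on the novel classes.
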
To demonstrate the generalization ability of the open-vocabulary object detection model, we directly evaluate the LVIS-trained model on the COCO, Objects365, and Pascal VOC datasets.

| Model | Dataset | AP | AP50 | AP75 | Config | Text Prompt | Download |
|---|---|---|---|---|---|---|---|
| DK-DETR | COCO | 39.3 | 54.5 | 42.8 | config | Google Drive | Google Drive / BaiduYun |
| DK-DETR | Objects365 | 13.0 | 17.9 | 13.9 | config | Google Drive | Google Drive / BaiduYun |
| DK-DETR | Pascal VOC | - | 71.1 | 61.3 | config | Google Drive | Google Drive / BaiduYun |

@inproceedings{li2023distilling,
title={Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection},
author={Li, Liangqi and Miao, Jiaxu and Shi, Dahu and Tan, Wenming and Ren, Ye and Yang, Yi and Pu, Shiliang},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={6501--6510},
year={2023}
}