Dataset and codebase for the ICCV2023 paper RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D.
- Referring expression comprehension & object tracking dataset on Ego4D
- 12,038 annotated clips, 41 hours in total.
- Bounding boxes annotated at 2 FPS, with two textual referring expressions per object.
- Referred objects can go out of frame in the first-person video (no-referred-object cases); see the loading sketch below.
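To make the points above concrete, here is a minimal Python sketch of iterating over annotations of this shape. The file name and every field (`clip_uid`, `expressions`, `frames`, `bbox`) are hypothetical placeholders for illustration; the actual schema is documented in dataset/README.md.

```python
import json

# Hypothetical schema for illustration only -- real field names are in
# dataset/README.md. Each record is assumed to hold a clip id, two textual
# referring expressions, and per-frame boxes sampled at 2 FPS, where a
# missing box marks a no-referred-object frame.
with open("refego_annotation.json") as f:  # hypothetical file name
    annotations = json.load(f)

for record in annotations:
    clip_uid = record["clip_uid"]        # assumed field
    expressions = record["expressions"]  # assumed: two strings per object
    for frame in record["frames"]:       # assumed: frames sampled at 2 FPS
        bbox = frame.get("bbox")         # assumed: [x1, y1, x2, y2] or None
        if bbox is None:
            # The referred object is out of frame here: a model should
            # predict "no-referred-object" instead of a bounding box.
            continue
        # ... ground `expressions` against `bbox` for training/evaluation
```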
[paper][video][code][RefEgo dataset]
Annotations can be downloaded from the RefEgo dataset [annotation] link. See dataset/README.md for details.
[NEW] We decided to include the test split file in FPS2 in the updated annotation file. Please re-download the annotations if you have an older version.
MDETR-based models and checkpoints are available here. We also provide a notebook for trying our model!
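Since the released models follow MDETR, the sketch below shows demo-style referring expression inference using the upstream MDETR torch.hub checkpoint (`mdetr_efficientnetB5`), not the RefEgo-finetuned weights; the frame path and expression are placeholders, and the linked notebook shows how to use our actual checkpoints.

```python
import torch
import torchvision.transforms as T
from PIL import Image

# Demo-style preprocessing from the upstream MDETR demo (ImageNet statistics).
transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Upstream MDETR checkpoint via torch.hub -- not the RefEgo-finetuned weights.
model, postprocessor = torch.hub.load(
    "ashkamath/mdetr:main", "mdetr_efficientnetB5",
    pretrained=True, return_postprocessor=True)
model.eval()

img = Image.open("frame.jpg").convert("RGB")      # placeholder video frame
caption = "the white mug on the kitchen counter"  # placeholder expression
inp = transform(img).unsqueeze(0)

with torch.no_grad():
    # MDETR runs in two passes: encode image+text, then decode box queries.
    memory_cache = model(inp, [caption], encode_and_save=True)
    outputs = model(inp, [caption], encode_and_save=False,
                    memory_cache=memory_cache)

# Confidence = 1 - probability of the "no object" class; keep strong queries.
probas = 1 - outputs["pred_logits"].softmax(-1)[0, :, -1]
keep = probas > 0.7
print(outputs["pred_boxes"][0, keep])  # normalized (cx, cy, w, h) boxes
```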
RefEgo dataset annotations (bounding boxes and texts) are distributed under CC BY-SA 4.0. Please also follow the Ego4D license for videos and images.
```bibtex
@InProceedings{Kurita_2023_ICCV,
    author    = {Kurita, Shuhei and Katsura, Naoki and Onami, Eri},
    title     = {RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {15214-15224}
}
```