To download the ScanNet scans, please follow the instructions in ScanNet. To preprocess the data required for the ReferIt3D challenge, see ReferIt3D for details.
Please download the pre-trained weights from Hugging Face.
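If the weights are hosted as a standard Hugging Face model repository, one way to fetch them is with huggingface-cli; the repository id and target directory below are placeholders, not the actual names:

huggingface-cli download <HF_REPO_ID> --local-dir ./pretrained_weights  # <HF_REPO_ID> is a placeholder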
The final directory structure should look as follows:
MiKASA/
├── logs/                      # Training logs
├── external_tools/
│   ├── pointnet2/
│   └── ...
├── models/
├── scripts/
│   ├── train_referit3d.py
│   └── ...
├── utils/
└── ...
All experiments were conducted using a single A100-80GB GPU.
- Ubuntu: 20.04
- CUDA: 11.7
- PyTorch: 1.13
- Python: 3.7
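As an example, a matching environment could be set up as follows; the use of conda and the cu117 wheel index are assumptions, not prescribed by this README:

conda create -n mikasa python=3.7 -y
conda activate mikasa
pip install torch==1.13.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117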
For the remaining dependencies, please refer to MVT. Additionally, install easydict and pyyaml.
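The two extra packages can be installed with pip:

pip install easydict pyyaml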
- To use the PointNet++ visual encoder, you need to compile its CUDA layers:
Note: this compilation requires gcc 5.4 or later.
cd external_tools/pointnet2
python setup.py install
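As a quick sanity check after compiling, you can try importing the installed package. The module name pointnet2 here is an assumption based on the directory name; adjust it to whatever setup.py actually registers:

python -c "import pointnet2"  # module name is an assumption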
- First, specify the paths of scannet_file, referit3D_file, and bert_pretrain_path in the config file, as sketched below.
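A hypothetical sketch of these entries, assuming the config is a YAML file (pyyaml is a dependency); all paths below are placeholders:

scannet_file: /path/to/preprocessed_scannet_data     # placeholder path
referit3D_file: /path/to/nr3d.csv                    # placeholder; or sr3d.csv
bert_pretrain_path: /path/to/bert_pretrained_model   # placeholder path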
- To train on either the Nr3d or Sr3d dataset, use the following command:
python scripts/train_referit3d.py \
--log-dir $PATH_OF_LOG_AND_CHECKPOINT$ \
--config-file $PATH_OF_CONFIG$
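For example, with placeholder values (logs/nr3d and configs/nr3d.yaml are hypothetical names):

python scripts/train_referit3d.py \
    --log-dir logs/nr3d \
    --config-file configs/nr3d.yaml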
- To evaluate on either the Nr3d or Sr3d dataset, please add the following arguments (a full example follows):
--resume-path $PATH_OF_CHECKPOINT$ \
--mode evaluate
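Put together, an evaluation run would look like this (all paths are placeholders):

python scripts/train_referit3d.py \
    --log-dir logs/nr3d_eval \
    --config-file configs/nr3d.yaml \
    --resume-path logs/nr3d/checkpoints/best.pth \
    --mode evaluate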
@inproceedings{chang2024mikasa,
  title={MiKASA: Multi-Key-Anchor \& Scene-Aware Transformer for 3D Visual Grounding},
  author={Chang, Chun-Peng and Wang, Shaoxiang and Pagani, Alain and Stricker, Didier},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={14131--14140},
  year={2024}
}
This project is built upon the following repository: