MiKASA-3DVG (CVPR 24)

MiKASA

1. Data Preparation

1.1. ScanNet Data

To download the ScanNet scans, see ScanNet for instructions. To preprocess the data required for the ReferIt3D challenge, visit ReferIt3D.

1.2. Referit3D Linguistic Data (Nr3D/Sr3D/Sr3D+)

See ReferIt3D for more details.

1.3. Pre-trained BERT Weights

Please download the pre-trained BERT weights from Hugging Face.

1.4. Directory Structure

The final required files are as follows:

MiKASA/
├── logs/ # Training logs
├── external_tools/
│   ├── pointnet2/
│   └── ...
├── models/
├── scripts/
│   ├── train_referit3d.py
│   └── ...
├── utils/
└── ...

2. Environment

All experiments were conducted using a single A100-80GB GPU.

  • Ubuntu: 20.04
  • CUDA: 11.7
  • PyTorch: 1.13
  • Python: 3.7
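
To compare a local setup against the versions listed above, a small script along the following lines may help. It is only a sketch and not part of the repository: the function name `environment_report` is hypothetical, and the PyTorch fields are left as `None` when PyTorch is not installed.

```python
import platform


def environment_report():
    """Collect Python, PyTorch, CUDA and GPU information for comparison with the list above."""
    report = {
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
        "gpu": None,
    }
    try:
        import torch  # optional: the report stays partial if PyTorch is missing
    except ImportError:
        return report
    report["torch"] = torch.__version__
    report["cuda"] = torch.version.cuda  # None for CPU-only builds
    if torch.cuda.is_available():
        report["gpu"] = torch.cuda.get_device_name(0)
    return report


for name, value in environment_report().items():
    print(f"{name}: {value}")
```

Other versions may work as well; the list above only records the setup used for the reported experiments.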

3. Installation

For the dependencies, please refer to MVT. Additionally, please install easydict and pyyaml.

  • To use the PointNet++ visual encoder, you need to compile its CUDA layers. Note: this compilation requires gcc 5.4 or later.
    cd external_tools/pointnet2
    python setup.py install

5. Run

  • First, specify the paths for scannet_file, referit3D_file, and bert_pretrain_path in the config file.
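
Before launching a run, it can be useful to confirm that the three paths are set and exist. The sketch below is an illustration, not code from the repository: `find_path_problems` is a hypothetical helper, and it assumes the three entries have already been loaded into a flat dictionary. The actual layout of the config file may differ (for example, the keys may be nested).

```python
import os

# The three entries named above; their position inside the real config is an assumption.
REQUIRED_KEYS = ("scannet_file", "referit3D_file", "bert_pretrain_path")


def find_path_problems(cfg):
    """Return a list of human-readable problems for the required path entries."""
    problems = []
    for key in REQUIRED_KEYS:
        value = cfg.get(key)
        if not value:
            problems.append(f"{key}: not set")
        elif not os.path.exists(value):
            problems.append(f"{key}: path does not exist ({value})")
    return problems
```

If the config file is YAML with these keys at the top level, the dictionary could be obtained with `yaml.safe_load` from pyyaml and passed to the helper; an empty result means all three paths were found.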

5.1. Training

  • To train on either the Nr3D or Sr3D dataset, use the following command:
    python scripts/train_referit3d.py \
    --log-dir $PATH_OF_LOG_AND_CHECKPOINT$ \
    --config-file $PATH_OF_CONFIG$

5.2. Evaluation

  • To evaluate on either the Nr3D or Sr3D dataset, please add the following arguments to the training command:
    --resume-path $PATH_OF_CHECKPOINT$ \
    --mode evaluate
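
To make the relation between the training and evaluation invocations explicit, the following sketch assembles both command lines from the flags shown above. `build_command` is a hypothetical helper, and the paths in the example are placeholders rather than files shipped with the repository.

```python
import shlex


def build_command(log_dir, config_file, resume_path=None):
    """Assemble the training command; passing resume_path adds the evaluation arguments."""
    cmd = [
        "python", "scripts/train_referit3d.py",
        "--log-dir", log_dir,
        "--config-file", config_file,
    ]
    if resume_path is not None:
        # Evaluation reuses the training command plus these two arguments.
        cmd += ["--resume-path", resume_path, "--mode", "evaluate"]
    return cmd


# Placeholder paths, for illustration only.
train_cmd = build_command("logs/run", "config.yaml")
eval_cmd = build_command("logs/run", "config.yaml", resume_path="checkpoint.pth")
print(" ".join(shlex.quote(arg) for arg in train_cmd))
print(" ".join(shlex.quote(arg) for arg in eval_cmd))
```

The resulting list can be passed to `subprocess.run` from the repository root, or the printed line can be pasted into a shell.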

6. Checkpoints

Link

Citation

@inproceedings{chang2024mikasa,
  title={MiKASA: Multi-Key-Anchor \& Scene-Aware Transformer for 3D Visual Grounding},
  author={Chang, Chun-Peng and Wang, Shaoxiang and Pagani, Alain and Stricker, Didier},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={14131--14140},
  year={2024}
}

Credit

The project is built based on the following repository: