MiKASA-3DVG (CVPR 24)

1. Data Preparation

1.1. ScanNet Data

To download the ScanNet scans, follow the instructions from ScanNet. To preprocess the data required for the ReferIt3D challenge, see ReferIt3D.

1.2. Referit3D Linguistic Data (Nr3D/Sr3D/Sr3D+)

See ReferIt3D for more details.

Nr3D (10.7MB)
Sr3D (19MB)
Sr3D+ (20MB)

1.3. Pre-trained BERT Weights

Please download the pre-trained BERT weights from Hugging Face.

1.4. Directory Structure

The final required files are as follows:

MiKASA/
├── logs/                       # Training logs
├── external_tools/
│   ├── pointnet2/
│   └── ...
├── models/
├── scripts/
│   ├── train_referit3d.py
│   └── ...
├── utils/
└── ...
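
As a quick sanity check, the layout above can be verified before training. This is a minimal sketch and not part of the repository: the helper name `missing_entries` is hypothetical, and the list of entries is taken only from the tree shown above.

```python
from pathlib import Path

# Entries copied from the directory tree above (relative to the MiKASA/ root).
REQUIRED_ENTRIES = [
    "logs",
    "external_tools/pointnet2",
    "models",
    "scripts/train_referit3d.py",
    "utils",
]


def missing_entries(root, required=REQUIRED_ENTRIES):
    """Return the required entries that do not exist under `root`."""
    root = Path(root)
    return [entry for entry in required if not (root / entry).exists()]


if __name__ == "__main__":
    missing = missing_entries("MiKASA")
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("Directory layout looks complete.")
```

Run it from the directory that contains `MiKASA/`, or pass a different root to `missing_entries`.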

2. Environment

All experiments were conducted using a single A100-80GB GPU.

Ubuntu: 20.04
CUDA: 11.7
PyTorch: 1.13
Python: 3.7
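
To compare a local setup against the versions listed above, a small script along these lines may help. It is a sketch under assumptions: the helper names are hypothetical, only the major and minor components are compared, and other versions may well work since the list above only states what the experiments used.

```python
import platform

# Versions listed in the Environment section above.
EXPECTED = {"python": "3.7", "torch": "1.13", "cuda": "11.7"}


def matches(actual, expected):
    """True if `actual` begins with the version components of `expected`."""
    if actual is None:
        return False
    want = expected.split(".")
    # Drop local build suffixes such as "+cu117" before comparing.
    return actual.split("+")[0].split(".")[: len(want)] == want


def collect_versions():
    """Gather installed versions; entries stay None when unavailable."""
    versions = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        import torch
    except ImportError:
        return versions
    versions["torch"] = torch.__version__
    versions["cuda"] = torch.version.cuda  # None for CPU-only builds
    return versions


if __name__ == "__main__":
    for name, actual in collect_versions().items():
        status = "ok" if matches(actual, EXPECTED[name]) else "differs"
        print(f"{name}: found {actual}, expected {EXPECTED[name]} ({status})")
```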

3. Installation

For the dependencies, please refer to MVT. Additionally, install easydict and pyyaml.

To use the PointNet++ visual encoder, you need to compile its CUDA layers. Note: the compilation requires gcc 5.4 or later.

    cd external_tools/pointnet2
    python setup.py install
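
Since the compilation depends on the gcc version, it can be worth checking the compiler before running `setup.py`. The following is a rough sketch, not part of the repository: the function names are hypothetical, and it assumes `gcc -dumpversion` prints a version such as `9.4.0` or just `9`, as GCC does.

```python
import re
import shutil
import subprocess


def parse_gcc_version(text):
    """Extract (major, minor) from a version string such as '9.4.0' or '11'."""
    match = re.search(r"(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2) or 0)


def gcc_is_recent_enough(minimum=(5, 4)):
    """Return True or False, or None when gcc is missing or unparseable."""
    if shutil.which("gcc") is None:
        return None
    out = subprocess.run(
        ["gcc", "-dumpversion"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout
    version = parse_gcc_version(out)
    return None if version is None else version >= minimum


if __name__ == "__main__":
    print("gcc >= 5.4:", gcc_is_recent_enough())
```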

5. Run

First, specify the paths for scannet_file, referit3D_file, and bert_pretrain_path in the config file.
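
A missing or mistyped path is easier to catch before a run starts. The sketch below checks the three entries named above on an already loaded config dictionary. It is illustrative only: the function name is hypothetical, and it assumes the three keys sit at the top level of the config, which is not confirmed here.

```python
from pathlib import Path

# Keys named in the text above. Their position at the top level of the
# config is an assumption made for this sketch.
PATH_KEYS = ("scannet_file", "referit3D_file", "bert_pretrain_path")


def check_config_paths(config):
    """Return a list of problems found with the path entries in `config`."""
    problems = []
    for key in PATH_KEYS:
        value = config.get(key)
        if not value:
            problems.append(f"{key} is not set")
        elif not Path(value).exists():
            problems.append(f"{key} points to a missing path: {value}")
    return problems
```

If the config file is YAML (pyyaml is among the listed dependencies, so this seems likely but is an assumption), the dictionary can be obtained with `yaml.safe_load` and passed to `check_config_paths`; an empty result means all three paths exist.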

5.1. Training

To train on either the Nr3D or Sr3D dataset, use the following command:

    python scripts/train_referit3d.py \
    --log-dir $PATH_OF_LOG_AND_CHECKPOINT$ \
    --config-file $PATH_OF_CONFIG$

5.2. Evaluation

To evaluate on either the Nr3D or Sr3D dataset, add the following arguments to the training command:

    --resume-path $PATH_OF_CHECKPOINT$ \
    --mode evaluate
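
To make explicit how the evaluation arguments combine with the training command, here is a small sketch that assembles the full argument list. It reads the text above as meaning that evaluation reuses `scripts/train_referit3d.py` with two extra arguments. The helper name and the example paths are hypothetical placeholders.

```python
import shlex
import sys


def build_command(log_dir, config_file, resume_path=None):
    """Build the training command, or the evaluation one if resume_path is given."""
    cmd = [
        sys.executable, "scripts/train_referit3d.py",
        "--log-dir", log_dir,
        "--config-file", config_file,
    ]
    if resume_path is not None:
        # Evaluation appends the two arguments listed above.
        cmd += ["--resume-path", resume_path, "--mode", "evaluate"]
    return cmd


if __name__ == "__main__":
    # Placeholder paths, purely illustrative.
    cmd = build_command("logs/eval", "path/to/config", "path/to/checkpoint")
    print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root.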

6. Checkpoints

Link

Citation

@inproceedings{chang2024mikasa,
  title={MiKASA: Multi-Key-Anchor \& Scene-Aware Transformer for 3D Visual Grounding},
  author={Chang, Chun-Peng and Wang, Shaoxiang and Pagani, Alain and Stricker, Didier},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={14131--14140},
  year={2024}
}

Credit

This project builds on the following repositories:

ReferIt3D
MVT
