To download the ScanNet scans, please follow the instructions in ScanNet. To preprocess the data required for the ReferIt3D challenge, see ReferIt3D for details.
Please download the pre-trained weights from Hugging Face.
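If the weights are hosted as a standard Hugging Face model repository, one way to fetch them is with huggingface-cli; the repository id and target directory below are placeholders, not the actual names:

huggingface-cli download <HF_REPO_ID> --local-dir ./pretrained_weights  # <HF_REPO_ID> is a placeholder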
The final directory structure should look as follows:
MiKASA/
├── logs/                      # Training logs
├── external_tools/
│   ├── pointnet2/
│   └── ...
├── models/
├── scripts/
│   ├── train_referit3d.py
│   └── ...
├── utils/
└── ...
All experiments were conducted using a single A100-80GB GPU.
- Ubuntu: 20.04
- CUDA: 11.7
- PyTorch: 1.13
- Python: 3.7
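As an example, a matching environment could be set up as follows; the use of conda and the cu117 wheel index are assumptions, not prescribed by this README:

conda create -n mikasa python=3.7 -y
conda activate mikasa
pip install torch==1.13.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117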
For the remaining dependencies, please refer to MVT. Additionally, install easydict and pyyaml.
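The two extra packages can be installed with pip:

pip install easydict pyyaml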
- To use the PointNet++ visual encoder, you need to compile its CUDA layers:
Note: this compilation requires gcc 5.4 or later.
cd external_tools/pointnet2
python setup.py install
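As a quick sanity check after compiling, you can try importing the installed package. The module name pointnet2 here is an assumption based on the directory name; adjust it to whatever setup.py actually registers:

python -c "import pointnet2"  # module name is an assumption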
- First, specify the paths of scannet_file, referit3D_file, and bert_pretrain_path in the config file, as sketched below.
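A hypothetical sketch of these entries, assuming the config is a YAML file (pyyaml is a dependency); all paths below are placeholders:

scannet_file: /path/to/preprocessed_scannet_data     # placeholder path
referit3D_file: /path/to/nr3d.csv                    # placeholder; or sr3d.csv
bert_pretrain_path: /path/to/bert_pretrained_model   # placeholder path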
- To train on either the Nr3d or Sr3d dataset, use the following command:
python scripts/train_referit3d.py \
--log-dir $PATH_OF_LOG_AND_CHECKPOINT$ \
--config-file $PATH_OF_CONFIG$
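For example, with placeholder values (logs/nr3d and configs/nr3d.yaml are hypothetical names):

python scripts/train_referit3d.py \
    --log-dir logs/nr3d \
    --config-file configs/nr3d.yaml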
- To evaluate on either the Nr3d or Sr3d dataset, please add the following arguments (a full example follows):
--resume-path $PATH_OF_CHECKPOINT$ \
--mode evaluate
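Put together, an evaluation run would look like this (all paths are placeholders):

python scripts/train_referit3d.py \
    --log-dir logs/nr3d_eval \
    --config-file configs/nr3d.yaml \
    --resume-path logs/nr3d/checkpoints/best.pth \
    --mode evaluate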
@inproceedings{chang2024mikasa,
  title={MiKASA: Multi-Key-Anchor \& Scene-Aware Transformer for 3D Visual Grounding},
  author={Chang, Chun-Peng and Wang, Shaoxiang and Pagani, Alain and Stricker, Didier},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={14131--14140},
  year={2024}
}
This project is built upon the following repository: