Yushi Lan, Shangchen Zhou, Zhaoyang Lyu, Fangzhou Hong, Shuai Yang, Bo Dai, Xingang Pan, Chen Change Loy
S-Lab, Nanyang Technological University; Wangxuan Institute of Computer Technology, Peking University; Shanghai Artificial Intelligence Laboratory
[Teaser video: GaussianAnything.mp4]
For more visual results, please check out our project page 📃
This repository contains the official implementation of GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation
While 3D content generation has advanced significantly, existing methods still face challenges with input formats, latent space design, and output representations. This paper introduces a novel 3D generation framework that addresses these challenges, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space. Our framework employs a Variational Autoencoder (VAE) with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information, and incorporates a cascaded latent diffusion model for improved shape-texture disentanglement. The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single/multi-view image inputs. Notably, the newly proposed latent space naturally enables geometry-texture disentanglement, thus allowing 3D-aware editing. Experimental results demonstrate the effectiveness of our approach on multiple datasets, outperforming existing methods in both text- and image-conditioned 3D generation.
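At a glance, generation is cascaded over the point cloud-structured latent: stage 1 samples a sparse point cloud (geometry) and stage 2 samples point cloud-conditioned KL features (texture), which the VAE decoder turns into 2D (surfel) Gaussians. The sketch below is purely illustrative and is not the repository's API; every function name, tensor shape, and the 768-d condition vector are placeholders.

```python
# Illustrative sketch of the cascaded pipeline (placeholder functions, not the repo's API).
import torch

def stage1_point_diffusion(cond: torch.Tensor, num_points: int = 512) -> torch.Tensor:
    """Hypothetical stage-1 sampler: condition -> sparse xyz point cloud (geometry)."""
    return torch.randn(num_points, 3)  # stands in for the learned flow-matching sampler

def stage2_feature_diffusion(xyz: torch.Tensor, cond: torch.Tensor, feat_dim: int = 16) -> torch.Tensor:
    """Hypothetical stage-2 sampler: per-point KL features, conditioned on the geometry."""
    return torch.randn(xyz.shape[0], feat_dim)

def vae_decode_to_surfel_gaussians(xyz: torch.Tensor, feats: torch.Tensor) -> dict:
    """Hypothetical VAE decoder: point cloud-structured latent -> 2D (surfel) Gaussians."""
    n = xyz.shape[0]
    return {
        "xyz": xyz,                    # Gaussian centers
        "rgb": torch.rand(n, 3),       # colors
        "scale": torch.rand(n, 2),     # 2D surfel scales
        "rotation": torch.rand(n, 4),  # quaternions
        "opacity": torch.rand(n, 1),
    }

cond = torch.randn(1, 768)                   # e.g. a DINO image embedding or a text embedding
xyz = stage1_point_diffusion(cond)           # geometry first...
feats = stage2_feature_diffusion(xyz, cond)  # ...then texture, keeping the two disentangled
gaussians = vae_decode_to_surfel_gaussians(xyz, feats)
print(gaussians["xyz"].shape, gaussians["rgb"].shape)
```

Sampling geometry and texture in separate stages is the sense in which the latent space supports geometry-texture disentangled, 3D-aware editing.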
[28/Nov/2024] Release gradio demo (Hugging Face ZeroGPU), which supports image-to-3D generation.
[27/Nov/2024] Release gradio demo (local version), which supports image-to-3D generation. Simply call `python scripts/gradio_app_cascaded.py`.
[27/Nov/2024] Support colored point cloud (2D Gaussian centers) and TSDF mesh export, enabled by default with `--export_mesh True`.
[24/Nov/2024] Inference code and checkpoint release.
[13/Nov/2024] Initial release.
- Release inference code and checkpoints.
- Release training code.
- Release pre-extracted latent codes for 3D diffusion training.
- Release Gradio demo (local version).
- Release Gradio demo (Hugging Face ZeroGPU) for image-to-3D generation, check it here!
- Release the evaluation code.
- Lint the code.
Set up the PyTorch environment (the same environment as LN3Diff, ECCV 2024):
# download
git clone https://github.com/NIRVANALAN/GaussianAnything.git
# setup the pytorch+xformers+pytorch3d environment
conda create -n ga python=3.10
conda activate ga
pip install -r requirements.txt
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
Then, install the 2DGS dependencies:
pip install "git+https://github.com/hbb1/diff-surfel-rasterization.git"
pip install "git+https://gitlab.inria.fr/bkerbl/simple-knn.git"
For image-to-3D generation with GaussianAnything, we provide a Gradio interface. After setting up the environment, please run `python scripts/gradio_app_cascaded.py` to launch Gradio locally. The code has been tested on a V100 (32 GiB) GPU.
- All diffusion checkpoints will be automatically loaded from huggingface.co/yslan/GaussianAnything.
- The results will be directly dumped to `./logs`, and can be modified by changing `$logdir` in the bash file accordingly.

To set the CFG scale and random seed, please update `$unconditional_guidance_scale` and `$seed` in the bash file.
Set the `$data_dir` accordingly. For demo images, please download them from huggingface.co/yslan/GaussianAnything/demo-img. We have also included the demo images shown in the paper in `./assets/demo-image-for-i23d/instantmesh` and `./assets/demo-image-for-i23d/gso`.

In the bash file, we set `data_dir="./assets/demo-image-for-i23d/instantmesh"` by default.
stage-1 (point cloud generation):
bash shell_scripts/release/inference/i23d/i23d-stage1.sh
The sparse point cloud will be saved to, e.g., `logs/i23d/stage-1/dino_img/house2-input/sample-0-0.ply`. Note that `$num_samples` samples will be saved, which is set in the bash file.
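If you want to inspect a stage-1 sample without opening a viewer, the generated `.ply` can be loaded in Python. The snippet below assumes `trimesh` is installed; it is not part of the requirements listed above.

```python
# Inspect one stage-1 sample (assumes `pip install trimesh`).
import trimesh

pcd = trimesh.load("logs/i23d/stage-1/dino_img/house2-input/sample-0-0.ply")
print(pcd.vertices.shape)  # (N, 3): the sparse point cloud giving the coarse geometry
```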
Then, set the `$stage_1_output_dir` to the `$logdir` of the above stage.
stage-2 (2D Gaussians generation):
bash shell_scripts/release/inference/i23d/i23d-stage2.sh
In the output dir of each instance, e.g., `./logs/i23d/stage-2/dino_img/house2-input`, the code dumps the colored point cloud extracted from the surfel Gaussian centers (xyz+RGB), `sample-0-0-gaussian-pcd.ply`, as well as the TSDF mesh `stage1ID_0-stage2ID-0-mesh.obj`.

Both can be visualized with MeshLab.
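Besides MeshLab, both outputs can also be loaded programmatically; as above, the snippet assumes `trimesh` is installed.

```python
# Load the stage-2 outputs listed above (assumes `pip install trimesh`).
import trimesh

out_dir = "logs/i23d/stage-2/dino_img/house2-input"
pcd = trimesh.load(f"{out_dir}/sample-0-0-gaussian-pcd.ply")      # surfel Gaussian centers, xyz + RGB
mesh = trimesh.load(f"{out_dir}/stage1ID_0-stage2ID-0-mesh.obj")  # TSDF mesh
print("points:", pcd.vertices.shape, "| colored:", getattr(pcd, "colors", None) is not None)
print("mesh:", mesh.vertices.shape, mesh.faces.shape)
```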
Please update the captions for 3D generation in `datasets/caption-forpaper.txt`. To change the number of samples to be generated, please change `$num_samples` in the bash file.
stage-1 (point cloud generation):
bash shell_scripts/release/inference/t23d/stage1-t23d.sh
Then, set the `$stage_1_output_dir` to the `$logdir` of the above stage.
stage-2 (2D Gaussians generation):
bash shell_scripts/release/inference/t23d/stage2-t23d.sh
The results will be dumped to `./logs/t23d/stage-2`.
To encode a 3D asset into the latent point cloud, please download the pre-trained VAE checkpoint from huggingface.co/yslan/GaussianAnything/ckpts/vae/model_rec1965000.pt to `./checkpoint/model_rec1965000.pt`.
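If you prefer to fetch the checkpoint from Python instead of the browser, `huggingface_hub` can download it. The repo id and filename below are read off the URL above and are assumptions about the Hub layout, so adjust them if needed.

```python
# Download the pre-trained VAE checkpoint (repo id / filename assumed from the URL above).
import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

ckpt = hf_hub_download(repo_id="yslan/GaussianAnything", filename="ckpts/vae/model_rec1965000.pt")
Path("./checkpoint").mkdir(exist_ok=True)
shutil.copy(ckpt, "./checkpoint/model_rec1965000.pt")
```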
Then, run the inference script:
bash shell_scripts/release/inference/vae-3d.sh
This will encode the multi-view 3D renderings in `./assets/demo-image-for-i23d/for-vae-reconstruction/Animals/0` into the point cloud-structured latent code and export them (along with the 2DGS mesh) to `./logs/latent_dir/`. The exported latent code will be used for efficient 3D diffusion training.
Note that if you want to use the pre-extracted 3D latent codes, please check the following instructions.
All the training is conducted on 8 A100 (80 GiB) GPUs with BF16 enabled. For training on V100, please use FP32 training by setting `--use_amp False` in the bash file. Feel free to tune `$batch_size` in the bash file to match your VRAM.
To enable optimized RMSNorm, feel free to install Apex.
To facilitate reproducing the performance, we have uploaded the pre-extracted point cloud-structured latent codes to huggingface.co/yslan/gaussiananything/dataset/latent.tar.gz (34 GiB required). Please download the pre-extracted point cloud latent codes, unzip them, and set `$mv_latent_dir` in the bash file accordingly.
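One way to fetch and unpack the archive from Python is sketched below; the repo id, filename, and extraction directory are assumptions based on the URL above, so adapt them to your setup and then point `$mv_latent_dir` at the extracted folder.

```python
# Download and extract the pre-extracted latent codes (~34 GiB; paths are assumptions).
import tarfile
from huggingface_hub import hf_hub_download

tar_path = hf_hub_download(repo_id="yslan/GaussianAnything", filename="dataset/latent.tar.gz")
with tarfile.open(tar_path, "r:gz") as f:
    f.extractall("./dataset/latent")  # set $mv_latent_dir to this directory in the bash file
```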
Please download the 3D captions from Hugging Face (huggingface.co/yslan/GaussianAnything/dataset/text_captions_3dtopia.json) and put the file under `dataset`.
Note that if you want to train on a specific class of Objaverse, just manually change the code at `datasets/g_buffer_objaverse.py:3043`.
stage-1 training (point cloud generation):
bash shell_scripts/release/train/stage2-t23d/t23d-pcd-gen.sh
stage-2 training (point cloud-conditioned KL feature generation):
bash shell_scripts/release/train/stage2-t23d/t23d-klfeat-gen.sh
Please download the G-buffer Objaverse dataset first.
stage-1 training (point cloud generation):
bash shell_scripts/release/train/stage2-i23d/i23d-pcd-gen.sh
stage-2 training (point cloud-conditioned KL feature generation):
bash shell_scripts/release/train/stage2-i23d/i23d-klfeat-gen.sh
If you find our work useful for your research, please consider citing the paper:
@article{lan2024ga,
  title={GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation},
  author={Yushi Lan and Shangchen Zhou and Zhaoyang Lyu and Fangzhou Hong and Shuai Yang and Bo Dai and Xingang Pan and Chen Change Loy},
  year={2024},
  eprint={2411.08033},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
Distributed under the NTU S-Lab License. See `LICENSE` for more information.
Our flow-matching training code is built on SiT, and the rendering code is from 2DGS. The training dataset (Objaverse-V1) renderings are provided by G-buffer Objaverse. We also thank all the 3D artists who have shared their 3D assets under CC license for academic use.
This work also builds on our earlier native 3D diffusion generative model, LN3Diff (ECCV 2024). Please feel free to check it out.
If you have any questions, please feel free to contact us via [email protected] or GitHub issues.