[SIGGRAPH 2024] Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning

ToDo List

  • Inference code and pretrained models.
  • Interactive workflow.
  • Training data.
  • Blender addons.

Preparation for inference

  1. Install the packages in requirements.txt. We tested our model on an A100-80G GPU with CUDA 11.8 and PyTorch 2.0.1 (see the environment check after these steps).
conda create -n coin3d
conda activate coin3d
pip install -r requirements.txt
  2. Download the checkpoints.
mkdir ckpt
cd ckpt
wget https://huggingface.co/WenqiDong/Coin3D-v1/resolve/main/ViT-L-14.pt

wget https://huggingface.co/WenqiDong/Coin3D-v1/resolve/main/model.ckpt
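To verify that the installed environment matches the tested setup, a quick check such as the following can help (a minimal sketch; it uses only standard PyTorch calls):

# check_env.py -- sanity-check the environment against the tested setup
import torch

print("PyTorch:", torch.__version__)        # tested with 2.0.1
print("CUDA build:", torch.version.cuda)    # tested with 11.8
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # tested on an A100-80G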

Inference

  1. Make sure you have the following models.
Coin3D
|-- ckpt
    |-- ViT-L-14.pt
    |-- model.ckpt
  2. We provide a workflow that uses a custom mesh and a text prompt to generate the input image. You can refer to this instruction.
  3. (Optional) Make sure the input image has a white background. Following SyncDreamer, we use the tools below for foreground segmentation and predict the foreground mask as the alpha channel. We use Paint3D to segment the foreground object interactively. We also provide a script, foreground_segment.py, which uses carvekit to predict foreground masks; you need to crop the object region before feeding it to foreground_segment.py. Double-check that the predicted masks are correct (a quick way to do so is shown after the command below).
python3 foreground_segment.py --input <image-file-to-input> --output <image-file-in-png-format-to-output>
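Since the predicted masks are not always reliable, it can help to inspect the alpha channel before generation. A minimal sketch using Pillow and NumPy (output.png stands in for your segmented image):

# verify_mask.py -- double-check a predicted foreground mask
import numpy as np
from PIL import Image

img = Image.open("output.png")  # hypothetical path to the segmented image
assert img.mode == "RGBA", "generate.py expects RGBA input with the mask in the alpha channel"

alpha = np.array(img)[..., 3]
coverage = (alpha > 127).mean()
# a coverage near 0% or 100% usually indicates a failed segmentation
print(f"foreground covers {coverage:.1%} of the image")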
  4. Use the coarse proxy to control 3D generation of multi-view images.
python3 generate.py \
        --cfg configs/ctrldemo.yaml \
        --ckpt ckpt/model.ckpt \
        --input example/panda/input.png \
        --input_proxy example/panda/proxy.txt \
        --output output/custom \
        --sample_num 1 \
        --cfg_scale 2.0 \
        --elevation 30 \
        --ctrl_end_step 1.0 \
        --sampler ddim_demo

Explanation:

  • --cfg is the model configuration.
  • --ckpt is the checkpoint to load.
  • --input is the input image in RGBA format. The alpha channel encodes the foreground object mask.
  • --input_proxy is the input coarse proxy. The proxy contains 256 points by default; misc.ipynb contains code for sampling a proxy from a coarse mesh (see the sketch after this list).
  • --output is the output directory. Results will be saved to output/custom/0.png; each png file contains 16 images from predefined viewpoints.
  • --sample_num is the number of instances to generate.
  • --cfg_scale is the classifier-free guidance scale. 2.0 works for most cases.
  • --elevation is the elevation angle of the input image in degrees. It needs to be set to 30.
  • --ctrl_end_step is the timestep at which 3D control ends, ranging from 0 to 1.0; it is usually set between 0.6 and 1.0.
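If you want to build a proxy from your own coarse mesh, a sketch like the following can work. It assumes trimesh is installed and that the proxy file is a plain-text list of xyz coordinates, one point per line; misc.ipynb is the authoritative reference for the exact format and any required normalization:

# sample_proxy.py -- sample a 256-point proxy from a coarse mesh
import numpy as np
import trimesh

# hypothetical input path; assumes the file loads as a single mesh
mesh = trimesh.load("coarse_proxy.obj")
points = mesh.sample(256)        # uniform surface sampling, shape (256, 3)
np.savetxt("proxy.txt", points)  # one "x y z" row per point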
  5. Run a NeuS or a NeRF for 3D reconstruction.
# train a neus
python3 train_renderer.py -i output/custom/0.png \
                         -n custom-neus \
                         -b configs/neus.yaml \
                         -l output/renderer 
# train a nerf
python3 train_renderer.py -i output/custom/0.png \
                         -n custom-nerf \
                         -b configs/nerf.yaml \
                         -l output/renderer

Explanation:

  • -i points to the multi-view images generated by SyncDreamer. Since SyncDreamer does not always produce good results, you may need to select a good generated image set (from 0.png to 3.png) for reconstruction; the sketch after this list can help with inspection.
  • -n is the experiment name and -l is the log directory. Results will be saved to <log_dir>/<name>, i.e. output/renderer/custom-neus and output/renderer/custom-nerf.
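To eyeball which image set is good, you can split each generated png into its 16 views. A minimal sketch; it assumes the views are concatenated horizontally in a single strip, so adjust the slicing if your outputs are tiled differently:

# split_views.py -- split a generated multi-view png into individual views
from PIL import Image

strip = Image.open("output/custom/0.png")
w, h = strip.size
view_w = w // 16                 # assumes a 1x16 horizontal layout
for i in range(16):
    view = strip.crop((i * view_w, 0, (i + 1) * view_w, h))
    view.save(f"view_{i:02d}.png")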

Dataset

We train the model on the Objaverse LVIS dataset. The preprocessed data can be found here. We use the rendering script from SyncDreamer to render multi-view images. The script for extracting object proxies can be found in misc.

Training

Please note that you need to set the data directory location in the config file.

target_dir: path/to/renderings-v1 # renderings of target views
input_dir: path/to/renderings-random # renderings of input views
proxy_dir: path/to/proxy_256 # proxies of target objects
python3 train_syncdreamer.py -b configs/coin3d_train.yaml \
                           --finetune_from ckpt/syncdreamer-pretrain.ckpt \
                           -l ./logs/coin3d \
                           -c ./ckpt/coin3d \
                           --gpus 0
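Before launching training, a quick way to catch path typos is to check that the configured directories exist. A minimal sketch; it assumes PyYAML is available and that the three keys above appear somewhere in configs/coin3d_train.yaml:

# check_dirs.py -- verify the data directories referenced by the training config
from pathlib import Path
import yaml

cfg = yaml.safe_load(open("configs/coin3d_train.yaml"))

def find_keys(node, names, found=None):
    # recursively collect values for the given key names anywhere in the config tree
    if found is None:
        found = {}
    if isinstance(node, dict):
        for k, v in node.items():
            if k in names:
                found[k] = v
            find_keys(v, names, found)
    elif isinstance(node, list):
        for v in node:
            find_keys(v, names, found)
    return found

for key, path in find_keys(cfg, {"target_dir", "input_dir", "proxy_dir"}).items():
    status = "ok" if Path(path).is_dir() else "MISSING"
    print(f"{key}: {path} [{status}]")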

Acknowledgement

We deeply appreciate the authors of the repositories our code builds on, such as SyncDreamer, for generously sharing their code, which we have used extensively. Our project has greatly benefited from their openness and expertise.

Citation

If you find this repository useful in your project, please cite the following work. :)

@article{dong2024coin3d,
  title={Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning},
  author={Dong, Wenqi and Yang, Bangbang and Ma, Lin and Liu, Xiao and Cui, Liyuan and Bao, Hujun and Ma, Yuewen and Cui, Zhaopeng},
  year={2024},
  eprint={2405.08054},
  archivePrefix={arXiv},
  primaryClass={cs.GR}
}
