[SIGGRAPH 2024] Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning

ToDo List

  • Inference code and pretrained models.
  • Interactive workflow.
  • Training data.
  • Blender addons.

Preparation for inference

  1. Install the packages in requirements.txt. We tested our model on an A100-80G GPU with CUDA 11.8 and PyTorch 2.0.1 (see the environment check after these steps).
conda create -n coin3d
conda activate coin3d
pip install -r requirements.txt
  2. Download the checkpoints.
mkdir ckpt
cd ckpt
wget https://huggingface.co/WenqiDong/Coin3D-v1/resolve/main/ViT-L-14.pt

wget https://huggingface.co/WenqiDong/Coin3D-v1/resolve/main/model.ckpt
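To verify that the installed environment matches the tested setup, a quick check such as the following can help (a minimal sketch; it uses only standard PyTorch calls):

# check_env.py -- sanity-check the environment against the tested setup
import torch

print("PyTorch:", torch.__version__)        # tested with 2.0.1
print("CUDA build:", torch.version.cuda)    # tested with 11.8
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # tested on an A100-80G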

Inference

  1. Make sure you have the following models.
Coin3D
|-- ckpt
    |-- ViT-L-14.pt
    |-- model.ckpt
  2. We provide a workflow that uses a custom mesh and a text prompt to generate the input image. You can refer to this instruction.
  3. (Optional) Make sure the input image has a white background. Following SyncDreamer, we use the tools below for foreground segmentation and predict the foreground mask as the alpha channel. We use Paint3D to segment the foreground object interactively. We also provide a script, foreground_segment.py, which uses carvekit to predict foreground masks; you need to crop the object region before feeding it to foreground_segment.py. Double-check that the predicted masks are correct (a quick way to do so is shown after the command below).
python3 foreground_segment.py --input <image-file-to-input> --output <image-file-in-png-format-to-output>
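Since the predicted masks are not always reliable, it can help to inspect the alpha channel before generation. A minimal sketch using Pillow and NumPy (output.png stands in for your segmented image):

# verify_mask.py -- double-check a predicted foreground mask
import numpy as np
from PIL import Image

img = Image.open("output.png")  # hypothetical path to the segmented image
assert img.mode == "RGBA", "generate.py expects RGBA input with the mask in the alpha channel"

alpha = np.array(img)[..., 3]
coverage = (alpha > 127).mean()
# a coverage near 0% or 100% usually indicates a failed segmentation
print(f"foreground covers {coverage:.1%} of the image")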
  4. Use the coarse proxy to control 3D generation of multi-view images.
python3 generate.py \
        --cfg configs/ctrldemo.yaml \
        --ckpt ckpt/model.ckpt \
        --input example/panda/input.png \
        --input_proxy example/panda/proxy.txt \
        --output output/custom \
        --sample_num 1 \
        --cfg_scale 2.0 \
        --elevation 30 \
        --ctrl_end_step 1.0 \
        --sampler ddim_demo

Explanation:

  • --cfg is the model configuration.
  • --ckpt is the checkpoint to load.
  • --input is the input image in RGBA format. The alpha channel encodes the foreground object mask.
  • --input_proxy is the input coarse proxy. The proxy contains 256 points by default; misc.ipynb contains code for sampling a proxy from a coarse mesh (see the sketch after this list).
  • --output is the output directory. Results will be saved to output/custom/0.png; each png file contains 16 images from predefined viewpoints.
  • --sample_num is the number of instances to generate.
  • --cfg_scale is the classifier-free guidance scale. 2.0 works for most cases.
  • --elevation is the elevation angle of the input image in degrees. It needs to be set to 30.
  • --ctrl_end_step is the timestep at which 3D control ends, ranging from 0 to 1.0; it is usually set between 0.6 and 1.0.
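If you want to build a proxy from your own coarse mesh, a sketch like the following can work. It assumes trimesh is installed and that the proxy file is a plain-text list of xyz coordinates, one point per line; misc.ipynb is the authoritative reference for the exact format and any required normalization:

# sample_proxy.py -- sample a 256-point proxy from a coarse mesh
import numpy as np
import trimesh

# hypothetical input path; assumes the file loads as a single mesh
mesh = trimesh.load("coarse_proxy.obj")
points = mesh.sample(256)        # uniform surface sampling, shape (256, 3)
np.savetxt("proxy.txt", points)  # one "x y z" row per point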
  5. Run a NeuS or a NeRF for 3D reconstruction.
# train a neus
python3 train_renderer.py -i output/custom/0.png \
                         -n custom-neus \
                         -b configs/neus.yaml \
                         -l output/renderer 
# train a nerf
python3 train_renderer.py -i output/custom/0.png \
                         -n custom-nerf \
                         -b configs/nerf.yaml \
                         -l output/renderer

Explanation:

  • -i points to the multi-view images generated by SyncDreamer. Since SyncDreamer does not always produce good results, you may need to select a good generated image set (from 0.png to 3.png) for reconstruction; the sketch after this list can help with inspection.
  • -n is the experiment name and -l is the log directory. Results will be saved to <log_dir>/<name>, i.e. output/renderer/custom-neus and output/renderer/custom-nerf.
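To eyeball which image set is good, you can split each generated png into its 16 views. A minimal sketch; it assumes the views are concatenated horizontally in a single strip, so adjust the slicing if your outputs are tiled differently:

# split_views.py -- split a generated multi-view png into individual views
from PIL import Image

strip = Image.open("output/custom/0.png")
w, h = strip.size
view_w = w // 16                 # assumes a 1x16 horizontal layout
for i in range(16):
    view = strip.crop((i * view_w, 0, (i + 1) * view_w, h))
    view.save(f"view_{i:02d}.png")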

Dataset

We train the model on the Objaverse LVIS dataset. The preprocessed data can be found here. We use the rendering script from SyncDreamer to render multi-view images. The script for extracting object proxies can be found in misc.

Training

Please note that you need to set the data directory location in the config file.

target_dir: path/to/renderings-v1 # renderings of target views
input_dir: path/to/renderings-random # renderings of input views
proxy_dir: path/to/proxy_256 # proxies of target objects
python3 train_syncdreamer.py -b configs/coin3d_train.yaml \
                           --finetune_from ckpt/syncdreamer-pretrain.ckpt \
                           -l ./logs/coin3d \
                           -c ./ckpt/coin3d \
                           --gpus 0
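Before launching training, a quick way to catch path typos is to check that the configured directories exist. A minimal sketch; it assumes PyYAML is available and that the three keys above appear somewhere in configs/coin3d_train.yaml:

# check_dirs.py -- verify the data directories referenced by the training config
from pathlib import Path
import yaml

cfg = yaml.safe_load(open("configs/coin3d_train.yaml"))

def find_keys(node, names, found=None):
    # recursively collect values for the given key names anywhere in the config tree
    if found is None:
        found = {}
    if isinstance(node, dict):
        for k, v in node.items():
            if k in names:
                found[k] = v
            find_keys(v, names, found)
    elif isinstance(node, list):
        for v in node:
            find_keys(v, names, found)
    return found

for key, path in find_keys(cfg, {"target_dir", "input_dir", "proxy_dir"}).items():
    status = "ok" if Path(path).is_dir() else "MISSING"
    print(f"{key}: {path} [{status}]")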

Acknowledgement

We deeply appreciate the authors of the repositories our code builds on, such as SyncDreamer, for generously sharing their code, which we have used extensively. Our project has greatly benefited from their openness and expertise.

Citation

If you find this repository useful in your project, please cite the following work. :)

@article{dong2024coin3d,
  title={Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning},
  author={Dong, Wenqi and Yang, Bangbang and Ma, Lin and Liu, Xiao and Cui, Liyuan and Bao, Hujun and Ma, Yuewen and Cui, Zhaopeng},
  year={2024},
  eprint={2405.08054},
  archivePrefix={arXiv},
  primaryClass={cs.GR}
}
