Project Page | Video | Paper
- Inference code and pretrained models.
- Interactive workflow.
- Training data.
- Blender Addons
- Install packages in requirements.txt. We tested our model on an A100-80G GPU with CUDA 11.8 and PyTorch 2.0.1.
conda create -n coin3d
conda activate coin3d
pip install -r requirements.txt
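You can quickly check that the environment matches the tested versions with a short snippet like the one below (a minimal sanity check, not part of the repository):
import torch

# Tested configuration: PyTorch 2.0.1 with CUDA 11.8 on an A100-80G GPU.
print("PyTorch:", torch.__version__)        # expected: 2.0.1
print("CUDA (build):", torch.version.cuda)  # expected: 11.8
print("GPU available:", torch.cuda.is_available())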
- Download checkpoints
mkdir ckpt
cd ckpt
wget https://huggingface.co/WenqiDong/Coin3D-v1/resolve/main/ViT-L-14.pt
wget https://huggingface.co/WenqiDong/Coin3D-v1/resolve/main/model.ckpt
- Make sure you have the following models.
Coin3D
|-- ckpt
|-- ViT-L-14.pt
|-- model.ckpt
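To confirm the downloads completed, here is a minimal check that both files exist and are non-empty (illustrative only, not part of the repository):
import os

# Both checkpoints should sit under ckpt/ after the downloads above.
for path in ["ckpt/ViT-L-14.pt", "ckpt/model.ckpt"]:
    size_mb = os.path.getsize(path) / 1e6  # raises FileNotFoundError if missing
    print(f"{path}: {size_mb:.1f} MB")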
- We provide a workflow that uses a custom mesh and text prompt to generate the input image. You can refer to this instruction.
- (Optional) Make sure the input image has a white background. Here we follow SyncDreamer and use the following tools for foreground segmentation, writing the predicted foreground mask to the alpha channel. We use Paint3D to segment the foreground object interactively.
We also provide a script foreground_segment.py that uses carvekit to predict foreground masks. You need to crop the object region first before feeding it to foreground_segment.py, and double-check that the predicted masks are correct.
python3 foreground_segment.py --input <image-file-to-input> --output <image-file-in-png-format-to-output>
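For reference, here is a rough sketch of carvekit-based foreground prediction (the HiInterface usage and all file names below are assumptions; the bundled foreground_segment.py may differ in its details):
from PIL import Image
from carvekit.api.high import HiInterface

# Assumption: HiInterface is carvekit's high-level entry point for salient object segmentation.
interface = HiInterface(object_type="object", device="cuda")

# The interface takes a list of image paths and returns RGBA PIL images whose
# alpha channel encodes the predicted foreground mask.
result = interface(["object_crop.png"])[0]  # hypothetical pre-cropped input
alpha = result.split()[-1]

# Paste the object onto a white background and keep the mask as the alpha channel,
# matching the RGBA input format expected by generate.py.
white = Image.new("RGB", result.size, (255, 255, 255))
white.paste(result, (0, 0), alpha)
rgba = white.convert("RGBA")
rgba.putalpha(alpha)
rgba.save("input.png")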
- Use a coarse proxy to control the 3D generation of multi-view images.
python3 generate.py \
--cfg configs/ctrldemo.yaml \
--ckpt ckpt/model.ckpt \
--input example/panda/input.png \
--input_proxy example/panda/proxy.txt \
--output output/custom \
--sample_num 1 \
--cfg_scale 2.0 \
--elevation 30 \
--ctrl_end_step 1.0 \
--sampler ddim_demo
Explanation:
- --cfg is the model configuration.
- --ckpt is the checkpoint to load.
- --input is the input image in RGBA format; the alpha channel is the foreground object mask.
- --input_proxy is the input coarse proxy, which contains 256 points by default. misc.ipynb contains code for sampling a proxy from a coarse mesh (see the sketch after this list).
- --output is the output directory. Results are saved to output/custom/0.png, where each png file contains 16 images of predefined viewpoints.
- --sample_num is the number of instances to generate.
- --cfg_scale is the classifier-free guidance scale; 2.0 works for most cases.
- --elevation is the elevation angle of the input image in degrees; it needs to be set to 30.
- --ctrl_end_step is the timestep at which 3D control ends, ranging from 0 to 1.0; it is usually set between 0.6 and 1.0.
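Below is a minimal sketch of sampling a 256-point proxy from a coarse mesh. It assumes the proxy file is plain text with one xyz point per line; refer to misc.ipynb for the exact format expected by generate.py.
import numpy as np
import trimesh

# Load a coarse proxy mesh (the path is a placeholder) and sample 256 surface points.
mesh = trimesh.load("proxy.obj", force="mesh")
points = mesh.sample(256)  # (256, 3) array of xyz coordinates

# Assumption: generate.py reads whitespace-separated xyz coordinates, one point per line.
np.savetxt("proxy.txt", points)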
- Run a NeuS or a NeRF for 3D reconstruction.
# train a neus
python3 train_renderer.py -i output/custom/0.png \
-n custom-neus \
-b configs/neus.yaml \
-l output/renderer
# train a nerf
python3 train_renderer.py -i output/custom/0.png \
-n custom-nerf \
-b configs/nerf.yaml \
-l output/renderer
Explanation:
- -i contains the multiview images generated by SyncDreamer. Since SyncDreamer does not always produce good results, you may need to select a good generated image set (from 0.png to 3.png) for reconstruction; a sketch for splitting a generated image into individual views follows below.
- -n is the experiment name.
- -l is the log directory. Results will be saved to <log_dir>/<name>, i.e. output/renderer/custom-neus and output/renderer/custom-nerf.
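To inspect a generated set before reconstruction, you can split it into individual views (a minimal sketch; it assumes the 16 predefined viewpoints are tiled horizontally as square crops in a single row, so adjust if the layout differs):
from PIL import Image

# 0.png holds 16 predefined viewpoints; assumed layout: a single row of square tiles.
grid = Image.open("output/custom/0.png")
size = grid.height  # each view is assumed to be size x size pixels
for i in range(16):
    view = grid.crop((i * size, 0, (i + 1) * size, size))
    view.save(f"output/custom/0_view{i:02d}.png")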
We train the model on the Objaverse LVIS dataset. The preprocessed data can be found here. We use the multi-view rendering script from SyncDreamer. The script for extracting object proxies can be found in misc.
Please note that you need to set the data directory location in the config file.
target_dir: path/to/renderings-v1 # renderings of target views
input_dir: path/to/renderings-random # renderings of input views
proxy_dir: path/to/proxy_256 # proxies of target objects
python3 train_syncdreamer.py -b configs/coin3d_train.yaml \
--finetune_from ckpt/syncdreamer-pretrain.ckpt \
-l ./logs/coin3d \
-c ./ckpt/coin3d \
--gpus 0
We deeply appreciate the authors of the following repositories for generously sharing their code, which we have extensively utilized. Their contributions have been invaluable to our work, and we are grateful for their openness and willingness to share their expertise. Our project has greatly benefited from their efforts and dedication.
If you find this repository useful in your project, please cite the following work. :)
@article{dong2024coin3d,
title={Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning},
author={Dong, Wenqi and Yang, Bangbang and Ma, Lin and Liu, Xiao and Cui, Liyuan and Bao, Hujun and Ma, Yuewen and Cui, Zhaopeng},
year={2024},
eprint={2405.08054},
archivePrefix={arXiv},
primaryClass={cs.GR}
}