Mandi Zhao, Yijia Weng, Dominik Bauer, Shuran Song
Use a conda environment with Python 3.9, and install packages from the provided `environment.yml` file:

```shell
conda create -n real2code python=3.9
conda activate real2code
conda env update --file environment.yml --prune
```
This repo encapsulates multiple sub-modules of the Real2Code pipeline.
- Dataset: all modules use the same synthetic dataset of RGBD images, part-level meshes, and code snippets describing the joint structure of each object. We have released this dataset (link below), and provide processing & rendering utility scripts in `data_utils/` if you want to generate your own data.
- Part-level 2D Segmentation and 3D Shape Completion: with the same set of objects, we fine-tune a 2D SAM model for part-level segmentation and train a PointNet-based model for 3D shape completion. More details on each sub-module are documented in the READMEs under part segmentation and shape completion.
- LLM Fine-tuning: we fine-tune a CodeLlama model on the code representations of our articulated objects. See this fork for the LLM fine-tuning script.
- Real-World Evaluation: see `real_obj/`. We use DUSt3R to reconstruct scenes from multi-view, pose-free RGB images; the DUSt3R-generated 3D pointmaps are provided in the real-world dataset below.
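The DUSt3R pointmaps in the `id_scene.npz` files are per-pixel 3D points that can be flattened into a point cloud. A minimal sketch of this, using a synthetic stand-in file — the key name `pts3d` and the array layout are assumptions for illustration, not the released schema:

```python
import io
import numpy as np

# Build a synthetic stand-in for an id_scene.npz file. The real files hold
# DUSt3R-generated, globally-aligned pointmaps; the key name "pts3d" and
# the (H, W, 3) layout here are assumptions, not the released schema.
buf = io.BytesIO()
np.savez(buf, pts3d=np.random.rand(512, 512, 3).astype(np.float32))
buf.seek(0)

scene = np.load(buf)
pts = scene["pts3d"]         # per-pixel 3D points, shape (H, W, 3)
cloud = pts.reshape(-1, 3)   # flatten to an N x 3 point cloud
print(cloud.shape)           # -> (262144, 3) for a 512x512 image
```

In practice you would pass the real file path to `np.load` instead of the in-memory buffer, and inspect `scene.files` to see which keys the released `.npz` files actually contain.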
Our dataset is built on top of PartNet-Mobility assets, and the same set of objects is used for training and testing throughout our SAM fine-tuning, shape completion model training, and LLM fine-tuning modules. The full dataset will be released here: https://drive.google.com/drive/folders/1rkUP7NBRQX5h6ixJr9SvX0Vh3fhj1YqO?usp=drive_link
We have released the real-object data used for evaluating Real2Code. These are objects found in common lab and household settings around the Stanford campus. Raw data is captured using a LiDAR-equipped iPhone camera and the 3dScanner App.
- Download: Google Drive Link
- Structure: each object folder is organized as follows:

```shell
obj_id/
├── raw/
└── sam/   # a list of (id.jpg, id_mask.png, id_scene.npz)
```

Each `id` corresponds to one 512x512 RGB image selected from the raw dataset, e.g. `00000.jpg`; `id_mask.png` is the foreground object mask obtained by prompting the SAM model with query points randomly sampled in the image margin area; `id_scene.npz` is the globally-aligned 3D point cloud obtained from DUSt3R.
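Sampling query points from the image margin gives SAM background prompts, from which the foreground object mask follows. A minimal sketch of such a sampler — the function name, margin width, and point count are illustrative assumptions, not values from the Real2Code pipeline:

```python
import numpy as np

def sample_margin_points(h, w, margin=32, n=8, rng=None):
    """Sample n (x, y) pixel coordinates from the image margin area.

    These can serve as background prompts for SAM, so that the returned
    mask covers the foreground object. The margin width and point count
    are illustrative defaults, not the values used by Real2Code.
    """
    rng = np.random.default_rng() if rng is None else rng
    pts = []
    while len(pts) < n:
        x = int(rng.integers(0, w))
        y = int(rng.integers(0, h))
        # keep only points within `margin` pixels of an image border
        if x < margin or x >= w - margin or y < margin or y >= h - margin:
            pts.append((x, y))
    return np.array(pts)

points = sample_margin_points(512, 512)
print(points.shape)  # (8, 2)
```

A point array like this would be passed to the SAM predictor with all-zero (background) labels; the model then segments the object occupying the image center.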