In this repo we introduce multi-view conditioning for point-cloud diffusion and test it in two pipelines: multiple synthetic views generated from text, and multiple views from photos in the wild. We develop an evaluation dataset based on ShapeNet and ModelNet and propose a new metric to assess, visually and analytically, the overlap between two point clouds. This repo is based on the official implementation of Point-E.
Point-E is a diffusion model: a generative model that approximates a data distribution through noising (forward process) and denoising (backward process). The backward process is also called "sampling": you start from a noisy sample and convert it back to signal using some conditioning information. In Point-E, we start from a random point cloud of 1024 points and denoise it with an image (an object photo) as the conditioning signal.
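As an illustration of the forward (noising) process described above, here is a minimal NumPy sketch. The linear beta schedule, the number of steps and the function names are assumptions made for this example; they are not taken from the Point-E code.

```python
import numpy as np

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(a_bar_t) * x0, (1 - a_bar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars(num_steps=1000)

# Toy "clean" cloud of 1024 xyz points (random data, only for illustration).
x0 = rng.uniform(-1.0, 1.0, size=(1024, 3))

# At the last step almost all of the signal has been replaced by noise.
x_t, eps = forward_noise(x0, t=999, alpha_bars=alpha_bars, rng=rng)
```

Sampling runs this in reverse: a trained network predicts the noise at each step, and the conditioning image steers that prediction.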
Compared to other techniques in the literature, such as Neural Radiance Fields, Point-E can sample a point cloud on a single GPU in 1-2 minutes. Sample quality is the price to pay, making this technique ideal for tasks where point clouds are best suited.
We extend conditioning for point-cloud diffusion to multiple views. This tackles three problems: objects generated with duplicated faces, blurring in occluded parts, and 3D inconsistency.
Each conditioning image is encoded with the pre-trained OpenAI CLIP; the resulting embeddings are concatenated and fed as tokens into the denoising transformer.
See: mv_point_e/models/transformer.py
Original image: Nichol et al. 2022
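The patch-concatenation idea can be sketched as follows. This is only a shape-level illustration with NumPy: the token counts, the embedding width and the function name are hypothetical, and the actual implementation lives in mv_point_e/models/transformer.py.

```python
import numpy as np

def concat_view_tokens(view_tokens):
    """Concatenate per-view token embeddings along the token axis.

    view_tokens: list of V arrays, each of shape (batch, tokens_per_view, width).
    Returns an array of shape (batch, V * tokens_per_view, width).
    """
    return np.concatenate(view_tokens, axis=1)

# Hypothetical sizes, chosen small for the example.
batch, tokens_per_view, width = 2, 16, 8

# Stand-ins for the image-encoder output of 4 views; view v is filled with the value v.
views = [np.full((batch, tokens_per_view, width), float(v)) for v in range(4)]

cond_tokens = concat_view_tokens(views)  # shape (2, 64, 8)
```

The conditioning sequence grows linearly with the number of views, so the transformer attends to all views at every denoising step.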
With inspiration from Watson et al. 2022, a random conditioning image (from a given multi-view set) is fed to the denoising transformer at each diffusion denoising step.
See: sc_point_e/models/transformer.py
Original image: Watson et al. 2022
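A minimal sketch of stochastic conditioning, assuming a generic `denoise_step(x, t, view)` callable. That callable, the function names and the uniform choice of views are assumptions for illustration; the actual implementation is in sc_point_e/models/transformer.py.

```python
import random

def stochastic_view_schedule(num_views, num_steps, seed=None):
    """Pick one view index uniformly at random for every denoising step."""
    rng = random.Random(seed)
    return [rng.randrange(num_views) for _ in range(num_steps)]

def sample_with_stochastic_conditioning(x, views, num_steps, denoise_step, seed=None):
    """Run the reverse process, conditioning each step on a randomly chosen view."""
    schedule = stochastic_view_schedule(len(views), num_steps, seed)
    for t, view_idx in zip(reversed(range(num_steps)), schedule):
        x = denoise_step(x, t, views[view_idx])
    return x

# Toy usage: a fake denoiser that records its inputs and halves the sample.
calls = []

def toy_denoise_step(x, t, view):
    calls.append((t, view))
    return x * 0.5

views = ["front", "left", "back", "right"]
result = sample_with_stochastic_conditioning(
    1.0, views, num_steps=8, denoise_step=toy_denoise_step, seed=0
)
```

Unlike patch concatenation, the conditioning sequence keeps the length of a single view; the views are mixed over time rather than within one step.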
We use 3D-Diffusion from Watson et al. 2022 to generate 3D-consistent multiple views from a single, text-generated image (produced with Stable Diffusion 2). The current model is pre-trained on SRNCars; a ShapeNet version will be released soon (contribute here).
There are two variants for multi-view:
- Patch concatenation: `mv_point_e`
- Stochastic conditioning: `sc_point_e`
You can either:
- Rename the folder of the version you choose to `point_e` and run `pip install -e .`
- Without installing a global package, import from the specific variant in your code, e.g. for `sc_point_e`:
from sc_point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from sc_point_e.diffusion.sampler import PointCloudSampler
from sc_point_e.models.download import load_checkpoint
from sc_point_e.models.configs import MODEL_CONFIGS, model_from_config
from sc_point_e.evals.feature_extractor import PointNetClassifier, get_torch_devices
from sc_point_e.evals.fid_is import compute_statistics
from sc_point_e.evals.fid_is import compute_inception_score
from sc_point_e.util.plotting import plot_point_cloud
- [1] Generating the textureless objects dataset (views, ground shapes).
- [2] Generating the complete textured objects dataset (views, ground shapes).
- [1] Text-to-3d with Stable Diffusion 2 + Inpainting (single view)
- [2] Text-to-3d with multiple rendered views from the SRNCars Dataset (multi-view)
- [3] Text-to-3d with multiple synthetic views from Stable Diffusion + 3D-Diffusion (Watson et al. 2022)
- [4] Text-to-3d from multiple photos "in the wild"
- [1] Dataset pre-processing and scores computation
- [2] A digression on the chosen metrics with experiments
- [3] Evaluating text-to-3D from multi view (patch concat.)
- [4] Comparing the chosen multi-view, text-to-3D methodologies
- [5, 6] Evaluating results on occluded object parts
- [7] Scores visualization and plotting
This dataset has been developed to assess the quality of the reconstructions from our multi-view models with respect to single-view Point-E. Through experimentation, we generated several datasets from the available sources ModelNet40, ShapeNetV2 and ShapeNetV0. Specifically, the datasets generated from ModelNet40 and ShapeNetV0 are textureless: we generated synthetic colouring from RGB/grayscale values and sine functions.
The complete set of data can be found at this link.
Name | Samples | Source |
---|---|---|
ModelNet40, textureless | 40 | Google Drive |
ShapeNetv2, textureless | 55 | Google Drive |
Mixed, textureless | 190 | Google Drive |
Shapenet with textures | 650 | Google Drive |
OpenAI seed imgs/clouds | / | Google Drive |
OpenAI, COCO CLIP R-Precision evals | / | Google Drive |
Here you can find the clouds generated from the textureless ShapeNetv2 and ModelNet40 datasets, together with the ground-truth data, the scores and the plots of the pairwise divergence distribution. More details are provided in the description.
Each sample in the dataset consists of a set of V RGB views (256x256) and a cloud of K points sampled with PyTorch3D.
view: (N, V, 256, 256, 3)
cloud: (N, K, 3)
Further details on rendering:
- The light of the scene is fixed
- No reflections
- Two versions of the dataset:
- Fixed elevation and distance of the camera from the object: we took 6 pictures while rotating around the object
- Fixed distance of the camera from the object: we took 6 pictures while rotating around the object and stochastically changing the camera elevation
- We iterate this procedure on 25 different objects for each class in ShapeNet
- Each view is 256x256
You can see the pipeline for the generation of the ShapeNet dataset with textures here.
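The two camera setups can be sketched in plain Python as below. The distance, the elevation values, the evenly spaced azimuths and the y-up convention are all assumptions for illustration; the values actually used are in the linked pipeline.

```python
import math
import random

def camera_positions(num_views=6, distance=2.0, elevation_deg=30.0,
                     stochastic_elevation=False, elevation_range=(0.0, 60.0),
                     seed=None):
    """Return (x, y, z) camera positions on a sphere around the origin.

    Azimuths are evenly spaced (assumed). With stochastic_elevation=True the
    elevation of each view is drawn uniformly from elevation_range (assumed).
    """
    rng = random.Random(seed)
    positions = []
    for i in range(num_views):
        azim = 2.0 * math.pi * i / num_views
        if stochastic_elevation:
            elev_deg = rng.uniform(*elevation_range)
        else:
            elev_deg = elevation_deg
        elev = math.radians(elev_deg)
        # y is "up": elevation lifts the camera, azimuth rotates it around y.
        x = distance * math.cos(elev) * math.sin(azim)
        y = distance * math.sin(elev)
        z = distance * math.cos(elev) * math.cos(azim)
        positions.append((x, y, z))
    return positions

fixed = camera_positions()                                   # fixed elevation
varied = camera_positions(stochastic_elevation=True, seed=0) # stochastic elevation
```

In both variants every camera stays at the same distance from the object; only the elevation differs between the two versions of the dataset.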
Concerning the set of views in the dataset produced from ShapeNetv2 and ModelNet40 textureless:
- The light of the scene is fixed
- No reflections
- We fixed the elevation and the distance of the camera from the object and we took 4 pictures rotating around the object
- We iterate this procedure on one object for each class in ShapeNetv2 and ModelNet40
- Each view is 512x512
You can check the pipeline for the generation of the ShapeNetv2 and ModelNet40 textureless dataset here with all the steps.
The directory structure is as follows:
<directories>
> shapenet_withTextures
>> eval_clouds.pickle
>> eval_views_fixed_elevation.pickle
>> eval_views_stochastic_elevation.pickle
> modelnet40_texrand_texsin
>> modelnet_csinrandn
>>> CLASS_MAP.pt
>>> images_obj.pt
>>> labels.pt
>>> points.pt
>> modelnet_texsin
>>> CLASS_MAP.pt
>>> images_obj.pt
>>> labels.pt
>>> points.pt
> shapenetv2_texrand_texsin
>> shapenetv2_csinrandn
>>> CLASS_MAP.pt
>>> images_obj.pt
>>> labels.pt
>>> points.pt
>> shapenetv2_texsin
>>> CLASS_MAP.pt
>>> images_obj.pt
>>> labels.pt
>>> points.pt
> shapenetv2_modelnet40_texrand_texsin
>> shapenet_modelnet_singleobject
>>> modelnet_csinrandn
>>>> CLASS_MAP.pt
>>>> images_obj.pt
>>>> labels.pt
>>>> points.pt
>>> modelnet_texsin
>>>> CLASS_MAP.pt
>>>> images_obj.pt
>>>> labels.pt
>>>> points.pt
>>> shapenet_csinrandn
>>>> CLASS_MAP.pt
>>>> images_obj.pt
>>>> labels.pt
>>>> points.pt
>>> shapenet_texsin
>>>> CLASS_MAP.pt
>>>> images_obj.pt
>>>> labels.pt
>>>> points.pt
> dataset_shapenet_modelnet_texsin_withgeneratedcloud
>> modelnet_texsin
>>> CLASS_MAP.pt
>>> eval_clouds_modelnet_300M.pickle
>>> images_obj.pt
>>> labels.pt
>>> modelnet_gencloud_300M
>>> points.pt
>> shapenet_texsin
>>> CLASS_MAP.pt
>>> eval_clouds_shapenet_300M.pickle
>>> images_obj.pt
>>> labels.pt
>>> shapenet_gencloud_300M
>>> points.pt
shapenet_withTextures
- list of the sampled cloud: eval_clouds.pickle # (n_img, ch, n_points) ch: 6, n_points: 4096
- list of gen views with fixed elevation: eval_views_fixed_elevation.pickle # (n_img, n_view, 256, 256, 3)
- list of gen views with stochastic elevation: eval_views_stochastic_elevation.pickle # (n_img, n_view, 256, 256, 3)
shapenetv2_modelnet40_texrand_texsin
- dictionary with {index: 'typeOfObject'}: CLASS_MAP.pt
- multiple views for each object: images_obj.pt # (n_img, n_view, 512, 512, 3)
- label for each object: labels.pt # (n_img,)
- ground truth point cloud: points.pt # (n_img,)
- tensor with the point cloud generated with Point-E 300M:
ch: 6 (first 3 channel coord the others are the rgb colors of each point)
n_points: 4096 (generated points)
modelnet_gencloud_300M # (n_img, ch, n_points)
shapenet_gencloud_300M # (n_img, ch, n_points)
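Given the (n_img, ch, n_points) layout above, coordinates and colours can be separated as in this NumPy sketch. The helper name and the demo array are hypothetical; a tensor loaded with torch would first need to be converted to an array.

```python
import numpy as np

def split_cloud_channels(clouds):
    """Split (n_img, 6, n_points) clouds into xyz coordinates and RGB colours.

    Returns two arrays of shape (n_img, n_points, 3).
    """
    clouds = np.asarray(clouds)
    if clouds.ndim != 3 or clouds.shape[1] != 6:
        raise ValueError(f"expected shape (n_img, 6, n_points), got {clouds.shape}")
    coords = np.transpose(clouds[:, :3, :], (0, 2, 1))  # first 3 channels: x, y, z
    colors = np.transpose(clouds[:, 3:, :], (0, 2, 1))  # last 3 channels: r, g, b
    return coords, colors

# Demo array with the documented layout: 2 clouds, 6 channels, 4096 points.
demo = np.zeros((2, 6, 4096), dtype=np.float32)
demo[:, 3:, :] = 1.0  # white points at the origin
coords, colors = split_cloud_channels(demo)
```

The (n_img, n_points, 3) layout returned here matches the `cloud: (N, K, 3)` convention used for the ground-truth clouds.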
dataset_shapenet_modelnet_texsin_withgeneratedcloud
- dictionaries:
eval_clouds_modelnet_300M.pickle
eval_clouds_shapenet_300M.pickle
dictionary['nameOfTheObject'][index]
index 0: divergence_ground_single
index 1: divergence_ground_single_distribution_plot
index 2: divergence_ground_multi
index 3: divergence_ground_multi_distribution_plot
index 4: divergence_single_multi
index 5: divergence_single_multi_distribution_plot
index 6: ground_truth_pis
index 7: single_view_pis
index 8: multi_view_pis
index 9: ground_truth_point_cloud
index 10: single_view_point_cloud
index 11: multi_view_point_cloud
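To avoid magic numbers when reading these dictionaries, the indices listed above can be given names. The enum and the helper below are a hypothetical convenience, not part of the repo; the toy dictionary only mimics the documented layout.

```python
from enum import IntEnum

class EvalField(IntEnum):
    """Positions of the entries stored for each object, as listed above."""
    DIVERGENCE_GROUND_SINGLE = 0
    DIVERGENCE_GROUND_SINGLE_DISTRIBUTION_PLOT = 1
    DIVERGENCE_GROUND_MULTI = 2
    DIVERGENCE_GROUND_MULTI_DISTRIBUTION_PLOT = 3
    DIVERGENCE_SINGLE_MULTI = 4
    DIVERGENCE_SINGLE_MULTI_DISTRIBUTION_PLOT = 5
    GROUND_TRUTH_PIS = 6
    SINGLE_VIEW_PIS = 7
    MULTI_VIEW_PIS = 8
    GROUND_TRUTH_POINT_CLOUD = 9
    SINGLE_VIEW_POINT_CLOUD = 10
    MULTI_VIEW_POINT_CLOUD = 11

def get_field(data, object_name, field):
    """Return data[object_name][field], with field given as an EvalField member."""
    return data[object_name][field]

# Toy stand-in for a loaded pickle: one object with 12 placeholder entries.
toy = {"chair": [f"value_{i}" for i in range(12)]}
multi_view_cloud = get_field(toy, "chair", EvalField.MULTI_VIEW_POINT_CLOUD)
```

Because `IntEnum` members behave like integers, they index both lists and integer-keyed dictionaries.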
- Load the .pt files with torch (base_path is the folder that contains them):
import os
import torch
images_obj_views = torch.load(os.path.join(base_path, 'images_obj.pt'))
- Load the pickle file with the metrics, or the shapenet_withTextures files:
import pickle
dataset = 'shapenet'
base_path = os.path.join(dataset + "_texsin")
with open(os.path.join(base_path, 'eval_clouds_' + dataset + '_300M.pickle'), 'rb') as handle:
    data = pickle.load(handle)
- More info in notebook1 or notebook2.
- Extending the dataset ShapeNet PSR
- Increasing the view resolution to 512x512 or 1024x1024
- Diego Calanzone @diegocalanzone
- Riccardo Tedoldi @riccardotedoldi
- 0.1
- Initial Release
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- Dataset ModelNet40
- Dataset ShapeNetCoreV2
- Dataset ShapeNetCore
- Shapenet
- Modelnet
- Dataset Shapenet_psr
- a6o: 3d-diffusion implementation
- OpenAI: official Point-E implementation
- RemBG: background removal, U^2 Net implementation
@misc{CalanzoneTedoldi2022,
title = {Generating point clouds from multiple views with Point-E},
author = {Diego Calanzone and Riccardo Tedoldi and Zeno Sambugaro},
year = {2023},
url = {http://github.com/halixness/stable-point-e}
}