Xin Fei*, Wenzhao Zheng$\dagger$, Yueqi Duan, Wei Zhan, Masayoshi Tomizuka, Kurt Keutzer, Jiwen Lu
Tsinghua University, UC Berkeley
*Work done during an internship at UC Berkeley.
Most existing generalizable 3D Gaussian splatting methods (e.g., pixelSplat, MVSplat) assign a fixed number of Gaussians to each pixel, which captures local geometry inefficiently and produces overlapping Gaussians across views. In contrast, our PixelGaussian dynamically adjusts the Gaussian distributions based on geometric complexity in a feed-forward framework. With comparable efficiency, PixelGaussian (trained using 2 views) successfully generalizes to various numbers of input views with adaptive Gaussian densities.
- [2024/10/25] Code release.
- [2024/10/25] Paper released on arXiv.
Given multi-view input images, we initialize 3D Gaussians using a lightweight image encoder and cost volume. Cascade Gaussian Adapter (CGA) then dynamically adapts both the distribution and quantity of Gaussians. By leveraging local image features, Iterative Gaussian Refiner (IGR) further refines Gaussian representations via deformable attention. Finally, novel views are rendered from the refined 3D Gaussians using rasterization-based rendering.
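To make the idea of adapting the quantity of Gaussians more concrete, here is a toy sketch in plain Python. It is not the Cascade Gaussian Adapter and does not reflect the code in this repository: the function name `allocate_gaussians`, the integer complexity scores, and the fixed budget are all assumptions made for illustration. It only shows the contrast with a fixed per-pixel count, by giving more Gaussians to pixels with higher scores.

```python
def allocate_gaussians(scores, budget, min_per_pixel=1):
    """Toy allocation (hypothetical, not the paper's CGA).

    Splits `budget` Gaussians across pixels in proportion to non-negative
    integer complexity `scores`, keeping at least `min_per_pixel` per pixel.
    """
    n = len(scores)
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    if budget < n * min_per_pixel:
        raise ValueError("budget is too small for the per-pixel minimum")

    counts = [min_per_pixel] * n
    remaining = budget - n * min_per_pixel
    total = sum(scores)
    if total == 0:
        # No pixel stands out: fall back to a uniform split.
        scores = [1] * n
        total = n

    # Largest-remainder rounding in exact integer arithmetic,
    # so the counts always sum to `budget`.
    products = [remaining * s for s in scores]
    remainders = [p % total for p in products]
    for i, p in enumerate(products):
        counts[i] += p // total
    leftover = budget - sum(counts)
    order = sorted(range(n), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# A fixed scheme would give each of the 4 pixels 3 Gaussians;
# here the high-score pixel receives most of the budget.
print(allocate_gaussians([0, 1, 9, 0], budget=12))
```

In this toy setting, flat regions keep the minimum count while the budget concentrates where the score is high. How PixelGaussian actually measures complexity and splits or prunes Gaussians is described in the paper.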
PixelGaussian achieves the best performance on two representative datasets, RealEstate10K and ACID. Although trained with only 2 reference views, PixelGaussian generalizes to larger numbers of input views.
- Please clone this project, create a conda virtual environment, and install the requirements in `requirement.txt`.
- Download the RealEstate10K and ACID datasets and the corresponding assets following the instructions of pixelSplat. The expected layout is:
├── datasets
│   ├── re10k
│   │   ├── train
│   │   │   ├── 000000.torch
│   │   │   ├── 000001.torch
│   │   │   ├── ...
│   │   ├── test
│   │   │   ├── 000000.torch
│   │   │   ├── ...
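Before training, it can help to confirm that the downloaded data matches the layout above. The following is a minimal sketch using only the standard library. The helper name `check_dataset_layout` is hypothetical and not part of this repository, and the sketch assumes that each split directory directly contains `.torch` chunk files, as the tree suggests.

```python
from pathlib import Path


def check_dataset_layout(root, name="re10k", splits=("train", "test")):
    """Return {split: number of .torch chunks} for datasets/<name>/<split>.

    Hypothetical helper: raises FileNotFoundError if a split directory
    is missing or contains no .torch files.
    """
    counts = {}
    for split in splits:
        split_dir = Path(root) / name / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing directory: {split_dir}")
        n_chunks = sum(1 for _ in split_dir.glob("*.torch"))
        if n_chunks == 0:
            raise FileNotFoundError(f"no .torch chunks in: {split_dir}")
        counts[split] = n_chunks
    return counts


# Example usage (assumes the data lives under ./datasets):
# print(check_dataset_layout("datasets", name="re10k"))
```

Only the `re10k` directory appears in the tree above, so the directory name for any other dataset passed as `name` is an assumption to verify against the pixelSplat instructions.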
- Run the following command to start training. Replace `[re10k/acid]` with the dataset to train on and `[batch_size]` with a batch size that suits your device.
python -m src.main +experiment=[re10k/acid] data_loader.train.batch_size=[batch_size]
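If you launch runs from a script, the command above can be assembled programmatically. This is a small sketch, not part of the repository: `build_train_command` is a hypothetical helper, and it uses only the two overrides shown in the command above (`+experiment` and `data_loader.train.batch_size`).

```python
import subprocess  # only needed if you actually launch the run

VALID_EXPERIMENTS = ("re10k", "acid")


def build_train_command(experiment, batch_size):
    """Build the training command from above as an argument list (hypothetical helper)."""
    if experiment not in VALID_EXPERIMENTS:
        raise ValueError(f"experiment must be one of {VALID_EXPERIMENTS}")
    if not isinstance(batch_size, int) or batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [
        "python", "-m", "src.main",
        f"+experiment={experiment}",
        f"data_loader.train.batch_size={batch_size}",
    ]


cmd = build_train_command("re10k", 4)
print(" ".join(cmd))
# To launch from the project root, uncomment:
# subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting issues. The batch size of 4 is only a placeholder and should be adjusted to your device.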
Our code is based on MVSplat and GaussianFormer and is also inspired by pixelSplat and SelfOcc.
If you find this project helpful, please consider citing the following paper:
@article{fei2024pixel,
title={PixelGaussian: Generalizable 3D Gaussian Reconstruction From Arbitrary Views},
author={Fei, Xin and Zheng, Wenzhao and Duan, Yueqi and Zhan, Wei and Tomizuka, Masayoshi and Keutzer, Kurt and Lu, Jiwen},
journal={arXiv preprint arXiv:2410.18979},
year={2024}
}