Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis

Conventional geometry-based SLAM systems lack dense 3D reconstruction capabilities because their data association usually relies on feature correspondences, while learning-based SLAM systems often fall short in real-time performance and accuracy. Balancing real-time performance with dense 3D reconstruction is therefore a challenging problem. In this paper, we propose a real-time RGB-D SLAM system that adopts a novel view synthesis technique, 3D Gaussian Splatting (3DGS), for 3D scene representation and pose estimation. The technique exploits the real-time, rasterization-based rendering of 3DGS and enables differentiable optimization in real time through a CUDA implementation. We also enable mesh reconstruction from the 3D Gaussians for explicit dense 3D reconstruction. To estimate accurate camera poses, we adopt a rotation-translation decoupled strategy with inverse optimization, updating the two components iteratively through gradient-based optimization. Each iteration differentiably renders RGB, depth, and silhouette maps from the existing 3D Gaussian map and updates the camera parameters to minimize a combined objective of photometric, depth-geometry, and visibility losses. However, 3DGS struggles to represent surfaces accurately because of the multi-view inconsistency of 3D Gaussians, which degrades both camera pose estimation and scene reconstruction. To address this, we use depth priors as additional regularization to enforce geometric constraints, improving the accuracy of both pose estimation and 3D reconstruction. Extensive experiments on public benchmark datasets demonstrate the effectiveness of the proposed method in terms of pose accuracy, geometric accuracy, and rendering performance.
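To make the tracking step concrete, below is a minimal PyTorch-style sketch of the decoupled pose optimization described in the abstract. Everything named here is an illustrative assumption rather than the paper's actual implementation: `render_gaussians` stands in for a differentiable 3DGS rasterizer, the loss weights, learning rates, and silhouette threshold are placeholders, and the visibility term is folded in as a silhouette mask, which is one common way such a term is realized.

```python
# Minimal sketch of rotation-translation decoupled pose optimization against
# a frozen 3D Gaussian map. All names, weights, and learning rates are
# illustrative assumptions, not the paper's implementation.
import torch

def estimate_pose(render_gaussians, gaussians, rot_init, trans_init,
                  rgb_gt, depth_gt, iters=100,
                  w_rgb=0.9, w_depth=0.1, sil_thresh=0.99):
    # Rotation (e.g. a quaternion) and translation are kept as separate
    # parameters, each refined by gradient descent while the map is fixed.
    rot = rot_init.clone().requires_grad_(True)
    trans = trans_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([{"params": [rot], "lr": 1e-3},
                            {"params": [trans], "lr": 5e-3}])
    for _ in range(iters):
        # Differentiably render RGB (3, H, W), depth (H, W), and
        # silhouette (H, W) maps from the existing map at this pose.
        rgb, depth, silhouette = render_gaussians(gaussians, rot, trans)
        # Visibility mask: keep only pixels the current map can explain,
        # so unmapped regions do not pull the pose toward spurious minima.
        mask = (silhouette > sil_thresh).float()
        loss_rgb = (mask * torch.abs(rgb - rgb_gt)).mean()        # photometric
        loss_depth = (mask * torch.abs(depth - depth_gt)).mean()  # depth geometry
        loss = w_rgb * loss_rgb + w_depth * loss_depth
        opt.zero_grad()
        loss.backward()
        opt.step()
    return rot.detach(), trans.detach()
```

The depth term is where depth measurements and priors enter as a geometric constraint; per the abstract, the same kind of depth-prior regularization presumably also constrains the Gaussians during mapping, counteracting the multi-view inconsistency noted above.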
