Dynamic 3D interaction has attracted great interest in recent work, yet creating such 4D content remains challenging. One solution is to animate 3D scenes with physics-based simulation; another is to learn the deformation of static 3D objects by distilling video generative models. The former requires assigning precise physical properties to the target object; otherwise, the simulated results become unnatural. The latter tends to produce videos with minor motions and discontinuous frames, owing to the absence of physical constraints in deformation learning. We argue that, because video generative models are trained on real-world captured data, they are capable of judging physical phenomena in simulated environments. To this end, we propose DreamPhysics, which estimates the physical properties of 3D Gaussian Splatting with video diffusion priors. DreamPhysics supports both image- and text-conditioned guidance, optimizing physical parameters via score distillation sampling with frame interpolation and log gradient. Using a material point method (MPM) simulator with properly estimated physical parameters, our method can generate 4D content with realistic motions. Experimental results demonstrate that, by distilling the prior knowledge of video diffusion models, inaccurate physical properties can be gradually refined for high-quality simulation.
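Physical parameters such as stiffness can span several orders of magnitude, so one plausible reading of "log gradient" is optimizing such a parameter in log space. The sketch below illustrates only that idea and rests on assumptions: it is not the DreamPhysics implementation. The video-diffusion score distillation signal is replaced by a toy squared-error loss against a hypothetical target oscillation period, and the MPM simulator is replaced by a closed-form mass-spring period. The names `period` and `fit_log_stiffness` are hypothetical.

```python
import math


def period(stiffness: float, mass: float = 1.0) -> float:
    """Toy stand-in for a simulator: period of a mass-spring system, T = 2*pi*sqrt(m/k)."""
    return 2.0 * math.pi * math.sqrt(mass / stiffness)


def fit_log_stiffness(target_period: float, init_stiffness: float,
                      lr: float = 1.0, steps: int = 2000, mass: float = 1.0) -> float:
    """Gradient descent on log(k) instead of k (hypothetical helper).

    The toy loss (T - target)^2 stands in for the score-distillation signal
    that a video diffusion prior would provide.
    """
    log_k = math.log(init_stiffness)  # log-space parameter keeps k strictly positive
    for _ in range(steps):
        k = math.exp(log_k)
        t = period(k, mass)
        # Chain rule: dT/dk = -T / (2k), so dT/d(log k) = k * dT/dk = -T / 2.
        grad_log_k = 2.0 * (t - target_period) * (-t / 2.0)
        log_k -= lr * grad_log_k
    return math.exp(log_k)


# Start from a stiffness far too large (period too short) and refine it.
k_est = fit_log_stiffness(target_period=1.0, init_stiffness=1e4)
print(k_est, period(k_est))
```

Because each step updates log(k), a fixed learning rate changes stiffness by a relative rather than an absolute amount. This is what allows the estimate to move from 1e4 down to roughly 39.5 without the step size being retuned per order of magnitude.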