Material Point Method (MPM) Benchmark

Introduction

The Material Point Method (MPM) is widely used in physical simulation. High-quality scenes are computationally intensive, so the achievable performance is critical for real-time applications. In this benchmark, we compare a Taichi implementation with a port of it to CUDA. The measurements are based on open-source implementations. You can find the original Taichi implementation here and the CUDA implementation here.

The MPM application consists of four kernels: Reset, Particle to Grid (P2G), Update Grid, and Grid to Particle (G2P). In the CUDA implementation, each of these is mapped to a CUDA kernel. In the Taichi implementation, each of these kernels is mapped to an outermost for-loop within a Taichi kernel (the Mega-Kernel). Instead of explicitly mapping the problem size to GPU threads as in the CUDA code, Taichi automatically parallelizes the outermost for-loops. Consequently, the Taichi code is more concise and easier to read. Note that the CUDA implementation uses Eigen matrix and vector data types to benefit from their concise syntax. Both the CUDA and Taichi implementations rely on the L1 cache instead of explicitly managing shared memory.
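
To make the four-stage structure concrete, here is a minimal sketch in plain Python. It is a toy 1D illustration, not the benchmark's code: the function names, the linear interpolation weights, the gravity-only update (no stress term), and the sticky wall condition are all assumptions chosen to keep the example short. Each outermost loop corresponds to what would be one kernel in the CUDA port, or one automatically parallelized loop inside the Taichi kernel.

```python
# Toy 1D sketch of the four MPM stages. NOT the benchmark implementation.
# Assumptions: linear weights, gravity only, no stress term, sticky walls,
# and all particle positions lie in [0, (n_grid - 1) * dx).


def reset(grid_m, grid_mv):
    """Reset: clear grid mass and momentum."""
    for i in range(len(grid_m)):
        grid_m[i] = 0.0
        grid_mv[i] = 0.0


def p2g(xs, vs, p_mass, inv_dx, grid_m, grid_mv):
    """P2G: scatter particle mass and momentum to the two nearest nodes."""
    for x, v in zip(xs, vs):
        base = int(x * inv_dx)       # index of the node to the left
        fx = x * inv_dx - base       # fractional offset in [0, 1)
        for offset, w in ((0, 1.0 - fx), (1, fx)):
            # A parallel version would need atomic adds for this scatter.
            grid_m[base + offset] += w * p_mass
            grid_mv[base + offset] += w * p_mass * v


def update_grid(grid_m, grid_mv, grid_v, gravity, dt):
    """Update Grid: momentum to velocity, then gravity and wall conditions."""
    last = len(grid_m) - 1
    for i in range(len(grid_m)):
        v = 0.0
        if grid_m[i] > 0.0:
            v = grid_mv[i] / grid_m[i] + dt * gravity
            # Sticky walls: no velocity pointing out of the domain.
            if (i == 0 and v < 0.0) or (i == last and v > 0.0):
                v = 0.0
        grid_v[i] = v


def g2p(xs, vs, inv_dx, grid_v, dt):
    """G2P: gather node velocities back to particles and advect them."""
    for p in range(len(xs)):
        base = int(xs[p] * inv_dx)
        fx = xs[p] * inv_dx - base
        vs[p] = (1.0 - fx) * grid_v[base] + fx * grid_v[base + 1]
        xs[p] += dt * vs[p]


def step(xs, vs, grid_m, grid_mv, grid_v, p_mass, inv_dx, gravity, dt):
    """One simulation substep: the four stages in order."""
    reset(grid_m, grid_mv)
    p2g(xs, vs, p_mass, inv_dx, grid_m, grid_mv)
    update_grid(grid_m, grid_mv, grid_v, gravity, dt)
    g2p(xs, vs, inv_dx, grid_v, dt)
```

Calling `step` once with particles at rest should leave the total grid mass equal to the total particle mass and give every particle the velocity `dt * gravity`, which is a quick sanity check for the scatter and gather weights.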

Evaluation

We evaluate performance on the following device.

| Specification | Value |
| --- | --- |
| Device | Nvidia RTX 3080 (10 GB) |
| FP32 performance | 29700 GFLOPS |
| Memory bandwidth | 760 GB/s |
| L2 cache capacity | 5 MB |
| Driver version | 470.57.02 |
| CUDA version | 11.4 |
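
As a rough aid for reading these specifications, the sketch below computes two derived quantities under a simple roofline model: the ratio of peak FP32 throughput to memory bandwidth, and a lower bound on the time needed to move a given number of bytes. The function names are hypothetical, and nothing here measures or characterizes the MPM kernels themselves.

```python
def machine_balance(peak_gflops, bandwidth_gb_s):
    """FLOPs the device can execute per byte it can load (FLOP/byte)."""
    return peak_gflops / bandwidth_gb_s


def min_transfer_ms(n_bytes, bandwidth_gb_s):
    """Lower bound, in ms, to move n_bytes at peak bandwidth (1 GB = 1e9 bytes)."""
    return n_bytes / (bandwidth_gb_s * 1e9) * 1e3


# Values taken from the device specifications above.
balance = machine_balance(29700, 760)   # about 39 FLOP/byte
one_gb_ms = min_transfer_ms(1e9, 760)   # about 1.3 ms per GB moved
```

A kernel that performs fewer FLOPs per byte of memory traffic than this balance would be limited by bandwidth rather than compute under this model.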

Performance is measured in milliseconds per frame (ms). We run the benchmark with varying numbers of particles, in both 2D and 3D.
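
A per-frame time of this kind can be measured along the following lines. This is a minimal sketch, not necessarily how `plot_benchmark.py` does it: `ms_per_frame`, its parameters, and the default frame counts are assumptions. The warm-up frames keep one-time costs such as JIT compilation out of the measurement, and the optional `sync` callable stands in for a device synchronization (with Taichi, something like `ti.sync` could be passed), since GPU work is launched asynchronously.

```python
import time


def ms_per_frame(step, n_frames=100, n_warmup=10, sync=None):
    """Return the average wall-clock time of step() in milliseconds.

    step: callable that advances the simulation by one frame.
    sync: optional callable that blocks until queued device work is done.
    """
    # Warm-up runs are executed but not timed.
    for _ in range(n_warmup):
        step()
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts

    start = time.perf_counter()
    for _ in range(n_frames):
        step()
    if sync is not None:
        sync()  # wait for all timed frames to actually complete
    elapsed_s = time.perf_counter() - start

    return elapsed_s * 1000.0 / n_frames
```

Averaging over many frames, rather than timing a single one, reduces the influence of timer resolution and launch jitter on the reported number.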

Reproduction Steps

  • Pre-requisites
python3 -m pip install --upgrade taichi
python3 -m pip install matplotlib

If you want to compare with CUDA, make sure you have nvcc properly installed.

  • Run the benchmark and draw the plots
python3 plot_benchmark.py