Proceedings 2024 ESPResSo meetings
Jean-Noël Grad edited this page Dec 17, 2024
Hardware at the ICP:
- 1152 cores and 36 GPUs on the Ant cluster
- 500 cores and 40 GPUs on the HTCondor infrastructure
Benchmarks:
- single-core performance: HTCondor cores deliver about 30% more performance than Ant cores
- multi-core performance: on HTCondor, performance stops improving after 4 cores for simulations with 10k particles, and after 8 cores for 100k particles, while for GPU simulations 1 core is usually sufficient
- use shared memory parallelism on the node level (rather than distributed memory) to reduce footprint of ghost particles calculations
- use struct-of-arrays for particle lists
- LJ simulation prototype implemented using Kokkos+Cabana
- feature name change: `CUDA` -> `ESPRESSO_CUDA`, etc. (#4974)
- reduce number of CMake configuration options, e.g. with a unified `-D ESPRESSO_BUILD_WITH_ARCHS="scalar,avx2,cuda"` option
- unify Python classes for CPU, AVX2 and GPU kernels with an extra argument `arch`:
  - `arch="cpu:auto"`: default value, maximal portability, selects the fastest kernel available for the current hardware
  - `arch="cpu:avx2"`: selects AVX2 kernels if supported by the hardware (Intel, AMD), otherwise raises an error
  - `arch="cpu:neon"`: selects Neon kernels if supported by the hardware (ARM), otherwise raises an error
  - `arch="cpu:scalar"`: selects scalar kernels (no vectorization, slowest)
  - `arch="gpu:auto"`: selects the GPU kernels against which ESPResSo was built if a matching GPU is available, otherwise raises an error
  - `arch="gpu:cuda"`: selects the CUDA GPU kernels if an Nvidia GPU is available, otherwise raises an error
  - `arch="gpu:rocm"`: selects the ROCm GPU kernels if an AMD GPU is available, otherwise raises an error
- for a column system, where each MPI rank contains a cubic slice of the column, optimal performance is achieved by orienting the column main axis along the z-direction
- for a cuboid system, slicing along the z-direction is also the optimal communication pattern, since slicing along 2 or 3 directions introduces communication with extra partners
- the default MPI Cartesian topology in ESPResSo is in descending order, for multi-GPU LB the user must manually set it to ascending order
- due to padding of GPU fields, the memory footprint is minimized when the size of the rank-local LB domain in agrid units along the x-direction is an integer multiple of 64 (single-precision) or 32 (double-precision)
- the formula is relatively simple to derive and involves a stepwise function
- every time we enter a new step, increasing the size along the x-direction is essentially free since the new data replaces the existing padding, until we reach the next step in the curve
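The stepwise footprint described above can be sketched as follows. The helper name, the D3Q19 population count, and the omission of ghost layers are assumptions for illustration; only the padding multiples (64 for single precision, 32 for double precision) come from the notes.

```python
import math

def padded_lb_field_size(nx, ny, nz, single_precision=True, q=19):
    """Estimate a rank-local LB field footprint in bytes, including padding.

    Illustrative sketch: the x-extent (in agrid units) is padded up to a
    multiple of 64 (single precision) or 32 (double precision), so the
    footprint is a stepwise function of nx. Assumes a D3Q19 population
    field and ignores ghost layers.
    """
    pad = 64 if single_precision else 32
    bytes_per_value = 4 if single_precision else 8
    padded_nx = math.ceil(nx / pad) * pad  # next step in the curve
    return padded_nx * ny * nz * q * bytes_per_value

# Growing nx from 65 to 128 is essentially free: both pad to 128,
# so the new data only replaces existing padding.
```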
- Alex: validated, one missing link to the user guide
- JN: validated, one missing link to the book chapter
- Julian: validated
- Sam: still a work in progress
- not present: Keerthi, David
- on CPU, the development branch of ESPResSo outperforms the 4.2.2 release (#4921)
- on GPU, GDRcopy is needed to remove a performance bottleneck in multi-GPU simulations
- progress was made during and after the coding day in improving the script interface, introducing the ZnDraw visualizer in tutorials, fixing a corner case of the Lees-Edwards collision operator in LB, and fixing regressions in the Python implementation of Monte Carlo
- there is a recurring issue with the difficulty level of C++ tasks
- the core team needs to improve onboarding of C++ developers
- MetaTensor integration in ESPResSo is challenging due to dependencies
- need to find test cases based on current ML research done with ESPResSo
- now encapsulated: non-bonded and bonded interactions, collision detection, particle list, cluster structure analysis, OIF, IBM, auto-update accumulators, constraints, MPI-IO, MMM1D (#4950)
- new API: several features now take an ESPResSo system as argument: Cluster Structure, MPI-IO
- in the future, more features will take a system or particle slice as argument, e.g. Observables (#4954)
- `system.thermostat` and `system.integrator` will be removed in favor of `system.propagation`
- possible API: JSON data structure
- easy to read from a parameter file
- conveys the hierarchical nature of mixed propagation modes
- avoids the ambiguity of similarly named parameters, e.g. "gamma" for both Langevin and Brownian, but "gamma0" and "gammaV" for NpT
  `system.propagation.set(kT=1., translation={"Langevin": {"gamma": 1.}, "LB": {"gamma": 2.}}, rotation={"Euler"})`
- more details in #4953 and in an upcoming announcement on the mailing list
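A JSON parameter file matching the proposed call could look like the sketch below. The schema is illustrative only; #4953 discusses the actual design.

```python
import json

# Hypothetical JSON parameter file mirroring the proposed propagation API.
# The hierarchy keeps each thermostat's "gamma" under its own key, avoiding
# the flat-namespace ambiguity of the current interface.
params_json = """
{
  "kT": 1.0,
  "translation": {"Langevin": {"gamma": 1.0}, "LB": {"gamma": 2.0}},
  "rotation": {"Euler": {}}
}
"""

params = json.loads(params_json)
# e.g. params["translation"]["Langevin"]["gamma"] is unambiguous even
# though the LB coupling also has a parameter named "gamma"
```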
- Tuesday, August 6, 2024
- see mailing list for more details
- version requirements of many dependencies were updated
- experimental support for multi-GPU LB is underway
- requires a suitable CUDA-aware MPI library
- for now, use one GPU device per MPI rank
- long-term plan: use one GPU device and multiple OpenMP threads per MPI rank
- planned removal/replacement of the GPU implementations of long-range solvers
- vector field visualization (LB velocities)
- bacterial growth simulation (non-constant number of particles)
- raytracing of porous media
- red blood cell transport in capillary blood vessels
- LB GPU now works in parallel
- CUDA-aware MPI is still a work-in-progress
- work on multi-GPU support has just started
- long-term plan: multi-GPU support with 1 GPU per MPI rank and multiple shared memory threads per MPI rank via OpenMP
- multi-system simulations are now possible for almost all features
- caveats: two systems cannot have particles with the same particle ids, Monte Carlo not yet supported
- can be enabled with a one-liner change to `system.py` (see last commit in jngrad/multiverse)
- convert particle cells from AoS to SoA (#4754), i.e. one array per particle property
- improves cache locality and CPU optimizations
- use Cabana to hide optimizations
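The AoS-to-SoA conversion can be illustrated with plain Python containers. This is not the actual ESPResSo data structure (see #4754 and Cabana for the real work), just a sketch of the layout difference.

```python
# Array-of-Structs: one record per particle, properties interleaved.
particles_aos = [
    {"id": 0, "pos": (0.0, 0.0, 0.0), "q": 1.0},
    {"id": 1, "pos": (1.0, 0.0, 0.0), "q": -1.0},
]

# Struct-of-Arrays: one contiguous array per property. A kernel that only
# reads charges streams through the "q" array without skipping over ids
# and positions, which improves cache locality and enables vectorization.
particles_soa = {
    "id":  [0, 1],
    "pos": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "q":   [1.0, -1.0],
}

total_charge = sum(particles_soa["q"])  # touches only the "q" array
```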
- bump all version requirements (#4905)
- on ICP workstations, only need to update formatting and linter tools with `pip3 install -r requirements.txt autopep8 pycodestyle pylint pre-commit`
- ModEMUS project: nanoparticle diffusion in hydrogel network, by Pablo Blanco (NTNU) (see Ma et al. 2018)
- currently implemented with Langevin, plan is to use LB instead to improve accuracy
- ultrasound streaming could be modeled with a gradient pressure via pressure boundary conditions
- GPU LB with particle coupling implemented in #4734
- requires CUDA >= 12.0 to make double-precision `atomicAdd()` available
- performance is degraded when more than 1 MPI rank communicates with the same GPU; need to look into CUDA-aware MPI
- pyMBE-dev/pyMBE now uses EESSI for CI/CD
- continuous integration PR: pyMBE-dev/pyMBE#1
- continuous delivery PR: pyMBE-dev/pyMBE#23
- deployed docs: https://pymbe-dev.github.io/pyMBE
- Thursday: HPC team meeting to discuss software stack
- Monday: set up the software stack
- Tuesday: online meeting with the company
- main objectives of the CoE MultiXscale:
- EESSI: "app store" for scientific software
- multiscale simulations with 3 pilot cases: helicopter blades turbulent flow, ultrasound imaging of living tissues, energy storage devices
- make software pre-exascale ready
- training on using this software
- ongoing projects for the ICP:
- ZnDraw+SiMGen project in collaboration with Gábor Csányi
- demo available at https://zndraw.icp.uni-stuttgart.de/
- LB boundaries in the waLBerla version of ESPResSo are broken when using 2 or more MPI ranks
- the LB ghost layer doesn't contain any information about boundaries, so fluid can flow out into the neighboring LB domain, where it gets trapped inside the boundary
- more details can be found in the bug report #4859
- the solution will require a ghost update after every call to a node/slice/shape boundary velocity setter function
- it is now possible to choose the exact equations of motion solved for each particle
- more combinations of integrators and thermostats are allowed
- one can mix different types of virtual sites in the same simulation
- a new Python interface needs to be designed (target for next meeting)
- CUDA 12.0 is now the default everywhere at the ICP
- GTX 980 GPUs are being removed from ICP workstations (GTX 1000 series and higher are needed for double-precision `atomicAdd`)
- Python 3.12 changes the way we use the unittest module (#4852)
- Python virtual environments become mandatory for `pip install` in Ubuntu 24.04
  - user guide needs to reflect that change
- ZnDraw bridge is currently being developed for ESPResSo
- supports live visualization in the web browser
- the plan is to re-implement all features available in the ESPResSo OpenGL visualizer
- CUDA 12 will be the default in Ubuntu 24.04, due April 2024
- CUDA 12 is required for C++20, which makes contributing to ESPResSo significantly easier (#4846)
- compute clusters and supercomputers where ESPResSo is currently used already provide compiler toolchains compatible with CUDA 12