MILC NERSC
maddyscientist edited this page Nov 24, 2021
The MILC NERSC RHMC benchmark runs a 2+1+1 flavor HISQ-improved staggered fermion simulation. There are four benchmarks (small, medium, large and x-large), each with a lattice volume 16x larger than the previous one. We can thus strong scale by running the same benchmark on different process counts, or weak scale by running different benchmarks with the same local volume per process, e.g., running the large benchmark on 16x more GPUs than the medium benchmark.
Benchmark | Volume |
---|---|
Small | 18^3 x 36 |
Medium | 36^3 x 72 |
Large | 72^3 x 144 |
X-Large | 144^3 x 288 |
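As a quick check of the 16x volume ratio and the weak-scaling rule described above, the following sketch computes the site counts from the table. The helper names (`volume`, `gpus_for_same_local_volume`) are illustrative only and are not part of MILC or QUDA.

```python
# Lattice dimensions (spatial extent L, temporal extent T) taken from the table above.
benchmarks = {
    "Small": (18, 36),
    "Medium": (36, 72),
    "Large": (72, 144),
    "X-Large": (144, 288),
}


def volume(L, T):
    """Number of lattice sites in an L^3 x T lattice."""
    return L**3 * T


names = list(benchmarks)
volumes = {name: volume(*benchmarks[name]) for name in names}

# Each benchmark doubles all four dimensions, so the volume grows by 2^4 = 16.
for prev, curr in zip(names, names[1:]):
    ratio = volumes[curr] // volumes[prev]
    print(f"{curr}: {volumes[curr]} sites, {ratio}x {prev}")


def gpus_for_same_local_volume(ref_name, ref_gpus, target_name):
    """Hypothetical helper: GPU count that keeps the per-GPU volume of a reference run."""
    return ref_gpus * volumes[target_name] // volumes[ref_name]
```

For example, `gpus_for_same_local_volume("Medium", 8, "Large")` gives 128, matching the "16x more GPUs" rule for weak scaling.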
When running MILC with QUDA, the following routines are offloaded to QUDA:
- Multi-shift CG solver
- CG solver
- Gauge Force
- Fermion Force
- Gauge update
- Reunitarization
The benchmark has the following algorithmic characteristics:
- The two-flavor determinant contribution is preconditioned by the strange quark
- All fermionic contributions are included using RHMC.
- A two-level time integration is used, with a second-order minimum-norm (Omelyan) integrator employed on both levels. The gauge force is applied on the fine timescale, with all fermionic contributions on the coarse timescale.
- All fermionic contributions in the RHMC utilize a mixed-precision multi-shift CG algorithm, where the multi-shift solver is run in double-single precision, with per-shift refinement applied in double-half precision.
- The solves required as part of inline measurement at the end of each trajectory are performed using mixed-precision (double-half) CG.
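The two-level integration scheme can be illustrated on a toy problem. The sketch below is not MILC or QUDA code: it applies a nested second-order minimum-norm (Omelyan) integrator to a one-dimensional system, where `fine_force` stands in for the gauge force and `coarse_force` for the fermionic contributions. The value of `LAMBDA`, the step counts, and the toy forces are all assumptions chosen for illustration.

```python
# Commonly quoted parameter for the second-order minimum-norm scheme (assumed here).
LAMBDA = 0.1931833275037836


def omelyan_step(q, p, dt, force, inner_update):
    """One second-order minimum-norm (Omelyan) step of size dt.

    `force(q)` kicks the momentum; `inner_update(q, p, h)` advances (q, p)
    by time h under the remaining part of the Hamiltonian.
    """
    p = p + LAMBDA * dt * force(q)
    q, p = inner_update(q, p, 0.5 * dt)
    p = p + (1.0 - 2.0 * LAMBDA) * dt * force(q)
    q, p = inner_update(q, p, 0.5 * dt)
    p = p + LAMBDA * dt * force(q)
    return q, p


def drift(q, p, h):
    """Free evolution of the coordinate for time h."""
    return q + h * p, p


def two_level_trajectory(q, p, tau, n_coarse, n_fine, coarse_force, fine_force):
    """Integrate for time tau: coarse force on the outer level, fine force inside."""

    def fine_level(q, p, h):
        # Advance by h using n_fine Omelyan steps of the fine force plus drift.
        dh = h / n_fine
        for _ in range(n_fine):
            q, p = omelyan_step(q, p, dh, fine_force, drift)
        return q, p

    dt = tau / n_coarse
    for _ in range(n_coarse):
        q, p = omelyan_step(q, p, dt, coarse_force, fine_level)
    return q, p


# Toy Hamiltonian: H = p^2/2 + q^2/2 + 0.025 q^4 (illustrative only).
def fine_force(q):
    return -q


def coarse_force(q):
    return -0.1 * q**3


def energy(q, p):
    return 0.5 * p**2 + 0.5 * q**2 + 0.025 * q**4


q0, p0 = 1.0, 0.5
q1, p1 = two_level_trajectory(q0, p0, 1.0, 10, 4, coarse_force, fine_force)
delta_h = energy(q1, p1) - energy(q0, p0)
print(f"energy violation over the trajectory: {delta_h:.3e}")
```

Because the scheme is a symmetric composition on both levels, flipping the momentum and integrating again should return to the starting point up to rounding, which is the reversibility property HMC-type algorithms rely on.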
The medium benchmark is suitable for scaling up to 16 GPUs.
Machine | Nodes | MPI processes | GPU | #GPU | Time (s) |
---|---|---|---|---|---|
Selene | 1 | 1 | NVIDIA A100-80 | 1 | 2260 |
Selene | 1 | 2 | NVIDIA A100-80 | 2 | 1319 |
Selene | 1 | 4 | NVIDIA A100-80 | 4 | 700 |
Selene | 1 | 8 | NVIDIA A100-80 | 8 | 394 |
The large benchmark is suitable for scaling up to 512 GPUs.
Machine | Nodes | MPI processes | GPU | #GPU | Time (s) |
---|---|---|---|---|---|
Selene | 4 | 32 | NVIDIA A100-80 | 32 | 1913 |
Selene | 8 | 64 | NVIDIA A100-80 | 64 | 1015 |
Selene | 16 | 128 | NVIDIA A100-80 | 128 | 651 |
Selene | 32 | 256 | NVIDIA A100-80 | 256 | 433 |
Selene | 64 | 512 | NVIDIA A100-80 | 512 | 320 |
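To compare the strong-scaling behaviour of the two benchmarks, speedup and parallel efficiency can be derived from the timings in the tables above. This is a minimal sketch: the function name `strong_scaling` is illustrative, and efficiency is measured relative to the smallest GPU count listed for each benchmark.

```python
def strong_scaling(rows):
    """Return (gpus, speedup, efficiency) tuples relative to the first row.

    `rows` is a list of (gpu_count, time_in_seconds) pairs.
    """
    base_gpus, base_time = rows[0]
    results = []
    for gpus, time in rows:
        speedup = base_time / time
        efficiency = speedup / (gpus / base_gpus)
        results.append((gpus, speedup, efficiency))
    return results


# (number of GPUs, time in seconds) copied from the Selene tables above.
medium = [(1, 2260), (2, 1319), (4, 700), (8, 394)]
large = [(32, 1913), (64, 1015), (128, 651), (256, 433), (512, 320)]

for label, rows in (("Medium", medium), ("Large", large)):
    for gpus, speedup, efficiency in strong_scaling(rows):
        print(f"{label:6s} {gpus:4d} GPUs: speedup {speedup:5.2f}, efficiency {efficiency:.2f}")
```

With these numbers, the medium benchmark retains roughly 72% parallel efficiency at 8 GPUs relative to 1 GPU, and the large benchmark roughly 37% at 512 GPUs relative to 32 GPUs.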