diff --git a/doc/sphinx/00_intro/SSNI-baseline-draft.xlsx b/doc/sphinx/00_intro/SSNI-baseline-draft.xlsx
index a0fe694a..700de4cc 100644
Binary files a/doc/sphinx/00_intro/SSNI-baseline-draft.xlsx and b/doc/sphinx/00_intro/SSNI-baseline-draft.xlsx differ
diff --git a/doc/sphinx/03_vibe/cpu_20.csv b/doc/sphinx/03_vibe/cpu_20.csv
index 459ecb2d..5ba4e243 100644
--- a/doc/sphinx/03_vibe/cpu_20.csv
+++ b/doc/sphinx/03_vibe/cpu_20.csv
@@ -1,4 +1,4 @@
 No. Cores, Actual, Ideal
-8, 2.00e+06, 2.0e+06
-32, 7.40e+06, 8.0e+06
-56, 1.29e+07, 1.4e+07
\ No newline at end of file
+8, 3.40e+06, 3.40e+06
+32, 1.19e+07, 1.36e+07
+56, 1.88e+07, 2.38e+07
\ No newline at end of file
diff --git a/doc/sphinx/03_vibe/cpu_40.csv b/doc/sphinx/03_vibe/cpu_40.csv
index ccc095c0..96f511dd 100644
--- a/doc/sphinx/03_vibe/cpu_40.csv
+++ b/doc/sphinx/03_vibe/cpu_40.csv
@@ -1,6 +1,6 @@
 No. Cores, Actual, Ideal
-8, 1.82e+06, 1.82e+06
-32, 7.04e+06, 7.28e+06
-56, 1.21e+07, 1.274e+07
-88, 1.60e+07, 2.02e+07
-112, 2.00e+07, 2.548e+07
+8, 2.80e+06, 2.80e+06
+32, 1.12e+07, 1.12e+07
+56, 1.79e+07, 1.96e+07
+88, 2.36e+07, 3.08e+07
+112, 2.61e+07, 3.92e+07
diff --git a/doc/sphinx/03_vibe/cpu_60.csv b/doc/sphinx/03_vibe/cpu_60.csv
index 5cb43156..9f7e8829 100644
--- a/doc/sphinx/03_vibe/cpu_60.csv
+++ b/doc/sphinx/03_vibe/cpu_60.csv
@@ -1,6 +1,6 @@
 No. Cores, Actual, Ideal
-8, 1.51e+06, 1.51e+06
-32, 6.34e+06, 6.04e+06
-56, 1.09e+07, 1.057e+07
-88, 1.55e+07, 1.661e+07
-112, 1.85e+07, 2.114e+07
+8, 2.40e+06, 2.40e+06
+32, 9.56e+06, 9.60e+06
+56, 1.54e+07, 1.68e+07
+88, 2.16e+07, 2.64e+07
+112, 2.44e+07, 3.36e+07
\ No newline at end of file
diff --git a/doc/sphinx/03_vibe/gpu.csv b/doc/sphinx/03_vibe/gpu.csv
index 96abcfbd..69c8b470 100644
--- a/doc/sphinx/03_vibe/gpu.csv
+++ b/doc/sphinx/03_vibe/gpu.csv
@@ -1,7 +1,7 @@
 Mesh Base Size, Actual
-32, 1.75e+07
-64, 1.15e+07
-96, 6.78e+06
-128, 0
-160, 0
+32, 2.88e+07
+64, 2.19e+07
+96, 1.41e+07
+128, 1.36e+07
+160, 1.03e+07
 192, 0
diff --git a/doc/sphinx/03_vibe/vibe.rst b/doc/sphinx/03_vibe/vibe.rst
index 862eaf2d..da5c8f1d 100644
--- a/doc/sphinx/03_vibe/vibe.rst
+++ b/doc/sphinx/03_vibe/vibe.rst
@@ -67,8 +67,7 @@ To build Parthenon on CPU, including this benchmark, with minimal external depen
 .. code-block:: bash
 
    parthenon$ mkdir build && cd build
-   build$ export CXXFLAGS="-fno-math-errno -march=native"
-   build$ cmake -DPARTHENON_DISABLE_HDF5=ON -DPARTHENON_ENABLE_PYTHON_MODULE_CHECK=OFF -DREGRESSION_GOLD_STANDARD_SYNC=OFF -DCMAKE_BUILD_TYPE=Release ../
+   build$ cmake -DPARTHENON_DISABLE_HDF5=ON -DPARTHENON_ENABLE_PYTHON_MODULE_CHECK=OFF -DREGRESSION_GOLD_STANDARD_SYNC=OFF -DPARTHENON_ENABLE_TESTING=OFF -DCMAKE_BUILD_TYPE=Release ../
    build$ make -j
 
 ..
@@ -81,11 +80,11 @@ On Crossroads the relevant modules for the results shown here are:
 
 ..
 
-To build for execution on a single GPU, it should be sufficient to add the following flags to the CMake configuration line
+To build for execution on a single GPU, it should be sufficient to add flags similar to the CMake configuration line
 
 .. code-block:: bash
 
-   cmake -DPARTHENON_DISABLE_MPI=ON -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_AMPERE80=ON
+   cmake -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_AMPERE80=ON
 
 ..
 
@@ -123,7 +122,7 @@ The results presented here use 128 and 160 for memory footprints of approximate
 Results from Parthenon are provided on the following systems:
 
 * Crossroads (see :ref:`GlobalSystemATS3`)
-* An Nvidia A100 GPU hosted on an [Nvidia Arm HPC Developer Kit](https://developer.nvidia.com/arm-hpc-devkit)
+* A Grace Hopper (Grace ARM CPU 72 cores with 120GB, H100 GPU with 96GB)
 
 The mesh and meshblock size parameters are chosen to balance
 realism/performance with memory footprint. For the following tests we
@@ -182,12 +181,12 @@ Crossroads
 
    VIBE Throughput Performance on Crossroads using ~60% memory
 
-Nvidia testbed with A100
+Nvidia Grace Hopper
 ------------------------
 
-Throughput performance of Parthenon-VIBE on a 40GB A100 is provided within the following table and figure.
+Throughput performance of Parthenon-VIBE on a 96 GB H100 is provided within the following table and figure.
 
-.. csv-table:: VIBE Throughput Performance on A100
+.. csv-table:: VIBE Throughput Performance on H100
    :file: gpu.csv
    :align: center
    :widths: 10, 10
@@ -196,9 +195,9 @@ Throughput performance of Parthenon-VIBE on a 40GB A100 is provided within the f
 .. figure:: gpu.png
    :align: center
    :scale: 50%
-   :alt: VIBE Throughput Performance on A100
+   :alt: VIBE Throughput Performance on H100
 
-   VIBE Throughput Performance on A100
+   VIBE Throughput Performance on H100
 
 
 Multi-node scaling on Crossroads
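Review note: in the updated CPU tables the ``Ideal`` column appears to be the 8-core ``Actual`` value scaled linearly with core count. That is an inference from the numbers, not something the diff states. The sketch below checks that assumption against the new ``cpu_40.csv`` rows and reports the resulting parallel efficiency (``Actual / Ideal``). The helper names ``load_rows`` and ``check_table`` are hypothetical and not part of the repository.

```python
import csv
import io
import math

# New contents of cpu_40.csv, copied from the diff above.
CPU_40_CSV = """\
No. Cores, Actual, Ideal
8, 2.80e+06, 2.80e+06
32, 1.12e+07, 1.12e+07
56, 1.79e+07, 1.96e+07
88, 2.36e+07, 3.08e+07
112, 2.61e+07, 3.92e+07
"""


def load_rows(text):
    """Parse the table into (cores, actual, ideal) tuples."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header row
    return [(int(c), float(a), float(i)) for c, a, i in reader]


def check_table(rows, rel_tol=1e-3):
    """Return (cores, ideal_is_linear, efficiency) for every row.

    Assumption: ideal throughput = first-row actual * cores / first-row cores.
    """
    base_cores, base_actual, _ = rows[0]
    report = []
    for cores, actual, ideal in rows:
        expected_ideal = base_actual * cores / base_cores
        is_linear = math.isclose(ideal, expected_ideal, rel_tol=rel_tol)
        report.append((cores, is_linear, actual / ideal))
    return report


for cores, is_linear, efficiency in check_table(load_rows(CPU_40_CSV)):
    print(f"{cores:4d} cores  ideal linear: {is_linear}  efficiency: {efficiency:.2f}")
```

The same check can be run on ``cpu_20.csv`` and ``cpu_60.csv`` by swapping in their rows. A row where the linearity flag is false would point to a typo in the ``Ideal`` column.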