Add description of problem sizes, weights, etc. for SSNI
gshipman committed Nov 29, 2023
1 parent b1ccd86 commit 5a2d7f2
Showing 4 changed files with 108 additions and 38 deletions.
53 changes: 53 additions & 0 deletions doc/sphinx/00_intro/introduction.rst
@@ -180,6 +180,12 @@ Single node benchmarks will require respondent to provide estimates on

* Problem size must be changed to meet % of memory requirements.

* Respondent shall provide CPU strong scaling and GPU throughput results on current generation representative architectures.
  If no representative architecture exists, respondent can provide modeled / projected CPU strong scaling and GPU throughput results.
  Respondent may provide results on both current generation representative architectures and modeled / projected architectures.

* For SSNI projections respondent shall use the specific problem size(s) specified for SSNI.

Source code modification categories:

* Baseline: “out-of-the-box” performance
@@ -232,6 +238,53 @@ Where:
* w = weighting factor.



.. _GlobalSSNIWeightsSizes:

SSNI Weights and SSNI problem sizes
===================================


.. list-table::

   * - **SSNI Benchmark**
     - **SSNI Weight**
     - **SSNI Problem size - % device memory**
   * - Branson
     - TBD
     - 30
   * - AMG2023 Problem 1 Setup
     - TBD
     - 20
   * - AMG2023 Problem 2 Setup
     - TBD
     - 20
   * - AMG2023 Problem 1 Solve
     - TBD
     - 20
   * - AMG2023 Problem 2 Solve
     - TBD
     - 20
   * - MiniEM
     - TBD
     - TBD
   * - MLMD Training
     - TBD
     - N/A
   * - MLMD Simulation
     - TBD
     - 60
   * - Parthenon-VIBE
     - TBD
     - 40
   * - Sparta
     - TBD
     - TBD
   * - UMT
     - TBD
     - TBD


System Information
==================

48 changes: 32 additions & 16 deletions doc/sphinx/01_branson/branson.rst
@@ -28,9 +28,25 @@ It is in replicated mode which means there is very little MPI communication (end

Figure of Merit
---------------
The Figure of Merit is defined as particles/second and is obtained by dividing the number of particles in the problem by the ``Total transport`` value in the output. Future versions will output this number directly.
The Figure of Merit is defined as particles/second and is obtained by dividing the number of particles in the problem by the ``Total transport`` value.
The Figure of Merit is labeled "Photons Per Second (FOM):" in Branson's output.
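
As a rough illustration of the definition above, the Figure of Merit can be recomputed by hand from the particle count and the transport time. This is a minimal sketch: the function name ``figure_of_merit`` and the example numbers are hypothetical and are not Branson output.

```python
def figure_of_merit(num_particles: int, total_transport_seconds: float) -> float:
    """Return particles per second: particle count divided by transport time."""
    if total_transport_seconds <= 0:
        raise ValueError("total transport time must be positive")
    return num_particles / total_transport_seconds


# Hypothetical values for illustration only (not measured results).
example_fom = figure_of_merit(num_particles=10_000_000, total_transport_seconds=20.0)
print(f"{example_fom:.3e} particles/second")
```

The hand-computed value can then be compared against the "Photons Per Second (FOM):" line as a sanity check.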


Problem Sizes
-------------
For strong scaling on a CPU, Branson must be run with three different problem sizes such that the memory
footprint of all Branson processes at the smallest process count per node is approximately 4 to 5%, 8 to 10%, and 20 to 22% during step 2 of the simulation.


For throughput curves on a GPU the memory footprint of Branson must vary between ~5% and ~80% in increments of at most 5% of the computational device's main memory.

The memory footprint can be controlled by editing "photons" in the input file.
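
The GPU sweep described above can be planned as a list of target footprints. The sketch below assumes a device with 40 GiB of main memory (an assumption, taken loosely from the A100 configuration described in this document) and a hypothetical helper name ``footprint_targets``. It does not compute the number of photons needed to reach each target; that has to be found by measurement.

```python
def footprint_targets(device_mem_gib, start_pct=5, stop_pct=80, step_pct=5):
    """Return (percent, GiB) pairs from start_pct to stop_pct in steps of step_pct."""
    if step_pct <= 0 or step_pct > 5:
        raise ValueError("step must be positive and at most 5 percentage points")
    return [
        (pct, device_mem_gib * pct / 100.0)
        for pct in range(start_pct, stop_pct + 1, step_pct)
    ]


# Assumed 40 GiB device; adjust to the actual device memory.
targets = footprint_targets(device_mem_gib=40.0)
for pct, gib in targets:
    print(f"{pct:3d}% -> {gib:5.1f} GiB")
```

A step smaller than 5 percentage points is also acceptable under the stated requirement and simply yields more points on the curve.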

Results of both CPU strong scaling and GPU throughput should be provided on a representative, current-generation hardware configuration used in benchmarking and projections.
Results which are

See :ref:`GlobalSSNIWeightsSizes` for the problem size to use for SSNI projections.

Building
========

@@ -104,8 +120,7 @@ It is run with:
..
For strong scaling on a CPU, Branson should be run with three different problem sizes such that the memory
footprint at the smallest process count per node is approximately: 4 to 5%, 8 to 10%, and 20 to 22%; during step 2 of the simulation.

Memory footprint is the sum of all Branson processes resident set size (or equivalent) on the node.
This can be obtained on a CPU system using the following (while the application is in step 2):

@@ -116,8 +131,7 @@ This can be obtained on a CPU system using the following (while the application
ps -C BRANSON -o rss | awk '{sum+=$1;} END{print sum/1024/1024;}'
..
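
The ``ps`` and ``awk`` pipeline above sums the resident set size of every Branson process and, assuming ``ps`` reports RSS in KiB as it does with Linux procps, prints the total in GiB. A minimal sketch of turning that total into a percentage of node memory and checking it against the CPU target ranges follows. The helper names, the 256 GiB node size, and the per-process RSS values are assumptions for illustration only.

```python
# Target footprint ranges (percent) for the three CPU strong scaling problem sizes.
CPU_TARGET_RANGES = [(4.0, 5.0), (8.0, 10.0), (20.0, 22.0)]


def footprint_percent(rss_kib_per_process, node_mem_gib):
    """Sum per-process RSS (KiB) and express it as a percentage of node memory."""
    total_gib = sum(rss_kib_per_process) / 1024 / 1024
    return 100.0 * total_gib / node_mem_gib


def matching_range(percent):
    """Return the target range containing percent, or None if it is outside all of them."""
    for low, high in CPU_TARGET_RANGES:
        if low <= percent <= high:
            return (low, high)
    return None


# Hypothetical case: 32 processes, each with 384 MiB resident, on a 256 GiB node.
rss_values = [384 * 1024] * 32
pct = footprint_percent(rss_values, node_mem_gib=256.0)
print(f"{pct:.2f}% of node memory, target range: {matching_range(pct)}")
```

If ``matching_range`` returns ``None``, the problem size would need adjusting before the run counts toward one of the three required sizes.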
For throughput curves on a GPU the memory footprint of Branson must vary between ~5% and ~60% in increments of at most 5% of the computational device's main memory.
The memory footprint can be controlled by editing "photons" in the input file.


Results from Branson are provided on the following systems:

@@ -128,17 +142,19 @@ Results from Branson are provided on the following systems:

.. _DarwinA100:

AMD Epyc + Nvidia A100
----------------------

Dual socket AMD Epyc 7502 with 32 cores operating at 2.5 GHz with 256 GBytes CPU
memory and dual Nvidia Ampere A100-SXM4 GPUs with 40GBytes of memory per GPU.



Correctness
------------

Branson has two main checks on correctness. The first is a looser check that's meant as a "smoke
test" to see if a code change has introduced an error. After every timestep, a summary block is
printed:

.. code-block:: bash
@@ -181,6 +197,7 @@ The second check on correctness is much simpler. For any changes to Branson, the
the same temperature in a standard Marshak wave problem after 100 cycles. For the `Marshak wave input <https://github.com/lanl/branson/blob/develop/inputs/marshak_wave_replicated.xml>`_ file, the following temperature profile should be reproduced to within 3% after 100 cycles, as shown below:

.. code-block:: bash
Step: 100 Start Time: 0.99 End Time: 1 dt: 0.01
source time: 0.094371
-------- VERBOSE PRINT BLOCK: CELL TEMPERATURE --------
@@ -211,7 +228,7 @@ the same temperature in a standard marshak wave problem after 100 cycles. For th
23 0.010000237 0.0099765577 2.3568109e-07
24 0.010000281 0.0099765314 2.3568212e-07
-------------------------------------------------------
..

This output is expected as long as the spatial, boundary and region blocks are kept the same in the
@@ -256,8 +273,7 @@ figure.

Branson Strong Scaling Performance on Crossroads 66M particles

Strong scaling performance of Branson on Crossroads with 200M particles is provided in the following table and figure.

.. csv-table:: Branson Strong Scaling Performance on Crossroads 200M particles
:file: cpu_200M.csv
@@ -272,24 +288,24 @@ figure.

Branson Strong Scaling Performance on Crossroads 200M particles

AMD Epyc + Nvidia A100
------------

AMD Epyc + Nvidia A100
----------------------
Throughput performance of Branson on AMD Epyc + Nvidia A100 (using a single GPU) is provided within the
following table and figure.

.. csv-table::Branson Throughput Performance on AMD Epyc + A100
.. csv-table:: Branson Throughput Performance on AMD Epyc + Nvidia A100
:file: gpu.csv
:align: center
:widths: 10, 10
:widths: 15, 15
:header-rows: 1

.. figure:: gpu.png
:align: center
:scale: 50%
:alt: Branson Throughput Performance on AMD Epyc + A100
:alt: Branson Throughput Performance on AMD Epyc + Nvidia A100

Branson Throughput Performance on AMD Epyc + A100
Branson Throughput Performance on AMD Epyc + Nvidia A100
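
To locate the peak of the throughput curve, data in the format of ``gpu.csv`` can be read and searched for the row with the largest Figure of Merit. The sketch below embeds a few rows copied from the ``gpu.csv`` table in this commit. The function name ``peak_throughput`` is hypothetical, and ``skipinitialspace`` is used so that both the compact and the space-padded versions of the file parse the same way.

```python
import csv
import io

# A few rows copied from gpu.csv; the full file has more entries.
GPU_CSV_EXCERPT = """No. Particles, Actual
100000, 2.33E+05
1000000, 9.06E+05
2000000, 9.51E+05
3000000, 8.72E+05
200000000, 2.23E+05
"""


def peak_throughput(csv_text):
    """Return (particle count, FOM) for the row with the highest FOM."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = [(int(row["No. Particles"]), float(row["Actual"])) for row in reader]
    return max(rows, key=lambda item: item[1])


particles, fom = peak_throughput(GPU_CSV_EXCERPT)
print(f"peak FOM {fom:.2e} at {particles} particles")
```

On these rows the maximum falls at 2,000,000 particles, matching the highest ``Actual`` value listed in the table.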

References
==========
44 changes: 22 additions & 22 deletions doc/sphinx/01_branson/gpu.csv
@@ -1,22 +1,22 @@
No. Particles,Actual
100000,2.33E+05
200000,4.32E+05
300000,5.55E+05
400000,6.52E+05
500000,7.14E+05
600000,7.84E+05
700000,8.17E+05
800000,8.40E+05
900000,8.81E+05
1000000,9.06E+05
2000000,9.51E+05
3000000,8.72E+05
4000000,8.38E+05
5000000,7.92E+05
6600000,7.39E+05
10000000,6.34E+05
13300000,5.76E+05
20000000,5.03E+05
50000000,3.54E+05
100000000,2.74E+05
200000000,2.23E+05
No. Particles, Actual
100000, 2.33E+05
200000, 4.32E+05
300000, 5.55E+05
400000, 6.52E+05
500000, 7.14E+05
600000, 7.84E+05
700000, 8.17E+05
800000, 8.40E+05
900000, 8.81E+05
1000000, 9.06E+05
2000000, 9.51E+05
3000000, 8.72E+05
4000000, 8.38E+05
5000000, 7.92E+05
6600000, 7.39E+05
10000000, 6.34E+05
13300000, 5.76E+05
20000000, 5.03E+05
50000000, 3.54E+05
100000000, 2.74E+05
200000000, 2.23E+05
1 change: 1 addition & 0 deletions doc/sphinx/09_Microbenchmarks/M1_STREAM/STREAM.rst
@@ -98,6 +98,7 @@ At capacity, the measured values should reach a steady state where increasing th
For Crossroads, the benchmark was built with ``STREAM_ARRAY_SIZE=40000000`` and ``NTIMES=20`` with optimizations and OpenMP enabled.

.. code-block:: bash
make CC=`which mpicc` FF=`which mpifort` CFLAGS="-O2 -fopenmp -DSTREAM_ARRAY_SIZE=40000000 -DNTIMES=20" FFLAGS="-O2 -fopenmp -DSTREAM_ARRAY_SIZE=40000000 -DNTIMES=20"
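
For a sense of scale, the memory implied by ``STREAM_ARRAY_SIZE`` can be estimated by hand. The sketch below assumes the reference STREAM layout of three arrays of 8-byte doubles. The helper name ``stream_array_bytes`` is hypothetical, and how the arrays are distributed across MPI ranks depends on the STREAM variant in use and is not modelled here.

```python
def stream_array_bytes(array_size, bytes_per_element=8, num_arrays=3):
    """Estimate total bytes held by the STREAM arrays (assumes three double arrays)."""
    return array_size * bytes_per_element * num_arrays


total_bytes = stream_array_bytes(40_000_000)
print(f"{total_bytes} bytes, about {total_bytes / 2**20:.1f} MiB")
```

Comparing this estimate with the cache sizes and memory capacity of the target system helps confirm that the arrays are large enough to exercise main memory rather than cache.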