From 3e7e3cb2f12559520006d77a82ae97cc43590ab5 Mon Sep 17 00:00:00 2001 From: "Alex R. Long" Date: Sun, 12 Nov 2023 10:05:38 -0700 Subject: [PATCH] + Add a correctness section for branson --- doc/sphinx/01_branson/branson.rst | 217 ++++++++++++++++++++++++++---- 1 file changed, 190 insertions(+), 27 deletions(-) diff --git a/doc/sphinx/01_branson/branson.rst b/doc/sphinx/01_branson/branson.rst index 64812986..5a8a7863 100644 --- a/doc/sphinx/01_branson/branson.rst +++ b/doc/sphinx/01_branson/branson.rst @@ -2,8 +2,8 @@ Branson ******* -This is the documentation for the ATS-5 Benchmark Branson - 3D hohlraum single node. - +This is the documentation for the ATS-5 Benchmark Branson - 3D hohlraum single node. + Purpose @@ -13,17 +13,17 @@ From their [Branson]_: Branson is not an acronym. -Branson is a proxy application for parallel Monte Carlo transport. -It contains a particle passing method for domain decomposition. +Branson is a proxy application for parallel Monte Carlo transport. +It contains a particle passing method for domain decomposition. + - Characteristics =============== Problem ------- -The benchmark performance problem is a single node 3D hohlraum problem that is meant to be run with a 30 group build of Branson. +The benchmark performance problem is a single node 3D hohlraum problem that is meant to be run with a 30 group build of Branson. It is in replicated mode which means there is very little MPI communication (end of cycle reductions). Figure of Merit @@ -36,14 +36,14 @@ Building Accessing the sources -* Clone the submodule from the benchmarks repository checkout +* Clone the submodule from the benchmarks repository checkout .. code-block:: bash cd git submodule update --init --recursive cd branson - + .. 
@@ -57,27 +57,27 @@ Build requirements: * `OpenMPI 1.10+ `_ * `mpich `_ -* There is only one CMake user option right now: ``CMAKE_BUILD_TYPE`` which can be +* There is only one CMake user option right now: ``CMAKE_BUILD_TYPE`` which can be set on the command line with ``-DCMAKE_BUILD_TYPE=`` and the default is Release. * If cmake has trouble finding your installed TPLs, you can try - + * appending their locations to ``CMAKE_PREFIX_PATH``, * try running ``ccmake .`` from the build directory and changing the values of build system variables related to TPL locations. -* If building a CUDA enabled version of Branson use the ``CUDADIR`` environment variable to specify your CUDA directory. +* If building a CUDA enabled version of Branson use the ``CUDADIR`` environment variable to specify your CUDA directory. .. code-block:: bash export CXX=`which g++` - cd - mkdir build - cd build + cd + mkdir build + cd build cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX= make -j -.. +.. Testing the build: @@ -86,7 +86,7 @@ Testing the build: cd $build_dir ctest -j 32 -.. +.. Running @@ -94,7 +94,7 @@ Running The ``inputs`` folder contains the 3D hohlraum input file. 3D hohlraums and should be run with a 30 group build of Branson (see Special builds section above). -The ``3D_hohlraum_single_node.xml`` problem is meant to be run on a full node. +The ``3D_hohlraum_single_node.xml`` problem is meant to be run on a full node. It is run with: @@ -104,26 +104,189 @@ It is run with: .. -For strong scaling on a CPU, Branson should be run with three different problem sizes such that the memory -footprint at the smallest process count per node is approximately: 4 to 5%, 8 to 10%, and 20 to 22%; during step 2 of the simulation. -Memory footprint is the sum of all Branson processes resident set size (or equivalent) on the node. 
-This can be obtained on a CPU system using the following (while the application is in step 2): +For strong scaling on a CPU, Branson should be run with three different problem sizes such that the memory +footprint at the smallest process count per node is approximately: 4 to 5%, 8 to 10%, and 20 to 22%; during step 2 of the simulation. +Memory footprint is the sum of all Branson processes resident set size (or equivalent) on the node. +This can be obtained on a CPU system using the following (while the application is in step 2): .. code-block:: bash ps -C BRANSON -o euser,c,pid,ppid,cmd,%cpu,%mem,rss --sort=-rss - + ps -C BRANSON -o rss | awk '{sum+=$1;} END{print sum/1024/1024;}' -.. +.. For throughput curves on a GPU the memory footprint of Branson must vary between ~5% and ~60% in increments of at most 5% of the computational device's main memory. -The memory footprint can be controlled by editing "photons" in the input file. +The memory footprint can be controlled by editing "photons" in the input file. Results from Branson are provided on the following systems: * Crossroads (see :ref:`GlobalSystemATS3`) * Sierra (see :ref:`GlobalSystemATS2`) +Correctness +------------ + +Branson has two main checks on correctness. The first is a looser check that's meant as a "smoke +test" to see if a code change has introduced an error. After every timestep, a summary block is +printed: + +.. 
code-block:: bash
+********************************************************************************
+Step: 5 Start Time: 0.04 End Time: 0.05 dt: 0.01
+source time: 0.166658
+WARNING: use_gpu_transporter set to true but GPU kernel not available, running transport on CPU
+Total Photons transported: 10632225
+Emission E: 4.43314e-05, Source E: 0, Absorption E: 4.1747e-05, Exit E: 2.59802e-06
+Pre census E: 3.5321e-07 Post census E: 3.396e-07 Post census Size: 219902
+Pre mat E: 0.0130731 Post mat E: 0.0130705
+Radiation conservation: -5.83707e-17
+Material conservation: -5.8599e-15
+Sends posted: 0, sends completed: 0
+Receives posted: 0, receives completed: 0
+Transport time max/min: 7.31594/7.20329
+..
+
+Two lines in the block specifically relate to conservation:
+
+.. code-block:: bash
+Radiation conservation: -5.83707e-17
+Material conservation: -5.8599e-15
+..
+
+The radiation conservation error, relative to the amount of radiation energy in the problem, should
+be accurate to roughly half the precision of the floating point type. The standard version of Branson
+uses double precision for all floating point values in both CPU and GPU versions. For the timestep
+shown above, 4.43314e-5 jerks of energy are emitted and the conservation quantity is -5.837e-17, so
+the relative error is about 1.3e-12, well beyond half the precision of a double (about eight
+significant digits). The same check can be done for the material energy conservation: here the total
+energy in the material at the end of the timestep is 0.0130705 jerks, and the conservation
+value is -5.8599e-15, a relative error of about 4.5e-13. As mentioned above, conservation is a
+relatively loose check because more particles and more cells mean more summations and more
+opportunities for loss of precision. This is further complicated by MPI reductions. Still, this
+check is accurate enough to clearly detect, for example, particles lost in a modified MPI scheme.
+
+The second check on correctness is much simpler. For any changes to Branson, the code should produce
+the same temperature in a standard Marshak wave problem after 100 cycles. For the
+ `marshak wave input `_
+file, the following temperature profile should be reproduced to within 3% after 100 cycles, as shown below:
+
+.. code-block:: bash
+Step: 100 Start Time: 0.99 End Time: 1 dt: 0.01
+source time: 0.094371
+-------- VERBOSE PRINT BLOCK: CELL TEMPERATURE --------
+ cell  T_e           T_r           abs_E
+ 0     0.9864821     0.98624394    2.3231089e-05
+ 1     0.97376231    0.97335755    2.2986719e-05
+ 2     0.95987812    0.95921396    2.2604072e-05
+ 3     0.94448294    0.94359619    2.223203e-05
+ 4     0.92838247    0.92729361    2.1860113e-05
+ 5     0.91059797    0.90933099    2.1487142e-05
+ 6     0.89041831    0.88903414    2.1098101e-05
+ 7     0.86713097    0.86559489    2.0554045e-05
+ 8     0.83972062    0.83807018    1.9926467e-05
+ 9     0.80754477    0.80583439    1.9216495e-05
+ 10    0.76586319    0.76409724    1.8223846e-05
+ 11    0.71065544    0.70892379    1.6994308e-05
+ 12    0.6190012     0.61733211    1.5009059e-05
+ 13    0.36540211    0.35970671    1.1687053e-05
+ 14    0.016821133   0.016162407   6.3406719e-07
+ 15    0.01          0.0099763705  2.356755e-07
+ 16    0.010000399   0.0099766379  2.3568489e-07
+ 17    0.0099989172  0.0099752306  2.3564998e-07
+ 18    0.010000684   0.0099769858  2.3569162e-07
+ 19    0.009999951   0.0099762996  2.3567434e-07
+ 20    0.0099997415  0.0099761208  2.356694e-07
+ 21    0.010000476   0.0099768182  2.3568672e-07
+ 22    0.0099993136  0.0099756288  2.3565932e-07
+ 23    0.010000237   0.0099765577  2.3568109e-07
+ 24    0.010000281   0.0099765314  2.3568212e-07
+-------------------------------------------------------
+..
+
+This output is expected as long as the spatial, boundary and region blocks are kept the same in the
+input file. The IMC method that Branson uses is stochastic, so changing the random number seed or the
+number of particles will produce a slightly different answer, but the difference should not be more
+than 3% if one million or more particles are used. This test is sensitive to precision changes in
+Branson because propagating the energy correctly involves many small summations as particles slowly
+lose their energy into the material.
+
+
+Crossroads
+------------
+Strong scaling performance of Crossroads 10M Particles is provided within the following table and
+figure.
+
+.. csv-table:: Branson Strong Scaling Performance on Crossroads 10M particles
+   :file: cpu_10M.csv
+   :align: center
+   :widths: 10, 10, 10, 10, 10
+   :header-rows: 1
+
+.. figure:: cpu_10M.png
+   :align: center
+   :scale: 50%
+   :alt: Branson Strong Scaling Performance on Crossroads 10M particles
+
+   Branson Strong Scaling Performance on Crossroads 10M particles
+
+Strong scaling performance of Branson Crossroads 66M Particles is provided within the following table and
+figure.
+
+.. csv-table:: Branson Strong Scaling Performance on Crossroads 66M particles
+   :file: cpu_66M.csv
+   :align: center
+   :widths: 10, 10, 10, 10
+   :header-rows: 1
+
+.. figure:: cpu_66M.png
+   :align: center
+   :scale: 50%
+   :alt: Branson Strong Scaling Performance on Crossroads 66M particles
+
+   Branson Strong Scaling Performance on Crossroads 66M particles
+
+Strong scaling performance of Branson Crossroads 200M Particles is provided within the following table and
+figure.
+
+.. csv-table:: Branson Strong Scaling Performance on Crossroads 200M particles
+   :file: cpu_200M.csv
+   :align: center
+   :widths: 10, 10, 10, 10, 10
+   :header-rows: 1
+
+.. figure:: cpu_200M.png
+   :align: center
+   :scale: 50%
+   :alt: Branson Strong Scaling Performance on Crossroads 200M particles
+
+   Branson Strong Scaling Performance on Crossroads 200M particles
+
+Sierra
+------------
+
+Throughput performance of Branson on Sierra is provided within the
+following table and figure.
+
+.. csv-table:: Branson Throughput Performance on Sierra
+   :file: gpu.csv
+   :align: center
+   :widths: 10, 10
+   :header-rows: 1
+
+.. figure:: gpu.png
+   :align: center
+   :scale: 50%
+   :alt: Branson Throughput Performance on Sierra
+
+   Branson Throughput Performance on Sierra
+
+
+References
+==========
+
+.. [Branson] Alex R. Long, 'Branson', 2023. [Online]. Available: https://github.com/lanl/branson. [Accessed: 22- Feb- 2023]
+
 Crossroads
 ------------
 Strong scaling performance of Crossroads 10M Particles is provided within the following table and
@@ -140,7 +303,7 @@ figure.
    :scale: 50%
    :alt: Branson Strong Scaling Performance on Crossroads 10M particles
 
-   Branson Strong Scaling Performance on Crossroads 10M particles
+   Branson Strong Scaling Performance on Crossroads 10M particles
 
 Strong scaling performance of Branson Crossroads 66M Particles is provided within the following table and
 figure.
@@ -156,7 +319,7 @@ figure.
    :scale: 50%
    :alt: Branson Strong Scaling Performance on Crossroads 66M particles
 
-   ranson Strong Scaling Performance on Crossroads 66M particles
+   Branson Strong Scaling Performance on Crossroads 66M particles
 
 Strong scaling performance of Branson Crossroads 200M Particles is provided within the following table and
 figure.
@@ -172,7 +335,7 @@ figure.
    :scale: 50%
    :alt: Branson Strong Scaling Performance on Crossroads 200M particles
 
-   Branson Strong Scaling Performance on Crossroads 200M particles
+   Branson Strong Scaling Performance on Crossroads 200M particles
 
 Sierra
 ------------