add branson and amg2023 crossroads results #102

Merged
merged 5 commits into from
May 22, 2024
54 changes: 49 additions & 5 deletions doc/sphinx/02_amg/amg.rst
@@ -394,24 +394,68 @@ The FOMs of AMG2023 on V100 for Problem 2 is provided in the following table and
Multi-node scaling on Crossroads
================================

The results of the scaling runs performed on rocinante hbm partition are presented below.
The results of the scaling runs performed on Crossroads are presented below.
AMG and hypre were built with Intel oneAPI 2023.1.0 and cray-mpich 8.1.25.
These runs used 32, 64, and 96 nodes with 108 tasks per node.
These runs used 32 to 2048 nodes with 108 tasks per node.
Problems 1 and 2 were run with per-MPI-process problem sizes, ``-n``, of 38,38,38 and 60,60,60 respectively, to use roughly 15% of available memory while maintaining a cubic grid.
The product of the x,y,z process topology must equal the total number of MPI processes (nodes times tasks per node).
In this case, x=y=24 for all node counts and z was set to 6, 12, and 18 for 32, 64, and 96 nodes respectively.
Output files can be found in ``./doc/sphinx/02_amg/scaling/output/``.
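
The topology constraint can be checked with a few lines of Python. This is a minimal sketch, not part of the benchmark: the helper names ``check_topology`` and ``global_grid`` are hypothetical, and the table only contains the node counts whose topologies appear in the text above or in the ``srun`` lines of the output files.

```python
# Tasks per node used in the Crossroads runs described above.
TASKS_PER_NODE = 108

# nodes -> (Px, Py, Pz), taken from the text and from the srun lines in the
# output files. Node counts whose topology is not shown are left out.
TOPOLOGIES = {
    32: (24, 24, 6),
    64: (24, 24, 12),
    96: (24, 24, 18),
    128: (24, 24, 24),
    1024: (48, 48, 48),
}


def check_topology(nodes, topology, tasks_per_node=TASKS_PER_NODE):
    """Return True if Px * Py * Pz equals the total number of MPI processes."""
    px, py, pz = topology
    return px * py * pz == nodes * tasks_per_node


def global_grid(local_n, topology):
    """Global grid size when every process owns local_n points per axis."""
    return tuple(local_n * p for p in topology)


if __name__ == "__main__":
    for nodes, topo in TOPOLOGIES.items():
        print(nodes, topo, check_topology(nodes, topo))
    # Problem 1 on 1024 nodes and Problem 2 on 128 nodes.
    print(global_grid(38, TOPOLOGIES[1024]))
    print(global_grid(60, TOPOLOGIES[128]))
```

The global grid sizes computed this way can be compared against the ``(Nx, Ny, Nz)`` lines that AMG prints at the start of each run.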

.. figure:: cpu_scale_roci_cubes.png
.. figure:: ./scaling/p1weak.png
   :align: center
   :scale: 50%
   :alt:

.. figure:: ./scaling/p2weak.png
   :align: center
   :scale: 50%
   :alt:

.. csv-table:: Multi Node Scaling AMG problem 1 and 2
   :file: amg_scale_roci_cubes_pernode.csv
   :file: ./scaling/weak.csv
   :align: center
   :widths: 10, 10, 10, 10, 10
   :widths: 10, 10, 10, 10, 10, 10, 10
   :header-rows: 1

Timings were captured using Caliper and are presented below.
Caliper files can be found in ``./doc/sphinx/02_amg/scaling/plots/Caliper``

.. figure:: ./scaling/plots/prob1-totaltime-line.png
   :align: center
   :scale: 50%
   :alt: AMG P1 time spent (exclusive) in each function/region.


.. figure:: ./scaling/plots/prob1-totaltime-area.png
   :align: center
   :scale: 50%
   :alt: AMG P1 time spent (exclusive) in each function/region (Area plot).

.. figure:: ./scaling/plots/prob1-pct.png
   :align: center
   :scale: 50%
   :alt: Percentage of AMG P1 time spent (exclusive) in each function/region.


.. figure:: ./scaling/plots/prob2-totaltime-line.png
   :align: center
   :scale: 50%
   :alt: AMG P2 time spent (exclusive) in each function/region.


.. figure:: ./scaling/plots/prob2-totaltime-area.png
   :align: center
   :scale: 50%
   :alt: AMG P2 time spent (exclusive) in each function/region (Area plot).

.. figure:: ./scaling/plots/prob2-pct.png
   :align: center
   :scale: 50%
   :alt: Percentage of AMG P2 time spent (exclusive) in each function/region.



References
==========

4 changes: 0 additions & 4 deletions doc/sphinx/02_amg/amg_scale_roci.csv

This file was deleted.

4 changes: 0 additions & 4 deletions doc/sphinx/02_amg/amg_scale_roci_cubes.csv

This file was deleted.

4 changes: 0 additions & 4 deletions doc/sphinx/02_amg/amg_scale_roci_cubes_pernode.csv

This file was deleted.

4 changes: 0 additions & 4 deletions doc/sphinx/02_amg/amg_scale_roci_header.csv

This file was deleted.

11 changes: 0 additions & 11 deletions doc/sphinx/02_amg/cpu.gp
@@ -44,14 +44,3 @@ set output "roci_2_320.png"
set title "AMG2023 Strong Scaling for Problem 2, 320 x 320 x 320" font "serif,22"
plot "roci_2_320.csv" using 1:2 with linespoints linestyle 1, "" using 1:3 with line linestyle 2

# SCALING PLOTS, Y IS FOM PER NODE
unset logscale xy
set xrange [32:96]
set yrange [1e5:3e8]
set xlabel "Nodes"
set format y "%.1e"
set ylabel "FOM/node"
unset title
set output "cpu_scale_roci_cubes.png"
# set title "AMG Multi Node Scaling" font "serif,22"
plot "amg_scale_roci_cubes_pernode.csv" using 1:4 with linespoints linestyle 1 title "Problem 1", "" using 1:5 with line linestyle 2 title "Problem 2"

108 changes: 108 additions & 0 deletions doc/sphinx/02_amg/scaling/output/amg-1.1.1024.out
@@ -0,0 +1,108 @@
--------------------- ---------------------
srun -N 1024 --ntasks-per-node=108 --hint=nomultithread --distribution=block:block /usr/projects/hpctools/agood/projects/ats5-testing/SSI-scripts-main/bin/amg -problem 1 -n 38 38 38 -P 48 48 48
Running with these driver parameters:
Problem ID = 1

=============================================
Hypre init times:
=============================================
Hypre init:
wall clock time = 0.000041 seconds
Laplacian_27pt:
(Nx, Ny, Nz) = (1824, 1824, 1824)
(Px, Py, Pz) = (48, 48, 48)

=============================================
Generate Matrix:
=============================================
Spatial Operator:
wall clock time = 0.082292 seconds
RHS vector has unit components
Initial guess is 0
=============================================
IJ Vector Setup:
=============================================
RHS and Initial Guess:
wall clock time = 0.002909 seconds
=============================================
Problem 1: AMG Setup Time:
=============================================
GMRES Setup:
wall clock time = 6.279385 seconds

FOM_Setup: nnz_AP / Setup Phase Time: 6.585183e+10

=============================================
Problem 1: AMG-GMRES Solve Time:
=============================================
GMRES Solve:
wall clock time = 4.745409 seconds

Iterations = 12
Final Relative Residual Norm = 4.880826e-13


FOM_Solve: nnz_AP / Solve Phase Time: 8.713874e+10



Figure of Merit (FOM): nnz_AP / (Setup Phase Time + Solve Phase Time) 3.750719e+10

Mem Used: 25766114204 Total Ram: 134277042176 Fraction Ram: 19.190000%
TOTAL RSS MAX: 24572 (GiB) - 19.190000%
MIN RSS MAX: 24719552 23 (GiB) - 18.850000% -- On NODE: 810 - nid002430
MAX RSS MAX: 25352224 24 (GiB) - 19.330000% -- On NODE: 194 - nid001451
______------ ------_____
srun -N 1024 --ntasks-per-node=108 --hint=nomultithread --distribution=block:block /usr/projects/hpctools/agood/projects/ats5-testing/SSI-scripts-main/bin/amg -problem 2 -n 60 60 60 -P 48 48 48
Running with these driver parameters:
Problem ID = 2

=============================================
Hypre init times:
=============================================
Hypre init:
wall clock time = 0.000070 seconds
Laplacian_7pt:
(Nx, Ny, Nz) = (2880, 2880, 2880)
(Px, Py, Pz) = (48, 48, 48)

=============================================
Generate Matrix:
=============================================
Spatial Operator:
wall clock time = 0.071304 seconds
RHS vector has unit components
Initial guess is 0
=============================================
IJ Vector Setup:
=============================================
RHS and Initial Guess:
wall clock time = 0.005967 seconds
=============================================
Problem 2: AMG Setup Time:
=============================================
PCG Setup:
wall clock time = 4.799400 seconds

FOM_Setup: nnz_AP / Setup Phase Time: 5.951326e+10

=============================================
Problem 2: AMG-PCG Solve Time:
=============================================
PCG Solve:
wall clock time = 5.476434 seconds

Iterations = 34
Final Relative Residual Norm = 6.929663e-09


FOM_Solve: nnz_AP * iterations / Solve Phase Time: 5.215582e+10



Figure of Merit (FOM): nnz_AP / (Setup Phase Time + 3 * Solve Phase Time) 1.345480e+10

Mem Used: 26891301208 Total Ram: 134277042176 Fraction Ram: 20.030000%
TOTAL RSS MAX: 25645 (GiB) - 20.030000%
MIN RSS MAX: 26086960 24 (GiB) - 19.890000% -- On NODE: 889 - nid002614
MAX RSS MAX: 26310204 25 (GiB) - 20.060000% -- On NODE: 1017 - nid003015
108 changes: 108 additions & 0 deletions doc/sphinx/02_amg/scaling/output/amg-1.1.128.out
@@ -0,0 +1,108 @@
--------------------- ---------------------
srun -N 128 --ntasks-per-node=108 --hint=nomultithread --distribution=block:block /usr/projects/hpctools/agood/projects/ats5-testing/SSI-scripts-main/bin/amg -problem 1 -n 38 38 38 -P 24 24 24
Running with these driver parameters:
Problem ID = 1

=============================================
Hypre init times:
=============================================
Hypre init:
wall clock time = 0.000049 seconds
Laplacian_27pt:
(Nx, Ny, Nz) = (912, 912, 912)
(Px, Py, Pz) = (24, 24, 24)

=============================================
Generate Matrix:
=============================================
Spatial Operator:
wall clock time = 0.053407 seconds
RHS vector has unit components
Initial guess is 0
=============================================
IJ Vector Setup:
=============================================
RHS and Initial Guess:
wall clock time = 0.002185 seconds
=============================================
Problem 1: AMG Setup Time:
=============================================
GMRES Setup:
wall clock time = 3.345961 seconds

FOM_Setup: nnz_AP / Setup Phase Time: 1.542544e+10

=============================================
Problem 1: AMG-GMRES Solve Time:
=============================================
GMRES Solve:
wall clock time = 2.517649 seconds

Iterations = 12
Final Relative Residual Norm = 5.351352e-13


FOM_Solve: nnz_AP / Solve Phase Time: 2.050045e+10



Figure of Merit (FOM): nnz_AP / (Setup Phase Time + Solve Phase Time) 8.802245e+09

Mem Used: 3142859408 Total Ram: 16784630272 Fraction Ram: 18.720000%
TOTAL RSS MAX: 2997 (GiB) - 18.720000%
MIN RSS MAX: 24187588 23 (GiB) - 18.450000% -- On NODE: 122 - nid001379
MAX RSS MAX: 24730588 23 (GiB) - 18.860000% -- On NODE: 126 - nid001383
______------ ------_____
srun -N 128 --ntasks-per-node=108 --hint=nomultithread --distribution=block:block /usr/projects/hpctools/agood/projects/ats5-testing/SSI-scripts-main/bin/amg -problem 2 -n 60 60 60 -P 24 24 24
Running with these driver parameters:
Problem ID = 2

=============================================
Hypre init times:
=============================================
Hypre init:
wall clock time = 0.000097 seconds
Laplacian_7pt:
(Nx, Ny, Nz) = (1440, 1440, 1440)
(Px, Py, Pz) = (24, 24, 24)

=============================================
Generate Matrix:
=============================================
Spatial Operator:
wall clock time = 0.069923 seconds
RHS vector has unit components
Initial guess is 0
=============================================
IJ Vector Setup:
=============================================
RHS and Initial Guess:
wall clock time = 0.006537 seconds
=============================================
Problem 2: AMG Setup Time:
=============================================
PCG Setup:
wall clock time = 2.278168 seconds

FOM_Setup: nnz_AP / Setup Phase Time: 1.566778e+10

=============================================
Problem 2: AMG-PCG Solve Time:
=============================================
PCG Solve:
wall clock time = 2.948403 seconds

Iterations = 30
Final Relative Residual Norm = 5.836549e-09


FOM_Solve: nnz_AP * iterations / Solve Phase Time: 1.210616e+10



Figure of Merit (FOM): nnz_AP / (Setup Phase Time + 3 * Solve Phase Time) 3.208902e+09

Mem Used: 3123950132 Total Ram: 16784630272 Fraction Ram: 18.610000%
TOTAL RSS MAX: 2979 (GiB) - 18.610000%
MIN RSS MAX: 24371076 23 (GiB) - 18.590000% -- On NODE: 84 - nid001341
MAX RSS MAX: 24443928 23 (GiB) - 18.640000% -- On NODE: 94 - nid001351
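
The Figure of Merit lines in the output files above can be cross-checked against the reported timings. The sketch below is an illustration only: the function names are hypothetical, ``nnz_AP`` is not printed directly and is recovered here from ``FOM_Setup`` times the setup time, and the formulas are the ones shown in the log text (setup + solve for Problem 1, setup + 3 * solve for Problem 2). Because the logged values are rounded, the recomputed numbers agree only to a few significant digits.

```python
import math

# Values copied from the output files above:
# (nodes, problem) -> (FOM_Setup, setup time [s], solve time [s], reported FOM)
RUNS = {
    (1024, 1): (6.585183e10, 6.279385, 4.745409, 3.750719e10),
    (1024, 2): (5.951326e10, 4.799400, 5.476434, 1.345480e10),
    (128, 1): (1.542544e10, 3.345961, 2.517649, 8.802245e09),
    (128, 2): (1.566778e10, 2.278168, 2.948403, 3.208902e09),
}


def nnz_from_setup(fom_setup, setup_time):
    """Recover nnz_AP from FOM_Setup = nnz_AP / setup time."""
    return fom_setup * setup_time


def figure_of_merit(problem, nnz_ap, setup_time, solve_time):
    """FOM as printed in the logs; Problem 2 weights the solve time by 3."""
    solve_weight = 3.0 if problem == 2 else 1.0
    return nnz_ap / (setup_time + solve_weight * solve_time)


def recompute(nodes, problem):
    """Return (recomputed FOM, reported FOM, recomputed FOM per node)."""
    fom_setup, setup, solve, reported = RUNS[(nodes, problem)]
    nnz_ap = nnz_from_setup(fom_setup, setup)
    fom = figure_of_merit(problem, nnz_ap, setup, solve)
    return fom, reported, fom / nodes


if __name__ == "__main__":
    for (nodes, problem) in sorted(RUNS):
        fom, reported, per_node = recompute(nodes, problem)
        ok = math.isclose(fom, reported, rel_tol=1e-4)
        print(f"N={nodes} P{problem}: FOM={fom:.6e} "
              f"reported={reported:.6e} match={ok} FOM/node={per_node:.6e}")
```

Dividing the FOM by the node count gives a per-node value that can be compared across node counts to judge weak-scaling behavior.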