Merge branch 'main' of github.com:lanl/benchmarks
gshipman committed Dec 5, 2023
2 parents 7ba4593 + 645401b commit 0212c91
Showing 15 changed files with 158 additions and 79 deletions.
2 changes: 1 addition & 1 deletion doc/sphinx/02_amg/mem.gp
@@ -9,7 +9,7 @@ set ylabel "FOM"
set xrange [10:40]
set key left top

-set yrange [1.0e+6: 1.0e+7]
+set yrange [1.0e+6: 2.0e+7]
set grid
show grid

14 changes: 7 additions & 7 deletions doc/sphinx/02_amg/roci_1_120.csv
@@ -1,8 +1,8 @@
No. cores,Actual,Ideal
-1,8.6459E+06,8.6459E+06
-2,1.4987E+07,1.7292E+07
-4,2.9222E+07,3.4583E+07
-8,5.5766E+07,6.9167E+07
-16,9.5407E+07,1.3833E+08
-32,1.4029E+08,2.7667E+08
-64,2.2887E+08,5.5333E+08
+1,1.2053E+07,1.2053E+07
+2,2.3748E+07,2.4106E+07
+4,4.4537E+07,4.8212E+07
+8,8.1841E+07,9.6424E+07
+16,1.5018E+08,1.9285E+08
+32,2.6169E+08,3.8570E+08
+64,4.0990E+08,7.7139E+08
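
The relationship between the `Actual` and `Ideal` columns in these scaling files can be checked with a short sketch. It assumes that `Ideal` is the single-core FOM multiplied by the core count, which matches the rows shown, and it uses the second block of seven rows from `roci_1_120.csv` (taken here to be the values this commit introduces, going by the hunk header). The names `ROWS` and `scaling_table` are illustrative and do not appear in the repository.

```python
# Rows from roci_1_120.csv (assumed post-commit values): (cores, actual FOM).
ROWS = [
    (1, 1.2053e07),
    (2, 2.3748e07),
    (4, 4.4537e07),
    (8, 8.1841e07),
    (16, 1.5018e08),
    (32, 2.6169e08),
    (64, 4.0990e08),
]


def scaling_table(rows):
    """Return (cores, actual, ideal, efficiency) tuples.

    Assumption: ideal FOM is the per-core FOM of the first row times the
    core count, and efficiency is actual / ideal.
    """
    base_cores, base_fom = rows[0]
    per_core = base_fom / base_cores
    table = []
    for cores, actual in rows:
        ideal = per_core * cores
        table.append((cores, actual, ideal, actual / ideal))
    return table


if __name__ == "__main__":
    for cores, actual, ideal, eff in scaling_table(ROWS):
        print(f"{cores:3d}  {actual:.4E}  {ideal:.4E}  {eff:6.1%}")
```

Under that assumption the computed ideal values agree with the file to the printed precision, and the parallel efficiency at 64 cores comes out at roughly 53%.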
14 changes: 7 additions & 7 deletions doc/sphinx/02_amg/roci_1_160.csv
@@ -1,8 +1,8 @@
No. cores,Actual,Ideal
-1,8.4644E+06,8.4644E+06
-2,1.2983E+07,1.6929E+07
-4,2.7064E+07,3.3857E+07
-8,5.0436E+07,6.7715E+07
-16,1.0227E+08,1.3543E+08
-32,1.3856E+08,2.7086E+08
-64,2.3692E+08,5.4172E+08
+1,1.1871E+07,1.1871E+07
+2,2.2864E+07,2.3743E+07
+4,4.3908E+07,4.7486E+07
+8,8.2382E+07,9.4972E+07
+16,1.5050E+08,1.8994E+08
+32,2.7485E+08,3.7989E+08
+64,4.7192E+08,7.5977E+08
14 changes: 7 additions & 7 deletions doc/sphinx/02_amg/roci_1_200.csv
@@ -1,8 +1,8 @@
No. cores,Actual,Ideal
-1,8.4267E+06,8.4267E+06
-2,1.2526E+07,1.6853E+07
-4,2.4576E+07,3.3707E+07
-8,5.0598E+07,6.7413E+07
-16,9.3217E+07,1.3483E+08
-32,1.2682E+08,2.6965E+08
-64,2.3377E+08,5.3931E+08
+1,1.1677E+07,1.1677E+07
+2,2.2162E+07,2.3354E+07
+4,4.2170E+07,4.6708E+07
+8,8.3668E+07,9.3415E+07
+16,1.4897E+08,1.8683E+08
+32,2.7608E+08,3.7366E+08
+64,5.0217E+08,7.4732E+08
14 changes: 7 additions & 7 deletions doc/sphinx/02_amg/roci_2_200.csv
@@ -1,8 +1,8 @@
No. cores,Actual,Ideal
-1,1.7980E+06,1.7980E+06
-2,3.2662E+06,3.5961E+06
-4,6.2277E+06,7.1922E+06
-8,1.2149E+07,1.4384E+07
-16,2.2796E+07,2.8769E+07
-32,2.8885E+07,5.7538E+07
-64,4.6850E+07,1.1508E+08
+1,2.7956E+06,2.7956E+06
+2,5.0762E+06,5.5913E+06
+4,9.3707E+06,1.1183E+07
+8,1.7585E+07,2.2365E+07
+16,3.2401E+07,4.4730E+07
+32,5.8306E+07,8.9460E+07
+64,1.0401E+08,1.7892E+08
14 changes: 7 additions & 7 deletions doc/sphinx/02_amg/roci_2_256.csv
@@ -1,8 +1,8 @@
No. cores,Actual,Ideal
-1,1.7267E+06,1.7267E+06
-2,3.0559E+06,3.4535E+06
-4,5.8681E+06,6.9069E+06
-8,1.1919E+07,1.3814E+07
-16,2.0471E+07,2.7628E+07
-32,2.7253E+07,5.5255E+07
-64,4.5270E+07,1.1051E+08
+1,2.5400E+06,2.5400E+06
+2,4.6956E+06,5.0801E+06
+4,8.5277E+06,1.0160E+07
+8,1.6098E+07,2.0320E+07
+16,2.9559E+07,4.0640E+07
+32,5.5629E+07,8.1281E+07
+64,1.0656E+08,1.6256E+08
14 changes: 7 additions & 7 deletions doc/sphinx/02_amg/roci_2_320.csv
@@ -1,8 +1,8 @@
No. cores,Actual,Ideal
-1,1.6485E+06,1.6485E+06
-2,2.8577E+06,3.2970E+06
-4,5.3917E+06,6.5940E+06
-8,1.1154E+07,1.3188E+07
-16,2.1099E+07,2.6376E+07
-32,2.6207E+07,5.2752E+07
-64,4.2568E+07,1.0550E+08
+1,2.3462E+06,2.3462E+06
+2,4.2833E+06,4.6925E+06
+4,7.8280E+06,9.3849E+06
+8,1.5619E+07,1.8770E+07
+16,2.8615E+07,3.7540E+07
+32,5.2880E+07,7.5079E+07
+64,9.5875E+07,1.5016E+08
8 changes: 4 additions & 4 deletions doc/sphinx/02_amg/roci_mem.csv
@@ -1,5 +1,5 @@
GB,Problem 1,Problem 2
-10,8.6128E+06,1.8494E+06
-20,8.4654E+06,1.7930E+06
-30,8.4068E+06,1.7416E+06
-40,8.3174E+06,1.7258E+06
+10,1.2197E+07,2.8391E+06
+20,1.2055E+07,2.7853E+06
+30,1.1929E+07,2.6162E+06
+40,1.1904E+07,2.5890E+06
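
To see how much the FOM values in `roci_mem.csv` moved, the following sketch divides the new values by the old ones. It assumes the first four data rows are the pre-commit values and the last four are the post-commit values, as the `@@ -1,5 +1,5 @@` hunk header suggests. The names `BEFORE`, `AFTER`, and `fom_ratios` are hypothetical helpers, not part of the repository.

```python
# (GB, Problem 1 FOM, Problem 2 FOM); assumed pre-commit rows.
BEFORE = [
    (10, 8.6128e06, 1.8494e06),
    (20, 8.4654e06, 1.7930e06),
    (30, 8.4068e06, 1.7416e06),
    (40, 8.3174e06, 1.7258e06),
]
# Assumed post-commit rows.
AFTER = [
    (10, 1.2197e07, 2.8391e06),
    (20, 1.2055e07, 2.7853e06),
    (30, 1.1929e07, 2.6162e06),
    (40, 1.1904e07, 2.5890e06),
]


def fom_ratios(before, after):
    """Return (GB, problem-1 ratio, problem-2 ratio), each ratio new / old."""
    ratios = []
    for (gb_old, p1_old, p2_old), (gb_new, p1_new, p2_new) in zip(before, after):
        if gb_old != gb_new:
            raise ValueError(f"memory sizes differ: {gb_old} vs {gb_new}")
        ratios.append((gb_old, p1_new / p1_old, p2_new / p2_old))
    return ratios


if __name__ == "__main__":
    for gb, r1, r2 in fom_ratios(BEFORE, AFTER):
        print(f"{gb:2d} GB  problem 1: {r1:.2f}x  problem 2: {r2:.2f}x")
```

With these inputs every ratio is above 1.4, and the Problem 1 values now exceed 1.0e+7, which is consistent with the `mem.gp` change that raises the upper `yrange` bound to 2.0e+7.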
20 changes: 11 additions & 9 deletions doc/sphinx/09_Microbenchmarks/M5_DGEMM/DGEMM.rst
@@ -40,32 +40,34 @@ Run Rules
Building
========

Makefiles are provided for the intel and gcc compilers. Before building, load the compiler and blas libraries into the PATH and LD_LIBRARY_PATH.
Load the compiler; make and enter a build directory.

.. code-block:: bash
cd src
patch -p1 < ../dgemm_omp_fixes.patch
make CFLAGS=-I<openblas_include_dir>
cmake -DBLAS_NAME=<blas library name> ..
make
..
If using a different compiler, copy and modify the simple makefiles to apply the appropriate flags.

If using a different blas library than mkl or openblas, modify the C source file to use the correct header and dgemm command.
Current `BLAS_NAME` options are mkl, cblas (openblas), essl, or the raw coded (OpenMP threaded) dgemm.
The `BLAS_NAME` argument is required.
If the headers or libraries aren't found, provide `BLAS_LIB_DIR`, `BLAS_INCLUDE_DIR`, or `BLAS_ROOT` to cmake.
If using a different blas library, modify the C source file to use the correct header and dgemm command.

Running
=======

DGEMM uses OpenMP but does not use MPI.

Set the number of OpenMP threads before running.
Set the number of OpenMP threads and other OMP characteristics with export.
The following were used for the Crossroads (:ref:`GlobalSystemATS3`) system.

.. code-block:: bash
export OPENBLAS_NUM_THREADS=<nthreads>
export OPENBLAS_NUM_THREADS=<nthreads> #MKL INHERITS FROM OMP_NUM_THREADS.
export OMP_NUM_THREADS=<nthreads>
export OMP_PLACES=cores
export OMP_PROC_BIND=close
..
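
For scripted runs, the exports above can be collected into an environment mapping before the benchmark is launched. The sketch below is one possible way to do that, not part of the repository: the helper name `dgemm_env` is hypothetical, and the variable names and values are copied from the export lines in this hunk.

```python
import os


def dgemm_env(nthreads, base=None):
    """Return a copy of the environment with the OpenMP settings applied.

    `base` defaults to the current process environment; pass a dict to
    start from something else (useful for testing).
    """
    if nthreads < 1:
        raise ValueError("nthreads must be a positive integer")
    env = dict(os.environ if base is None else base)
    env.update(
        {
            # Per the comment in the hunk, MKL inherits from OMP_NUM_THREADS.
            "OPENBLAS_NUM_THREADS": str(nthreads),
            "OMP_NUM_THREADS": str(nthreads),
            "OMP_PLACES": "cores",
            "OMP_PROC_BIND": "close",
        }
    )
    return env


if __name__ == "__main__":
    for key in ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS", "OMP_PLACES", "OMP_PROC_BIND"):
        print(f"{key}={dgemm_env(8)[key]}")
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when starting the `dgemm` executable. The path to the binary and its command-line arguments depend on the build directory and are not shown here.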
86 changes: 86 additions & 0 deletions microbenchmarks/dgemm/src/CMakeLists.txt
@@ -0,0 +1,86 @@
################################################################################
# NOTES ON COMPILATION
################################################################################

# Load the modules for each library and the environment variables will be set
# correctly (load cmake, compiler, blas library)

# Enable debug mode by passing -DCMAKE_BUILD_TYPE=Debug to CMake, default is
# Release

cmake_minimum_required(VERSION 3.11)

project(DGEMM
VERSION 1.0
DESCRIPTION "DGEMM tests the performance of BLAS libraries"
LANGUAGES C)

site_name( SITENAME )

if ( NOT DEFINED BLAS_NAME )
message( SEND_ERROR "BLAS NAME MUST BE SPECIFIED: cblas, mkl, essl or raw")
endif()

string( TOUPPER ${BLAS_NAME} BLAS_NAME_UPPER )

if ( NOT DEFINED CMAKE_BUILD_TYPE )
set(CMAKE_BUILD_TYPE "Release" )
endif()

set( CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DUSE_${BLAS_NAME_UPPER}")

set( CMAKE_C_FLAGS_DEBUG "-Wall -O0 -g" )
set( CMAKE_VERBOSE_MAKEFILE "TRUE" )
if (CMAKE_C_COMPILER_ID STREQUAL "GNU" )
set( CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fopenmp")
set( CMAKE_C_FLAGS_RELEASE "-ffast-math -mavx2 -ftree-vectorizer-verbose=3 -O3 -funroll-loops -fno-var-tracking-assignments")
elseif( CMAKE_C_COMPILER_ID MATCHES "Intel" )
set( CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -openmp")
set( CMAKE_C_FLAGS_RELEASE "-O3 -fp-speculation=fast -fp-model=precise -qno-opt-dynamic-align")
elseif( CMAKE_C_COMPILER_ID STREQUAL "Cray" )
set( CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fopenmp")
set( CMAKE_C_FLAGS_RELEASE "-O3")
endif()

if (CMAKE_BUILD_TYPE STREQUAL "Release" )
set( CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${CMAKE_C_FLAGS_RELEASE}")
elseif (CMAKE_BUILD_TYPE STREQUAL "Debug" )
set( CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${CMAKE_C_FLAGS_DEBUG}")
endif()

# Summary of user-selectable build options
message( "\nBuild Summary:\n")
message( STATUS "Machine name : ${SITENAME}")
message( STATUS "CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}")
message( STATUS "Compiler : ${CMAKE_C_COMPILER_ID} ${CMAKE_C_COMPILER}")
message( STATUS "BLAS : ${BLAS_NAME}")
message( STATUS "-----------------------------------")
message( STATUS "Compiler Flags (All) : ${CMAKE_C_FLAGS}")
message( STATUS "Compiler Flags (Debug) : ${CMAKE_C_FLAGS_DEBUG}")
message( STATUS "Compiler Flags (Release): ${CMAKE_C_FLAGS_RELEASE}")
message("\n")

add_executable(dgemm mt-dgemm.c)

if ( DEFINED BLAS_ROOT )
target_link_directories( dgemm PRIVATE "${BLAS_ROOT}/lib")
target_include_directories( dgemm PRIVATE "${BLAS_ROOT}/include")
endif()

if ( DEFINED BLAS_LIB_DIR )
target_link_directories( dgemm PRIVATE ${BLAS_LIB_DIR} )
endif()

if ( DEFINED BLAS_INCLUDE_DIR )
target_include_directories( dgemm PRIVATE ${BLAS_INCLUDE_DIR} )
endif()

if ( ${BLAS_NAME} STREQUAL "mkl" )
set( CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -qmkl=parallel")
elseif ( ${BLAS_NAME} STREQUAL "cblas" )
target_link_libraries( dgemm LINK_PUBLIC "openblas")
elseif ( ${BLAS_NAME} STREQUAL "essl" )
target_link_libraries( dgemm LINK_PUBLIC "essl")
endif()


10 changes: 0 additions & 10 deletions microbenchmarks/dgemm/src/Makefile

This file was deleted.

11 changes: 0 additions & 11 deletions microbenchmarks/dgemm/src/Makefile.intel

This file was deleted.

12 changes: 12 additions & 0 deletions microbenchmarks/dgemm/src/mt-dgemm.c
@@ -3,16 +3,21 @@
#include <stdlib.h>
#include <sys/time.h>

#define BLAS_LIB "nolib"

#ifdef USE_MKL
#include "mkl.h"
#define BLAS_LIB "mkl"
#endif

#ifdef USE_CBLAS
#include "cblas.h"
#define BLAS_LIB "cblas"
#endif

#ifdef USE_ESSL
#include "essl.h"
#define BLAS_LIB "essl"
#endif

#define DGEMM_RESTRICT __restrict__
@@ -107,6 +112,8 @@ int main(int argc, char* argv[]) {
}

printf("Performing multiplication...\n");
printf("Using Blas Type: %s\n", BLAS_LIB);
printf("Iteration #:\n");

const double start = get_seconds();

@@ -142,7 +149,12 @@ int main(int argc, char* argv[]) {
}
}
#endif
if ( r%10 == 0 ) {
printf("%d, ", r);
fflush(stdout);
}
}
printf("\n");

// ------------------------------------------------------- //
// VENDOR NOTIFICATION: END MODIFIABLE REGION
2 changes: 1 addition & 1 deletion microbenchmarks/spatter
Submodule spatter updated 2 files
+0 −1 .gitmodules
+1 −1 spatter
2 changes: 1 addition & 1 deletion umt
Submodule umt updated 47 files
+7 −17 .gitignore
+5 −3 DEPENDENCIES.md
+52 −2 README.md
+0 −4 benchmarks/README
+0 −41 benchmarks/generate_strong_scaling_runs_spp1.py
+0 −41 benchmarks/generate_weak_scaling_runs_spp1.py
+9 −33 build_and_run_umt.sh
+ img/tile.jpg
+59 −32 src/CMakeLists.txt
+6 −6 src/cmake/InitBuildTypeCompilerFlags.cmake
+9 −6 src/teton/CMakeLists.txt
+136 −0 src/teton/aux/AppendSourceToPsi.F90
+1 −0 src/teton/aux/CMakeLists.txt
+0 −1 src/teton/aux/DestructMeshData.F90
+5 −3 src/teton/aux/checkInputSanity.F90
+124 −164 src/teton/control/finalizeSets.F90
+25 −76 src/teton/control/initializeSets.F90
+1,531 −1,004 src/teton/driver/test_driver.cc
+176 −13 src/teton/gpu/CornerSweepUCBxyz_OMPOL.F90
+42 −29 src/teton/gpu/OMPWrappers.F90.templates
+10 −5 src/teton/gpu/SetSweep_OMPOL.F90
+1 −1 src/teton/gpu/SweepGreyUCBxyz_OMPOL.F90
+7 −31 src/teton/gpu/finalizeGPUMemory.F90
+6 −50 src/teton/gpu/initializeGPUMemory.F90
+17 −3 src/teton/include/TetonBlueprint.hh
+36 −9 src/teton/include/TetonConduitInterface.hh
+22 −0 src/teton/include/TetonInterface.hh
+153 −0 src/teton/include/TetonSources.hh
+15 −0 src/teton/include/TetonSurfaceTallies.hh
+11 −0 src/teton/include/macros.h
+11 −1 src/teton/include/omp_wrappers.h
+2 −0 src/teton/interface/CMakeLists.txt
+313 −35 src/teton/interface/TetonBlueprint.cc
+406 −154 src/teton/interface/TetonConduitInterface.cc
+207 −0 src/teton/interface/TetonSources.cc
+300 −0 src/teton/interface/TetonSurfaceTallies.cc
+7 −7 src/teton/misc/mpif90_mod.F90
+6 −2 src/teton/mods/AngleSet_mod.F90
+5 −33 src/teton/mods/MemoryAllocator_mod.F90
+16 −37 src/teton/mods/MemoryAllocator_mod.F90.templates
+45 −24 src/teton/mods/Options_mod.F90
+2 −2 src/teton/mods/Size_mod.F90
+47 −30 src/teton/mods/system_info_mod.F90
+0 −10 src/teton/rt/RecvFlux.F90
+0 −13 src/teton/rt/SendFlux.F90
+7 −44 src/teton/rt/initcomm.F90
+10 −1 src/teton/rt/rtmainsn.F90
