Building and Running MPICH (CH4)
This page describes how to build and run MPICH (CH4 device) to test the libfabric GNI provider. It is assumed that you are building MPICH on a Cray XC system such as jupiter or edison/cori, and that you have already built and installed a copy of libfabric.
MPICH/CH4 can be built to use the Cray PMI.
First, if you don't already have a clone of MPICH, clone it:
% git clone git@github.com:pmodels/mpich.git
Make sure that your clone has PR 2557.
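If you are unsure whether PR 2557 is in your clone, one way to check is sketched below. This is a minimal sketch assuming bash and a remote hosted on GitHub, which publishes each pull request as refs/pull/<N>/head; the function names fetch_pr and has_pr and the local branch name pr-<N> are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers; assumes the remote is hosted on GitHub, which
# publishes every pull request as refs/pull/<N>/head.

# Fetch the head commit of pull request <N> into a local branch pr-<N>.
fetch_pr() {
  local remote=$1 pr=$2
  git fetch "$remote" "pull/${pr}/head:pr-${pr}"
}

# Succeed if the commits on local branch pr-<N> are already in HEAD.
# Caveat: if the PR was merged by rebasing or squashing, its original
# commits are not ancestors of HEAD, so this can fail even though the
# change is present; check `git log` by hand in that case.
has_pr() {
  local pr=$1
  git merge-base --is-ancestor "pr-${pr}" HEAD
}
```

Usage, from inside the clone: % fetch_pr origin 2557 && has_pr 2557 && echo "PR 2557 present"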
Next, configure, build, and install MPICH. Note that you will need libtool 2.4.4 or higher to keep MPICH's configury happy.
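The libtool requirement can be checked before running autogen.sh. This is a minimal sketch assuming bash, GNU libtool (whose `libtool --version` prints the version as the last field of its first line), and GNU sort's -V version ordering; version_ge, libtool_version, and check_libtool are hypothetical names.

```shell
#!/bin/bash
# Succeed (exit 0) if dotted version $1 is >= version $2, using GNU
# sort's version ordering (sort -V) to find the smaller of the two.
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Print the installed libtool version (e.g. "2.4.6"), taken from the last
# field of the first line of `libtool --version` (GNU libtool format).
libtool_version() {
  libtool --version 2>/dev/null | head -n1 | awk '{print $NF}'
}

# Check that the libtool on PATH is new enough for MPICH's autogen.sh.
check_libtool() {
  local have
  have=$(libtool_version)
  if [ -z "$have" ]; then
    echo "libtool not found on PATH" >&2
    return 1
  fi
  if version_ge "$have" 2.4.4; then
    echo "libtool $have OK"
  else
    echo "libtool $have is older than 2.4.4" >&2
    return 1
  fi
}
```

Usage: % check_libtool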
If you intend to use Cray PMI, you may need to apply these two patches: patch0, patch1. If you are pulling in a fresh copy of MPICH after April 14, 2017, you do not need to apply these patches.
After applying the patches (if needed), the following steps can be used to configure, build, and install MPICH CH4:
% module load PrgEnv-gnu
% ./autogen.sh
% ./configure CFLAGS="-DMPIDI_CH3_HAS_NO_DYNAMIC_PROCESS" LDFLAGS="-Wl,-rpath -Wl,<path-to-ofi-libfabric-install>/lib" --with-pmi=cray --with-pm=none \
    --prefix=<path-to-mpich-install> --with-libfabric=<path-to-ofi-libfabric-install> --with-device=ch4:ofi
% make -j install
Note that if you want to run MPI multi-threaded tests which use MPI_THREAD_MULTIPLE, you will need to add --enable-threads=multiple to the configure line.
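If you rebuild often, the configure step can be wrapped in a small script so the install paths are set only once. This is a minimal sketch assuming bash; OFI_PREFIX, MPICH_PREFIX, and THREAD_MULTIPLE are hypothetical variable names standing in for <path-to-ofi-libfabric-install>, <path-to-mpich-install>, and the MPI_THREAD_MULTIPLE option (written here as MPICH's --enable-threads=multiple configure flag).

```shell
#!/bin/bash
# Hypothetical helpers wrapping the MPICH CH4/OFI configure line.
# OFI_PREFIX:      libfabric install prefix (required)
# MPICH_PREFIX:    MPICH install prefix (required)
# THREAD_MULTIPLE: set to 1 to add MPI_THREAD_MULTIPLE support (optional)

# Print the configure arguments, one per line.
mpich_ch4_configure_args() {
  : "${OFI_PREFIX:?set OFI_PREFIX to the libfabric install prefix}"
  : "${MPICH_PREFIX:?set MPICH_PREFIX to the MPICH install prefix}"
  printf '%s\n' \
    "CFLAGS=-DMPIDI_CH3_HAS_NO_DYNAMIC_PROCESS" \
    "LDFLAGS=-Wl,-rpath -Wl,${OFI_PREFIX}/lib" \
    "--with-pmi=cray" \
    "--with-pm=none" \
    "--prefix=${MPICH_PREFIX}" \
    "--with-libfabric=${OFI_PREFIX}" \
    "--with-device=ch4:ofi"
  if [ "${THREAD_MULTIPLE:-0}" = 1 ]; then
    echo "--enable-threads=multiple"
  fi
}

# Run ./configure with those arguments. Reading one argument per line into
# an array keeps the space inside the LDFLAGS value intact.
configure_mpich_ch4() {
  local out
  local -a args
  out=$(mpich_ch4_configure_args) || return 1
  mapfile -t args <<< "$out"
  ./configure "${args[@]}"
}
```

Usage, from the top of the MPICH source tree after ./autogen.sh: % export OFI_PREFIX=<path-to-ofi-libfabric-install> MPICH_PREFIX=<path-to-mpich-install>, then % configure_mpich_ch4 && make -j install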
There currently does not appear to be any way to build MPICH with CH4 support and SLURM (or at least I have not been able to figure out how to do it).
First, you will need to build an MPI app using MPICH's compiler wrapper:
% export PATH=<path-to-mpich-install>/bin:${PATH}
% mpicc -o my_app my_app.c
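A mistyped install path would leave you compiling with whatever other mpicc (if any) is already on your PATH, so it can be worth checking before building. This is a minimal sketch assuming bash; use_mpich is a hypothetical helper name, and it must be sourced into your shell so that the PATH change persists.

```shell
#!/bin/bash
# Hypothetical helper: put an MPICH install's bin directory first on PATH,
# but only after confirming that it actually contains an executable mpicc.
use_mpich() {
  local prefix=$1 found
  if [ ! -x "${prefix}/bin/mpicc" ]; then
    echo "no mpicc in ${prefix}/bin" >&2
    return 1
  fi
  export PATH="${prefix}/bin:${PATH}"
  # Report which mpicc the shell will now run.
  found=$(command -v mpicc)
  echo "using ${found}"
}
```

Usage: % use_mpich <path-to-mpich-install>, then % mpicc -o my_app my_app.c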
On Tiger and NERSC edison/cori, the application can be launched using srun:
% export MPIR_CVAR_OFI_USE_PROVIDER=gni
% srun -n 2 -N 2 ./my_app
If you'd like to double-check against the sockets provider, do the following:
% export MPIR_CVAR_OFI_USE_PROVIDER=sockets
% srun -n 2 -N 2 ./my_app
This forces the OFI CH4 netmod to use the sockets provider. Note that the CH4/OFI device appears to pick up the sockets provider by default, so set MPIR_CVAR_OFI_USE_PROVIDER=gni explicitly when testing the GNI provider.
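To run the same launch under both providers back to back, a small wrapper like the one below can be used. This is a minimal sketch assuming bash; run_with_provider and compare_providers are hypothetical names. The provider variable is set only for the launched command, so it does not leak into the rest of your session.

```shell
#!/bin/bash
# Run a command with the CH4/OFI netmod forced to one libfabric provider.
# MPIR_CVAR_OFI_USE_PROVIDER is set only in the environment of this one
# command, not in the calling shell.
run_with_provider() {
  local provider=$1
  shift
  echo "=== provider: ${provider} ==="
  MPIR_CVAR_OFI_USE_PROVIDER="${provider}" "$@"
}

# Run the same command once with gni and once with sockets; return
# nonzero if either run fails.
compare_providers() {
  local p status=0
  for p in gni sockets; do
    run_with_provider "$p" "$@" || status=1
  done
  return "$status"
}
```

Usage: % compare_providers srun -n 2 -N 2 ./my_app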
OSU provides a relatively simple set of MPI benchmark tests which are useful for testing the GNI libfabric provider.
% wget http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.0.tar.gz
% tar -zxvf osu-micro-benchmarks-5.0.tar.gz
% cd osu-micro-benchmarks-5.0
% ./configure CC=mpicc
% make
The mpi/pt2pt and mpi/collective subdirectories contain a number of tests. For example, to test MPICH send/recv message latency, use osu_latency:
% cd mpi/pt2pt
% srun -n 2 -N 2 ./osu_latency
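To sweep every test in one of these subdirectories instead of launching them one at a time, something like the following can be used. This is a sketch assuming bash; list_osu_tests and run_osu_tests are hypothetical names, and only executable files named osu_* are treated as tests (which skips the .c and .o files left by the build).

```shell
#!/bin/bash
# Hypothetical helpers for sweeping the OSU tests in one directory,
# e.g. mpi/pt2pt or mpi/collective.

# Print the executable files named osu_* in a directory, one per line.
list_osu_tests() {
  local dir=$1 f
  for f in "$dir"/osu_*; do
    if [ -f "$f" ] && [ -x "$f" ]; then
      echo "$f"
    fi
  done
}

# Run each test with the given launch command prepended (for example
# "srun -n 2 -N 2") and report how many exited with a nonzero status.
run_osu_tests() {
  local dir=$1 t failed=0
  local -a tests
  shift
  mapfile -t tests < <(list_osu_tests "$dir")
  for t in "${tests[@]}"; do
    echo "=== ${t##*/} ==="
    if ! "$@" "$t"; then
      echo "FAILED: ${t##*/}" >&2
      failed=$((failed + 1))
    fi
  done
  echo "${failed} test(s) failed"
  [ "$failed" -eq 0 ]
}
```

Usage, from mpi/pt2pt: % run_osu_tests . srun -n 2 -N 2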
MPICH CH4/OFI's MPI one-sided support doesn't work with the OSU one-sided tests. This is because OSU uses MPI_Win_allocate, which currently doesn't work for providers that only support FI_MR_BASIC. At this writing, none of the OSU one-sided tests pass with the GNI provider.
The osu_ibcast test fails when run with more than 4 processes. It fails with both the GNI and sockets providers, so it's most likely a bug higher up, in MPICH's non-blocking collectives implementation.
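To see where osu_ibcast starts failing on your own system, the process count can be stepped up until a run fails. This is a minimal sketch assuming bash and a launcher that takes the process count via -n, as srun does; find_first_failing_count is a hypothetical name and the range 2 to 8 is arbitrary.

```shell
#!/bin/bash
# Hypothetical helper: run a benchmark at increasing process counts and
# print the first count at which the launch fails. The launch command is
# given after the benchmark and is assumed to accept "-n <count>" (as
# srun does). Returns 1 if every count from 2 to 8 succeeds.
find_first_failing_count() {
  local bench=$1 n
  shift
  for n in 2 3 4 5 6 7 8; do
    if ! "$@" -n "$n" "$bench" > /dev/null 2>&1; then
      echo "$n"
      return 0
    fi
  done
  return 1
}
```

Usage, from mpi/collective: % find_first_failing_count ./osu_ibcast srun -N 2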