This directory contains an overall "getting started" guide for running Mochi hands-on exercises on Summit. Additional exercises (referenced in the tutorial presentations) are available in subdirectories.
Summit requires two-factor authentication. Hopefully you have a token from Oak Ridge. If you do not, and you are reading this on 6 February at the boot camp for the first time, you will have to find another machine to work on.
You'll log in with your PIN followed by the one-time passcode from your token.
Make a directory for this bootcamp:
mkdir ~/bootcamp
cd ~/bootcamp
git clone https://xgitlab.cels.anl.gov/sds/mochi-boot-camp.git
The easiest way to download and compile Mochi components is via the Spack package manager. Begin by cloning the spack repository from github, and then adding spack to your environment:
cd ~/bootcamp
git clone https://github.com/spack/spack.git
cd spack
. ~/bootcamp/spack/share/spack/setup-env.sh
At this point, the spack command line tool is available to you, but we need a few more steps to configure it for this environment.
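A quick way to confirm that the setup script took effect is to ask the shell whether it can resolve the command. The sketch below uses a hypothetical helper, have_tool, built on the POSIX command -v builtin; it is not part of Spack.

```shell
# have_tool is a hypothetical helper (not part of Spack): it succeeds if the
# current shell can resolve the named command, function, or alias.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool spack; then
    echo "spack is available"
else
    echo "spack not found; re-source share/spack/setup-env.sh"
fi
```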
Summit has a minimal base environment, and relies on modules
(http://lmod.readthedocs.org) to populate the environment.
We'll load a recent gcc compiler and teach spack about it:
[robl@login1]~% module load gcc/9.1.0
[robl@login1]~% spack compiler find
==> Added 1 new compiler to /home/robl/.spack/linux/compilers.yaml
gcc@9.1.0
==> Compilers are defined in the following files:
/ccs/home/robl/.spack/linux/compilers.yaml
Other compilers, such as Intel, IBM's XL, or PGI, might work for some or all of the Mochi components, but we know that gcc-9 supports the language features used by all of our components.
Many of the Mochi software components have not yet been upstreamed to the
Spack package manager. They are available as a separate software repository
that can be added to spack with the spack repo add command:
[robl@login1]~% cd ~/bootcamp
[robl@login1]~/bootcamp% git clone https://xgitlab.cels.anl.gov/sds/sds-repo.git
...
[robl@login1]~/bootcamp% spack repo add sds-repo
==> Added repo with namespace 'sds'.
[robl@login1]~/bootcamp%
At this point you can install any Mochi software component, but Spack will
likely download and compile more packages than are strictly necessary, which
is time consuming. To avoid that, we will next install a packages.yaml file
that customizes Spack by informing it of system packages that it should
reuse, and that specifies a subset of network transports to use for Mochi.
We have provided a pre-configured packages.yaml file for this purpose
that is already tailored to the Summit environment. It primarily does
the following:
- Tells Spack to use already available software for certain common system packages (by default, Spack will build these packages itself)
- Configures the 'libfabric' network fabric package, which the Mercury RPC package relies on, to use the InfiniBand 'verbs' interface
You can activate this configuration for your account by doing the following:
cp ~/bootcamp/mochi-boot-camp/ecp-am-2020/sessions/hands-on/packages.yaml ~/.spack/linux/
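Note that cp silently overwrites an existing file. If you already have a packages.yaml from earlier Spack use, you may want to keep a copy before running the command above. A minimal sketch, where backup_if_present is a hypothetical helper name and the .bak suffix is an arbitrary choice:

```shell
# backup_if_present is a hypothetical helper: copy FILE to FILE.bak when FILE
# exists, and do nothing otherwise.
backup_if_present() {
    if [ -f "$1" ]; then
        cp "$1" "$1.bak"
    fi
}

# Same destination path as the cp command in this guide.
backup_if_present "$HOME/.spack/linux/packages.yaml"
```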
At this point you are ready to install and run Mochi software!
The following are the most important commands to know:
- spack spec <package>: see what would be installed if you were to install the package (including dependencies and version numbers)
- spack install <package>: install a package
- spack load -r <package>: load the package into your environment
- spack find --loaded: observe which packages you have loaded
There are two critical commands that you will want to either run every time you log into a Summit node, or else add to your ~/.bashrc file so that they are performed automatically. We recommend the latter to save time:
cat ~/bootcamp/mochi-boot-camp/ecp-am-2020/sessions/hands-on/bashrc.mochi.summit >> ~/.bashrc
Now when you log into Summit moving forward you will have the correct compiler and Spack command line tools available in your environment.
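Running the cat command above more than once appends the same lines to ~/.bashrc each time. If you want the step to be repeatable, one option is to guard the append with a marker line. This is a sketch only: append_once and the marker text are hypothetical and not part of the boot camp materials.

```shell
# append_once is a hypothetical helper: append the contents of SNIPPET to
# RCFILE only if the marker line is not already present in RCFILE.
append_once() {
    snippet=$1
    rcfile=$2
    marker="# mochi-boot-camp settings"
    if grep -qF "$marker" "$rcfile" 2>/dev/null; then
        return 0    # already appended on an earlier run
    fi
    { echo "$marker"; cat "$snippet"; } >> "$rcfile"
}
```

To use it in place of the plain cat, call append_once with the path to bashrc.mochi.summit as the first argument and ~/.bashrc as the second.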
Run the following to download, compile, and install Margo:
spack install margo
This will take a few minutes and will install all of the necessary dependencies, including Mercury and Argobots. You can now load these packages by running:
# re-load spack env to make sure that it finds new packages
. ~/bootcamp/spack/share/spack/setup-env.sh
spack load -r margo
... and inspect to confirm that they are present in your environment with:
spack find --loaded
cd ~/bootcamp/mochi-boot-camp/ecp-am-2020/sessions/hands-on/sum
make
Running make compiles a simple client and server program, linked against margo. If you inspect the Makefile you will see pkg-config commands that are used to find the correct CFLAGS and LDFLAGS for the build.
This example is a very slightly modified copy of the "Sending arguments, returning values" example from the Mochi Readthedocs page; the only change is that it uses InfiniBand instead of TCP for communication.
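If the build cannot find margo, it can help to run the same kind of pkg-config query by hand. The sketch below assumes the pkg-config module is named "margo"; show_flags is a hypothetical helper, not something the Makefile defines.

```shell
# show_flags is a hypothetical helper: print the compile and link flags that
# pkg-config reports for a module, or explain that the module is not visible.
show_flags() {
    if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists "$1"; then
        echo "CFLAGS:  $(pkg-config --cflags "$1")"
        echo "LDFLAGS: $(pkg-config --libs "$1")"
    else
        echo "$1 is not visible to pkg-config"
        return 1
    fi
}

# "margo" is assumed to be the module name installed by the margo package.
show_flags margo || echo "try 'spack load -r margo' first"
```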
Summit uses the LSF job scheduler plus some job management utilities specific to
Summit. One requests an allocation of nodes with bsub and runs a program in
that allocation with jsrun.
To get an interactive allocation for 15 minutes:
$ bsub -Is -W 0:15 -nnodes 2 -P CSC332 $SHELL
Job <823109> is submitted to default queue <batch>.
<<Waiting for dispatch ...>>
<<Starting on batch1>>
[\u@\h \W]\$
You'll need to replace -P CSC332 with the name of whichever project you're using.
The job scheduler will drop you into an interactive login on one of the
"monitor" nodes. Don't run jobs here directly. Instead, we'll use jsrun.
Note that we asked for two nodes from the scheduler. We'll run the server on one node in the background, then run the client on the other.
First, start the server. The job will inherit your environment, so if you loaded the modules and built the client and server as described above, it should run as-is.
# no need to change directories: interactive bsub drops you in the directory
# from which you executed it
$ jsrun -n 1 -r 1 -g ALL_GPUS ./server &
[1] 129086
Server running at address ofi+verbs;ofi_rxm://10.41.0.103:49201
Next, run the client, passing it the address string given to you by the server.
[\u@\h \W]\$ jsrun -n 1 -r 1 -g ALL_GPUS ./client 'ofi+verbs;ofi_rxm://10.41.0.103:49201'
Computed 42 + 43 = 85
Computed 44 + 45 = 89
Computed 46 + 47 = 93
Computed 48 + 49 = 97
Got response: 42+43 = 85
Got response: 44+45 = 89
Got response: 46+47 = 93
Got response: 48+49 = 97
[1] + done jsrun -n 1 -r 1 -g ALL_GPUS ./server
[\u@\h \W]\$ exit
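Copying the address string by hand works for an interactive session, but it can also be pulled out of the server's output with sed. The sketch below assumes the server prints a line of the form shown in the transcript ("Server running at address ..."); extract_addr is a hypothetical helper name.

```shell
# extract_addr is a hypothetical helper: read server output on stdin and print
# only the address from a "Server running at address ..." line.
extract_addr() {
    sed -n 's/^Server running at address //p'
}

# Demonstration using the sample line from the transcript above.
printf 'Server running at address ofi+verbs;ofi_rxm://10.41.0.103:49201\n' | extract_addr
```

In a real session you could redirect the server's output to a file when launching it, then pass "$(extract_addr < that-file)" to the client. Output buffering may delay the line, so check that the file is non-empty first.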