rsmpi
Jupyter servers on our JupyterHub cluster only have a few cores. To
run jobs with MPI or OpenMP, you can request one or more compute
nodes, which are assigned manually by devops. Use the Slack channel
#rsmpi to make allocation requests.
Once allocated, you can access the nodes by running rsmpi, which is a
wrapper for mpiexec. To learn the options, just run rsmpi without
arguments:
jupyter$ rsmpi
error: missing command argument
usage: rsmpi [-n processes] [-h hosts] [-t tasks-per-host] <mpi-command args...>
Starts mpiexec <mpi-command args...> with specified processes and hosts.
Options:
hosts: indices of hosts to use; all hosts is 2 [default]
processes: integer between 1 and 40 [default]
tasks-per-host: integer between 1 and 20 [default]
In this example, the compute nodes (hosts) are numbered 1 and 2. Each compute node may have up to 20 cores (tasks-per-host), which allows up to 40 MPI processes (processes).
Just like mpiexec, you can run with -n to specify a number of MPI
processes, e.g.
jupyter$ rsmpi -n 2 echo hello
hello
hello
Processes are always allocated on the first host, then the second
host, etc., unless you specify the host number on the rsmpi command
line, e.g.
jupyter$ rsmpi -h 2 -n 2 echo hello
hello
hello
Here, two processes were executed on your host 2.
The -t (tasks-per-host) option lets you reduce the number of processes per host below the default (20 above), for example:
jupyter$ rsmpi -t 2 hostname
rs1.local
rs1.local
rs2.local
rs2.local
hostname ran twice on each host, because we set tasks-per-host to 2.
If we hadn't set it, it would have run 40 times (20 times per host).
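The counts above follow from simple arithmetic: when -n is omitted, the examples suggest that the number of processes is the number of hosts times tasks-per-host. Here is a minimal sketch of that calculation. The function name total_procs is hypothetical and not part of rsmpi.

```shell
# Hypothetical helper (not part of rsmpi): number of processes started
# when -n is omitted, assumed to be hosts * tasks-per-host.
total_procs() {
    hosts=$1
    tasks_per_host=$2
    echo $(( hosts * tasks_per_host ))
}

total_procs 2 2    # two hosts with -t 2
total_procs 2 20   # two hosts with the default tasks-per-host of this example
```

The two calls correspond to the -t 2 run above and to the default case on a two-host allocation.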
If you have multiple hosts, you will need to keep track of which hosts
are in use. You can always check the host status by running ps as
follows:
jupyter$ rsmpi -n 1 -h 1 ps
PID TTY TIME CMD
1 ? 00:00:00 tini
6 ? 00:00:03 jupyter-labhub
1102 ? 00:00:00 hydra_pmi_proxy
1103 ? 00:00:00 sleep
1119 ? 00:00:00 hydra_pmi_proxy
1120 ? 00:00:00 ps
This checks the status of processes on host 1. You can see there are
two hydra_pmi_proxy processes, which indicates there are two rsmpi
jobs running on the machine: the ps and a sleep 100.
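Counting the hydra_pmi_proxy lines by eye gets tedious on a busy host. The sketch below counts them from ps output read on standard input. The function name count_proxies is hypothetical, and the sample input is copied from the listing above.

```shell
# Hypothetical helper (not part of rsmpi): count the lines of ps output
# (read from stdin) whose last field is hydra_pmi_proxy.
count_proxies() {
    awk '$NF == "hydra_pmi_proxy" { n++ } END { print n + 0 }'
}

# Sample input copied from the ps listing above.
count_proxies <<'EOF'
PID TTY TIME CMD
1 ? 00:00:00 tini
6 ? 00:00:03 jupyter-labhub
1102 ? 00:00:00 hydra_pmi_proxy
1103 ? 00:00:00 sleep
1119 ? 00:00:00 hydra_pmi_proxy
1120 ? 00:00:00 ps
EOF
```

On the cluster, the same function could be fed from rsmpi -n 1 -h 1 ps through a pipe. The count then includes the ps job itself, so subtract one to get the number of other jobs on the host.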
rsmpi allows execution of OpenMP programs, but you will need to
manage the threads. Our hosts are hyperthreaded, and OpenMP treats
hyperthreads as real cores. For compute-bound jobs, please set
$OMP_NUM_THREADS before executing. You will also need to set -n 1
so that mpiexec will only start one process. And, since OpenMP does
not support inter-node communication, you will want to specify the
host. For example,
jupyter$ OMP_NUM_THREADS=20 rsmpi -n 1 -h 1 some-open-mp-program