Using MPI on Jupyter Cluster

Jupyter servers on our JupyterHub cluster have only a few cores. To run jobs with MPI or OpenMP, you can request one or more compute nodes, which are assigned manually by devops. Use the Slack channel #rsmpi to make allocation requests.

Once allocated, you can access the nodes by running rsmpi, which is a wrapper for mpiexec. To learn the options, just run rsmpi without arguments:

jupyter$ rsmpi
error: missing command argument
usage: rsmpi [-n processes] [-h hosts] [-t tasks-per-host] <mpi-command args...>

Starts mpiexec <mpi-command args...> with specified processes and hosts.

Options:
    hosts: indices of hosts to use; all hosts is 2 [default]
    processes: integer between 1 and 40 [default]
    tasks-per-host: integer between 1 and 20 [default]

In this example, the compute nodes (hosts) are numbered 1 and 2. Each compute node has up to 20 cores (tasks-per-host), which allows up to 40 MPI processes (processes) in total.

Just like mpiexec, you can run with -n to specify a number of MPI processes, e.g.

jupyter$ rsmpi -n 2 echo hello
hello
hello
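
The command can be any executable on the Jupyter server, not just echo. As a sketch, assuming you have a hypothetical MPI-enabled Python script named my_mpi_script.py in your home directory, you would start it on 4 processes like this:

jupyter$ rsmpi -n 4 python my_mpi_script.py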

Processes are always allocated on the first host, then the second host, and so on, unless you specify the host number on the rsmpi command line with -h, e.g.

jupyter$ rsmpi -h 2 -n 2 echo hello
hello
hello

Here, two processes were executed on host 2.
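
To confirm where processes actually run, substitute hostname for echo. This sketch assumes host 2 resolves to rs2.local, as in the tasks-per-host example below:

jupyter$ rsmpi -h 2 -n 2 hostname
rs2.local
rs2.local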

Tasks-per-host (-t) allows you to reduce the number of processes started on each host from the default (20 above), for example:

jupyter$ rsmpi -t 2 hostname
rs1.local
rs1.local
rs2.local
rs2.local

hostname ran twice on each host, because we set tasks-per-host to 2. If we hadn't set it, it would have run 40 times (20 times per host).
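
You can also combine the options to keep a small job on a single host. This is a sketch using the documented flags; it assumes -n and -t interact the same way as in mpiexec, so all four processes would land on host 1 and hostname would print rs1.local four times:

jupyter$ rsmpi -n 4 -t 4 -h 1 hostname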

Monitoring Status

If you have multiple hosts, you will need to keep track of which hosts are in use. You can check a host's status at any time by running ps through rsmpi as follows:

jupyter$ rsmpi -n 1 -h 1 ps
  PID TTY          TIME CMD
    1 ?        00:00:00 tini
    6 ?        00:00:03 jupyter-labhub
 1102 ?        00:00:00 hydra_pmi_proxy
 1103 ?        00:00:00 sleep
 1119 ?        00:00:00 hydra_pmi_proxy
 1120 ?        00:00:00 ps

This checks the status of processes on host 1. You can see there are two hydra_pmi_proxy processes, which indicates two rsmpi jobs are running on the machine: this ps and a sleep 100.
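
To check the other compute node, point -h at it (this assumes host 2 is also allocated to you):

jupyter$ rsmpi -n 1 -h 2 ps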

Using OpenMP on MPI Cluster

rsmpi allows execution of OpenMP programs, but you will need to manage the threads yourself. Our hosts are hyperthreaded, and OpenMP treats hyperthreads as real cores. For compute-bound jobs, set $OMP_NUM_THREADS before executing. You will also need to pass -n 1 so that mpiexec starts only one process. And, since OpenMP does not support inter-node communication, you will want to specify the host. For example,

jupyter$ OMP_NUM_THREADS=20 rsmpi -n 1 -h 1 some-open-mp-program
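
To verify that the thread count actually reaches the launched program, you can print the variable back through rsmpi. This sketch assumes the environment is forwarded to the launched process, which is what the example above relies on:

jupyter$ OMP_NUM_THREADS=20 rsmpi -n 1 -h 1 printenv OMP_NUM_THREADS
20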