Legate Quickstart provides scripts for building Legate libraries from source and running Legate programs with appropriate defaults for a number of supported clusters (and auto-detected settings for local installs).
The scripts in this repository will detect if you are running on the login node of a supported cluster, and automatically use the appropriate flags to build and run Legate.
The scripts will automatically invoke the appropriate job scheduler commands, so you don't need to create jobscripts yourself. Please run the commands directly from the login node.
Even if your specific cluster is not covered, you may be able to adapt an existing workflow; look for all the places where the PLATFORM variable is checked and add a case for your cluster.
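As a rough illustration of what "adding a case" can look like, the sketch below maps a hostname to a platform name using a shell case statement. The function name detect_platform, the hostname pattern *mycluster* and the value "mycluster" are all hypothetical; the actual Quickstart scripts may detect and name platforms differently, so treat this only as a pattern to look for.

```shell
# Hypothetical sketch: map a hostname to a PLATFORM value.
# "mycluster" and the hostname pattern are placeholders, not names
# used by the Quickstart scripts.
detect_platform() {
    case "$1" in
        *mycluster*) echo "mycluster" ;;   # new case for your cluster
        *)           echo "other" ;;       # fall back to auto-detected settings
    esac
}

# uname -n prints the node's hostname on POSIX systems.
PLATFORM="$(detect_platform "$(uname -n)")"
echo "Detected platform: $PLATFORM"
```

Every place in the scripts that branches on PLATFORM would then need a matching branch for the new value.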
You can use the same scripts on your local machine, in which case the build/run flags will be set according to the detected hardware resources.
Invoke any script with -h to see more available options.
This section covers clusters where you build on a distributed filesystem, and run your application directly on the compute node hardware.
Find your cluster below, and add the corresponding suggested setup instructions to ~/.bash_profile, ~/.bashrc or similar shell startup file.
module load python
module load cudatoolkit
module load craype-accel-nvidia80
module load cray-pmi
module del cray-libsci
module load cuda/11.0.3 gcc/9.3.0
# optionally: module load openblas/0.3.20-omp
# can then skip openblas from conda env, and build cuNumeric using --with-openblas $OLCF_OPENBLAS_ROOT
module swap PrgEnv-cray PrgEnv-gnu/6.0.9
module load daint-gpu
module load cudatoolkit/11.2.0_3.39-2.1__gf93aa1c
module load cuda/11.7 mpi/openmpi/4.1.5 slurm/23.02.1
module load gcc/8.3.1 cuda/11.1.0
If your cluster doesn't provide an installation of conda, you can use an appropriate installer from https://github.com/conda-forge/miniforge/#download to perform a user-specific installation.
Use the scripts/generate-conda-envs.py script from legate.core to create a conda environment, e.g.:
git clone https://github.com/nv-legate/legate.core <legate.core-dir>
cd <legate.core-dir>
./scripts/generate-conda-envs.py --python 3.10 --ctk 12.0.1 --os linux --ucx
conda env create -n legate -f environment-test-linux-py310-cuda12.0.1-ucx.yaml
Note that Quickstart uses the UCX networking backend by default on supported clusters (and on local-machine non-macOS installations, if you choose to build with networking support), so --ucx is required in those cases.
See https://github.com/nv-legate/legate.core/blob/HEAD/BUILD.md#getting-dependencies-through-conda for more details on installing dependencies from conda.
Make sure you use an environment file with a --ctk version matching the system-wide CUDA version (i.e. the version provided by the CUDA module you load).
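One way to find the system-wide CUDA version is to read the "release X.Y" field that nvcc --version prints. The helper below is a minimal sketch, assuming nvcc is on your PATH after loading the CUDA module; cuda_release is a hypothetical name, not part of Quickstart or legate.core.

```shell
# Hypothetical helper: extract "X.Y" from the "release X.Y" field of
# `nvcc --version` output read on stdin.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    echo "System CUDA release: $(nvcc --version | cuda_release)"
else
    echo "nvcc not found; load your cluster's CUDA module first" >&2
fi
```

The reported major.minor release can then be compared against the value you pass to --ctk.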
You may wish to auto-activate this environment on login by running conda activate in your shell startup file.
See below for special instructions required on certain clusters:
Make sure conda environments are installed under /global/common, so they are accessible from all nodes. Do this by installing using a prefix rather than a name:
conda env create -p /global/common/sub/dir -f env.yaml
and activate with:
conda activate /global/common/sub/dir
The pkg-config package from conda-forge may conflict with the system version, in which case you should do:
conda uninstall pkg-config
git clone https://gitlab.com/StanfordLegion/legion.git <legion-dir>
git clone https://github.com/nv-legate/legate.core <legate.core-dir>
git clone https://github.com/nv-legate/cunumeric <cunumeric-dir>
cd <legate.core-dir>
LEGION_DIR=<legion-dir> <quickstart-dir>/build.sh
cd <cunumeric-dir>
<quickstart-dir>/build.sh
<quickstart-dir>/run.sh <num-nodes> <legate-args> <py-program> <program-args>
On container-based clusters, each user typically prepares an image ahead of time and provides it at job submission time, to be instantiated on each allocated node. Such clusters utilize a cluster-aware container engine, such as Pyxis/Enroot, Singularity or Shifter.
The make_image.sh script can be used to build Docker images containing all Legate libraries.
Certain configuration options, such as the CUDA Toolkit version to use, should be specified at docker build time, to match the environment where the image will be used. For example:
CUDA_VER=11.8.0 ./make_image.sh
See make_image.sh -h for a full list of such options.
After building the image, you can test it locally:
docker run -it --rm --gpus all <image> /bin/bash
Once inside the container, you can try running some examples:
legate --gpus 1 --fbmem 15000 /opt/legate/cunumeric/examples/gemm.py
Note that to use NVIDIA GPUs from inside a container, the host needs to provide a CUDA installation at least as recent as the version used in the image, and a GPU-aware container execution engine like nvidia-docker.
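The "at least as recent" requirement can be checked with a simple version comparison. The sketch below assumes a sort that supports -V (version sort, as in GNU coreutils); version_ge is a hypothetical helper, and the two version strings are illustrative values you would replace with your host's CUDA version and the CUDA_VER used to build the image.

```shell
# Hypothetical helper: succeed if version $1 is >= version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

HOST_CUDA="12.0"     # illustrative: CUDA version provided by the host
IMAGE_CUDA="11.8.0"  # illustrative: CUDA_VER the image was built with

if version_ge "$HOST_CUDA" "$IMAGE_CUDA"; then
    echo "Host CUDA $HOST_CUDA is recent enough for image CUDA $IMAGE_CUDA"
else
    echo "Host CUDA $HOST_CUDA is older than image CUDA $IMAGE_CUDA" >&2
fi
```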
The run.sh script can handle container-based workflows when run directly on the login node, but will need to be specialized for each particular cluster; look for all the places where the PLATFORM variable is checked in run.sh, and add a case for your cluster.
Even though you are meant to invoke the run.sh script from the login node, any paths on the command line will refer to files within the image, not the filesystem on the host cluster. For example, you cannot (by default) invoke a Python program stored in your home directory on the login node, only Python files already included within the image. If you wish to use files from a directory on the host filesystem, you need to explicitly mount that directory inside the container (see the MOUNTS argument of run.sh).
If you have questions, please contact us at legate(at)nvidia.com.