From ce5ed983dc0f2eeb53d89fa5f3c3c5427bcc4077 Mon Sep 17 00:00:00 2001
From: Luo Jian
Date: Sat, 16 Nov 2024 12:48:31 +0800
Subject: [PATCH] docs: deploy slurm cluster in a namespace aligned with the cluster name

rest.RenderSecret renders the cluster name as the namespace

Signed-off-by: Luo Jian
---
 README.md                                | 3 ++-
 images/common/scripts/complement_jail.sh | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 5256381d..0b3d8225 100644
--- a/README.md
+++ b/README.md
@@ -156,7 +156,8 @@ In general, you need to follow these steps:
 2. Install the [NVIDIA GPU Operator](https://github.com/NVIDIA/gpu-operator).
 3. If you use InfiniBand, install the [NVIDIA Network Operator](https://github.com/Mellanox/network-operator).
 4. Install Soperator by applying the [soperator](helm/soperator) Helm chart.
-5. Create a Slurm cluster by applying the [slurm-cluster](helm/slurm-cluster) Helm chart.
+5. Create a Slurm cluster in a namespace with the same name as the slurm cluster by
+   applying the [slurm-cluster](helm/slurm-cluster) Helm chart.
 6. Wait until the `slurm.nebius.ai/SlurmCluster` resource becomes `Available`.
 
 [//]: # (TODO: Refer to Helm OCI images instead of file directories when the repo is open)
diff --git a/images/common/scripts/complement_jail.sh b/images/common/scripts/complement_jail.sh
index b949c8a9..d211361c 100755
--- a/images/common/scripts/complement_jail.sh
+++ b/images/common/scripts/complement_jail.sh
@@ -2,7 +2,7 @@
 
 # Complement jaildir by bind-mounting virtual filesystems, users, and NVIDIA binaries from the host filesystem
 
-set -x # Print actual command when before
+set -x # Print actual command before executing it
 set -e # Exit immediately if any command returns a non-zero error code
 
 usage() { echo "usage: ${0} -j -u [-w] [-h]" >&2; exit 1; }