Cromwell
Cromwell is a workflow engine which can be used to execute containers on a variety of platforms.
https://cromwell.readthedocs.io/
In particular, we are interested in using Cromwell to run ChRIS plugins with Singularity on a SLURM cluster.
In fact, Cromwell's functionality is most similar to that of the pfcon + pman duo.
For our convenience, we have implemented a shim for pman to dispatch requests to Cromwell. In theory, this means all platforms supported by Cromwell are now also supported by ChRIS.
pfcon, pman, and cromwell run on a container host, where the cromwell container has access to a SLURM cluster via the sbatch, squeue, and scancel commands. pfcon is responsible for localization and delocalization of data (i.e. moving data to and from a filesystem mounted by the SLURM cluster).
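To make the pman-to-Cromwell hop more concrete, here is a minimal sketch of the kind of requests such a shim sends to Cromwell's REST API (submit a WDL, poll its status, abort it). It is not pman's actual implementation; the function names are hypothetical, and the localhost:8000 fallback assumes Cromwell's default server port when CROMWELL_URL is unset.

```shell
# Sketch of Cromwell REST calls a pman-to-Cromwell shim might make.
# Not pman's actual code; function names are hypothetical.

# Workflow endpoint. Falls back to Cromwell's usual port 8000 if CROMWELL_URL
# is unset, and drops a trailing slash such as the one in http://example.com/.
cromwell_api() {
  local base="${CROMWELL_URL:-http://localhost:8000}"
  printf '%s/api/workflows/v1' "${base%/}"
}

# Submit a WDL file as multipart form data; Cromwell answers with JSON
# containing the new workflow id and its status.
cromwell_submit() {
  curl -sS -X POST "$(cromwell_api)" -F "workflowSource=@$1"
}

# Query a workflow's status (e.g. Submitted, Running, Succeeded, Failed).
cromwell_status() {
  curl -sS "$(cromwell_api)/$1/status"
}

# Ask Cromwell to abort a workflow.
cromwell_abort() {
  curl -sS -X POST "$(cromwell_api)/$1/abort"
}
```

For example, cromwell_submit job.wdl returns a workflow id, which can then be passed to cromwell_status or cromwell_abort.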
The container user (or mapped UID) of pfcon and Cromwell must be the same, and it must be an authorized SLURM user.
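One quick way to check the shared-UID requirement is to compare the owner of a file that pfcon wrote on the shared storage with the UID the Cromwell container runs as. This is a minimal sketch, assuming a Linux image with GNU coreutils (stat -c); the function name is hypothetical. Whether that UID is an authorized SLURM user is easiest to confirm by running squeue or a trivial sbatch job from inside the Cromwell container.

```shell
# Hypothetical helper: succeed (0) if the file's owner UID equals our UID,
# fail (1) if it differs, and return 2 if the file cannot be stat'ed.
# Assumes GNU coreutils `stat -c %u` (Linux).
same_uid_as_owner() {
  local path="$1"
  local owner
  owner="$(stat -c %u -- "$path")" || return 2
  [ "$owner" = "$(id -u)" ]
}
```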
Cromwell must be configured to support WDLs generated by the Jinja2 template defined in pman/cromwell/slurm/wdl.py.
include required(classpath("application"))
backend {
  default = SLURM
  providers {
    SLURM {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        runtime-attributes = """
          Int timelimit = 30
          Int cpu = 1
          Int memory_mb = 4000
          Int gpu_limit = 0
          Int number_of_workers = 1
          String slurm_partition = "short"
          String slurm_account = "mylab"
          String docker
          String sharedir
        """
        submit-docker = """
          # https://cromwell.readthedocs.io/en/stable/tutorials/Containers/#job-schedulers
          # https://github.com/broadinstitute/cromwell/blob/develop/cromwell.example.backends/singularity.slurm.conf
          sbatch -J ${job_name} \
            -D ${cwd} -o ${out} -e ${err} -t ${timelimit} \
            -p ${slurm_partition} -A ${slurm_account} \
            --cpus-per-task ${cpu} \
            --mem ${memory_mb} \
            --gpus-per-task ${gpu_limit} \
            --nodes ${number_of_workers} \
            chrispl_singularity_wrapper.sh \
            ${cwd}:${docker_cwd} \
            ${docker} ${job_shell} ${docker_script} ${sharedir}
        """
        kill = "scancel ${job_id}"
        check-alive = "squeue -j ${job_id}"
        job-id-regex = "Submitted batch job (\\d+).*"
      }
    }
  }
}
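The job-id-regex setting is how Cromwell learns the SLURM job ID that it later passes to squeue and scancel. The sketch below performs the same extraction in plain bash, which can be handy when testing sbatch output by hand. It is illustrative only (the function name is hypothetical); bash regular expressions have no \d, so [0-9] is used instead.

```shell
# Hypothetical helper mirroring job-id-regex: print the job ID from sbatch
# output such as "Submitted batch job 123456", or return 1 if none is found.
parse_sbatch_job_id() {
  local output="$1"
  local re='Submitted batch job ([0-9]+)'
  if [[ $output =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}
```

The extracted ID is what ends up in ${job_id} for the kill (scancel) and check-alive (squeue -j) commands.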
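Usually, the GPU nodes of a SLURM cluster are in a separate partition, so it may help to pick the partition inside submit-docker based on gpu_limit and then pass -p $partition to sbatch instead of -p ${slurm_partition}. Here has-gpu stands for the name of your cluster's GPU partition:
partition=${slurm_partition}
if [ "${gpu_limit}" -gt "0" ]; then
  partition=has-gpu
fi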
How GPUs are requested also varies between SLURM clusters, so consult your cluster's documentation. For example, it is common to use the --gres flag:
sbatch ... --gres=gpu:Titan_RTX:${gpu_limit} ...
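The partition choice and the --gres request can be combined in one place. This is a minimal sketch of a standalone helper, assuming (hypothetically) that GPU nodes live in a partition named has-gpu and that your cluster requests GPUs with --gres=gpu:N or --gres=gpu:TYPE:N. Note that Cromwell treats ${...} inside submit-docker as its own placeholders, so this function is meant for a separate script rather than for pasting into the config.

```shell
# Hypothetical helper: print sbatch partition and GPU arguments, one per line.
# $1 = default partition, $2 = number of GPUs, $3 = optional GPU type.
slurm_gpu_args() {
  local default_partition="$1" gpu_limit="$2" gpu_type="${3-}"
  if [ "$gpu_limit" -gt 0 ]; then
    # Assumed GPU partition name; replace with your cluster's
    if [ -n "$gpu_type" ]; then
      printf '%s\n' "-p" "has-gpu" "--gres=gpu:${gpu_type}:${gpu_limit}"
    else
      printf '%s\n' "-p" "has-gpu" "--gres=gpu:${gpu_limit}"
    fi
  else
    printf '%s\n' "-p" "$default_partition"
  fi
}
```

Printing one argument per line lets a caller collect them safely with mapfile -t args < <(slurm_gpu_args short 2 Titan_RTX) and pass "${args[@]}" to sbatch.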
chrispl_singularity_wrapper.sh should be a wrapper script that executes ChRIS plugins using Apptainer (formerly Singularity). It can also provide more features, such as management of the Singularity image cache. Here is a basic example that matches the usage above:
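#!/bin/bash -ex
# Usage: chrispl_singularity_wrapper.sh CWD_BIND IMAGE SHELL SCRIPT SHAREDIR
cwd="$1"       # bind spec "host_dir:container_dir" for the Cromwell task directory
image="$2"     # container image name, pulled from a Docker registry
shell="$3"     # shell used to run the task script
script="$4"    # task script generated by Cromwell (path inside the container)
sharedir="$5"  # shared data directory, mounted at /share in the container
# Enable NVIDIA GPU support only when SLURM assigned GPUs to this job
gpu_flag=''
if [ -n "$SLURM_JOB_GPUS" ]; then
  gpu_flag='--nv'
fi
# Site-specific: image cache location and module name
export SINGULARITY_CACHEDIR=/work/singularity-cache
module load singularity/3.8.5
# $gpu_flag is left unquoted on purpose so it expands to nothing when empty
exec singularity exec --containall $gpu_flag -B "$cwd" -B "$sharedir:/share" "docker://$image" "$shell" "$script"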
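When debugging the wrapper outside SLURM, it can help to see the exact command it would run. The sketch below builds the same singularity invocation as the wrapper, but as a bash array, and prints it one argument per line instead of executing it. The function name is hypothetical, and the cache and module steps are deliberately left out.

```shell
# Hypothetical dry-run helper: same arguments as chrispl_singularity_wrapper.sh,
# prints the singularity command (one argument per line) instead of running it.
build_singularity_cmd() {
  local cwd="$1" image="$2" shell="$3" script="$4" sharedir="$5"
  local -a cmd=(singularity exec --containall)
  # Same GPU detection as the wrapper: add --nv only if SLURM assigned GPUs
  if [ -n "${SLURM_JOB_GPUS:-}" ]; then
    cmd+=(--nv)
  fi
  cmd+=(-B "$cwd" -B "$sharedir:/share" "docker://$image" "$shell" "$script")
  printf '%s\n' "${cmd[@]}"
}
```

Using an array also removes the need for the deliberately unquoted $gpu_flag, since an optional flag is simply appended or not.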
pman should be configured with the following environment variables:
SECRET_KEY=aaaaaaaa
CONTAINER_ENV=cromwell
CROMWELL_URL=http://example.com/
STORAGE_TYPE=host
STOREBASE=/some/path/on/nfs/server
TIMELIMIT_MINUTES=30
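A container entrypoint could fail fast when one of these settings is missing. This is a minimal sketch of such a preflight check; the variable names come from the list above, but the helper itself is hypothetical and not part of pman, and the integer check on TIMELIMIT_MINUTES is an assumption based on the variable's name and example value.

```shell
# Hypothetical preflight check for the pman settings listed above.
# Returns 0 if everything looks sane, 1 otherwise (details on stderr).
check_pman_env() {
  local status=0 var
  for var in SECRET_KEY CONTAINER_ENV CROMWELL_URL STORAGE_TYPE STOREBASE TIMELIMIT_MINUTES; do
    if [ -z "${!var:-}" ]; then
      echo "missing: $var" >&2
      status=1
    fi
  done
  # This setup expects the Cromwell backend
  if [ "${CONTAINER_ENV:-}" != "cromwell" ]; then
    echo "CONTAINER_ENV should be cromwell" >&2
    status=1
  fi
  # Assumed to be a whole number of minutes
  if ! [[ "${TIMELIMIT_MINUTES:-}" =~ ^[0-9]+$ ]]; then
    echo "TIMELIMIT_MINUTES should be a whole number" >&2
    status=1
  fi
  return "$status"
}
```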
/some/path/on/nfs/server should be a filesystem mounted at the same path in both the pfcon container and the SLURM cluster.
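A simple way to confirm that both sides really see the same filesystem at the same path is to write a random token from the pfcon side and read it back from a SLURM node. This is a minimal sketch with hypothetical function and file names (.shared-fs-check); it assumes od and /dev/urandom are available, as on typical Linux systems.

```shell
# Hypothetical helpers for checking a shared path.
# On the pfcon side: write a random token into DIR and print it.
write_marker() {
  local dir="$1" token
  token="$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')"
  printf '%s\n' "$token" > "$dir/.shared-fs-check"
  printf '%s\n' "$token"
}

# On the other side: succeed only if DIR holds the marker with the expected token.
check_marker() {
  local dir="$1" expected="$2"
  [ -f "$dir/.shared-fs-check" ] && [ "$(cat "$dir/.shared-fs-check")" = "$expected" ]
}
```

For example, run write_marker /some/path/on/nfs/server inside the pfcon container, then compare the printed token with the output of srun cat /some/path/on/nfs/server/.shared-fs-check from a host that can submit SLURM jobs.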
Authentication with Cromwell is not currently supported.