The repository provides a Snakemake profile for running jobs on a Univa Grid Engine (UGE). It is heavily based on the excellent snakemake-lsf profile.
After installation and set-up of this profile (described in detail below), snakemake can be run on a UGE with the simple command:
For snakemake versions less than or equal to v7.30:
snakemake --profile uge [snakemake options]
For snakemake versions greater than or equal to v8.0:
snakemake --executor cluster-sync --profile uge [snakemake options]
The profile takes care of job submission and status checks. Rule-specific parameters can be supplied in a separate .yaml file placed in the working directory (see Examples).
Note: For pipelines consisting of many jobs with short execution times (a few minutes or less), we recommend running snakemake in single-node mode, i.e. as a single multicore UGE interactive job on one node. In such pipelines, the overall run time can be dominated by UGE queueing and processing time. For pipelines with longer job runtimes and/or very different memory/CPU requirements per rule, we recommend using the profile described in this repository.
This profile is deployed using Cookiecutter. cookiecutter can be installed using conda or pip:
pip install --user cookiecutter
# or
conda install -c conda-forge cookiecutter
IMPORTANT: With snakemake version 8, a new CLI for interaction with a high performance compute system was introduced (as described in these release-notes). For this profile to work with snakemake >= 8, the following executor-plugin needs to be installed in the environment from which snakemake is called:
pip install snakemake-executor-plugin-cluster-sync
To download and set up this profile on your cluster, create a profile directory for snakemake:
mkdir -p "${HOME}/.config/snakemake"
Then use cookiecutter to create the profile in the config directory:
cookiecutter --output-dir "${HOME}/.config/snakemake" "gh:meyer-lab-cshl/snakemake-uge"
The latter command will prompt you to set the default parameters described in the next two subsections. Each parameter has a default setting; simply pressing Enter at the prompt selects that default for the profile.
Parameter explanations as retrieved from snakemake --help.
- latency_wait
  Default: 45
  This sets the default --latency-wait/--output-wait/-w parameter in snakemake.
  --latency-wait SECONDS, --output-wait SECONDS, -w SECONDS Wait given seconds if an output file of a job is not present after the job finished. This helps if your filesystem suffers from latency (default 120).
- use_conda
  Default: True
  Valid options: False, True
  This sets the default --use-conda parameter in snakemake.
  --use-conda If defined in the rule, run job in a conda environment. If this flag is not set, the conda directive is ignored.
- use_singularity
  Default: False
  Valid options: False, True
  This sets the default --use-singularity parameter in snakemake.
  --use-singularity If defined in the rule, run job within a singularity container. If this flag is not set, the singularity directive is ignored.
- keep_going
  Default: True
  Valid options: False, True
  This sets the default --keep-going parameter in snakemake.
  --keep-going Go on with independent jobs if a job fails.
- restart_times
  Default: 0
  This sets the default --restart-times parameter in snakemake.
  --restart-times RESTART_TIMES Number of times to restart failing jobs (defaults to 0).
- jobs
  Default: 500
  This sets the default --cores/--jobs/-j parameter in snakemake.
  --cores [N], --jobs [N], -j [N] Use at most N cores in parallel. If N is omitted or 'all', the limit is set to the number of available cores.
  In the context of a cluster, -j denotes the number of jobs submitted to the cluster at the same time.
- default_mem_mb
  Default: 1024
  This sets the default memory, in megabytes, for a rule being submitted to the cluster without mem_mb set under resources.
  See below for how to overwrite this in a rule.
- default_threads
  Default: 1
  This sets the default number of threads for a rule being submitted to the cluster without the threads variable set.
  See below for how to overwrite this in a rule.
  NOTE: The submission script takes care of converting the threads and the memory specified in megabytes per rule into a memory request "per thread" in gigabytes.
- default_cluster_logdir
  Default: "cluster_logs"
  This sets the directory under which cluster log files are written. The path is relative to the working directory of the pipeline. If it does not exist, it will be created.
- default_queue
  Default: None
  The default queue on the cluster to submit jobs to. If left unset, the default on your cluster will be used. The qsub parameter that this controls is -q.
- profile_name
  Default: uge
  The name to use for this profile. The directory for the profile is created under this name, i.e. $HOME/.config/snakemake/<profile_name>. This is also the value you pass to snakemake --profile <profile_name>.
- print_shell_commands
  Default: False
  Valid options: False, True
  This sets the default --printshellcmds/-p parameter in snakemake.
  --printshellcmds, -p Print out the shell commands that will be executed.
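The per-thread memory conversion mentioned in the NOTE for default_threads can be illustrated with a little arithmetic. The following is only a sketch, not the profile's submission script: the function name mem_per_thread_gb is hypothetical, and both the conversion factor (1 GB = 1024 MB) and the rounding to three decimals are assumptions.

```python
def mem_per_thread_gb(mem_mb=1024, threads=1):
    """Convert a rule's total memory (MB) into a per-thread request (GB).

    The defaults mirror the profile defaults default_mem_mb=1024 and
    default_threads=1. Hypothetical helper, not part of the profile.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    total_gb = mem_mb / 1024  # assumption: 1 GB = 1024 MB
    return round(total_gb / threads, 3)  # assumption: round to 3 decimals


# A rule asking for 8192 MB across 4 threads requests 2 GB per thread.
print(mem_per_thread_gb(8192, 4))
```

The point of the sketch is that raising threads without raising mem_mb lowers the memory requested per thread, which is worth keeping in mind when setting both values in a rule.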
The status check parameters described here should not be changed unless discussed with IT. The compute cluster is a shared resource, and workflow managers like snakemake that submit large numbers of jobs and run high-frequency status checks can slow down the compute environment for everyone. Should issues occur with this profile and job status checks by snakemake, for instance snakemake.exceptions.WorkflowError: Failed to obtain job status. errors, it is recommended to set log_status_checks to True to track the issues.
- log_status_checks
  Default: False
  When set, status check tries and exceptions are printed to stderr. Recommended to set to True for issues with status checks by snakemake, e.g. snakemake.exceptions.WorkflowError: Failed to obtain job status. errors.
Once set up is complete, you can run snakemake with the cluster profile using the --profile flag. For profile name uge, you can run:
snakemake --profile uge [snakemake options]
and pass any other valid snakemake options.
The following resources can be specified within a rule in the Snakemake file:
- threads: <INT>
  the number of threads needed for the job. If not specified, will default to the amount you set when initialising the profile. As stated in the snakemake manual, it should be noted that the specified threads have to be seen as a maximum. When Snakemake is executed with fewer cores, the number of threads will be adjusted.
- resources: mem_mb = <INT>
  the memory required for the rule, in megabytes. If not specified, will default to the amount you set when initialising the profile. For details on memory specification see the snakemake documentation on resources.
NOTE: these settings within the snakemake rules will override the profile defaults.
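The override behaviour can be pictured as a dictionary merge in which rule-level values take precedence. This is a minimal sketch of the idea, not the profile's code; PROFILE_DEFAULTS and effective_settings are hypothetical names, and the default values are the ones suggested at the cookiecutter prompt.

```python
# Hypothetical stand-in for the defaults chosen when the profile was created.
PROFILE_DEFAULTS = {"threads": 1, "mem_mb": 1024}


def effective_settings(rule_settings):
    """Return the settings used for a rule: profile defaults, overridden
    by anything the rule specifies itself."""
    return {**PROFILE_DEFAULTS, **rule_settings}


# A rule that only sets mem_mb keeps the default number of threads.
print(effective_settings({"mem_mb": 4096}))
```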
Since the deprecation of cluster configuration files, the ability to specify per-rule cluster settings is snakemake-profile-specific.
Per-rule configuration must be placed in a file called <profile_name>.yaml, which must be located in the working directory of the pipeline. If you set workdir manually within your workflow, the config file has to be in that directory.
Common parameters that can be provided to the cluster configuration (for details check man qsub):
- runtime: the maximum amount of time the job will be allowed to run for
  -l h_rt={runtime_hr}:{runtime_min}:00
- queue: override the default queue for this job
  -q QUEUENAME
- output: override the default name of the stdout logfile
  -o path/to/file/for/output_stream
- error: override the default name of the stderr logfile
  -e path/to/file/for/error_stream
- jobname: override the default name of the job
  -N JOBNAME
- project: specifies the project to which this job is assigned
  -P PROJECTNAME
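As an illustration of the h_rt format used for the runtime setting, the sketch below builds the flag from a runtime given in whole minutes. The function name format_h_rt is hypothetical, and both the minutes-based input and the zero-padding of the fields are assumptions made for this example.

```python
def format_h_rt(total_minutes):
    """Build a '-l h_rt=HH:MM:00' string from a runtime in whole minutes.

    Hypothetical helper: the input unit and zero-padding are assumptions.
    """
    if total_minutes < 0:
        raise ValueError("runtime must not be negative")
    runtime_hr, runtime_min = divmod(total_minutes, 60)
    return f"-l h_rt={runtime_hr:02d}:{runtime_min:02d}:00"


# A 90-minute limit becomes 1 hour and 30 minutes.
print(format_h_rt(90))
```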
NOTE: these settings are highly specific to the UGE cluster system and this profile and are not guaranteed to be valid on non-UGE cluster systems.
All settings are given with the rule name as the key, and the additional cluster settings as a list (sequence), with the UGE-specific flag followed by its argument (if applicable).
NOTE: Directory paths should not be used as wildcards. If a directory is used as a wildcard, any "/" will be replaced with "-".
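The effect of this replacement can be seen in a short sketch. The function name sanitise_wildcard and the example value are hypothetical; only the replacement of "/" with "-" is taken from the note above.

```python
def sanitise_wildcard(value):
    """Replace every '/' in a wildcard value with '-', as described in
    the note. Hypothetical helper for illustration only."""
    return value.replace("/", "-")


# A wildcard holding a directory path loses its path separators.
print(sanitise_wildcard("results/sample1"))
```

Because the separators are lost, two different paths such as results/a-b and results-a/b would map to the same string, which is one reason to keep directory paths out of wildcards.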
Snakefile
rule grep:
    input: "input.txt"
    output: "output.txt"
    shell:
        "grep 'icecream' {input} > {output}"

rule count:
    input: "output.txt"
    output: "output_count.txt"
    shell:
        "wc -l {input} > {output}"
uge.yaml
__default__:
  - "-P standard "
  - "-l h_rt=01:00:00"

grep:
  - "-P icecream "
  - "-N icecream.search"
In this example, we specify a default (__default__) project (-P) and runtime limit (-l h_rt=01:00:00) that will apply to all rules. We then override the project and, additionally, specify a new job name for the rule grep. This will lead to a submission command for grep that looks something like
$ qsub [options] -P standard -l h_rt=01:00:00 -P icecream -N icecream.search ...
Although -P is provided twice, UGE uses the last instance.
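The way default and rule-specific settings combine can be sketched as follows. This is not the profile's implementation: CLUSTER_CONFIG, options_for and last_value are hypothetical names, and the dictionary simply mirrors the uge.yaml example, with the runtime entry written as -l h_rt=01:00:00.

```python
import shlex

# Mirrors the uge.yaml example; a real run would read this from the file.
CLUSTER_CONFIG = {
    "__default__": ["-P standard ", "-l h_rt=01:00:00"],
    "grep": ["-P icecream ", "-N icecream.search"],
}


def options_for(rule, config=CLUSTER_CONFIG):
    """Concatenate the default settings with the rule's own settings and
    split them into individual command-line tokens."""
    entries = config.get("__default__", []) + config.get(rule, [])
    tokens = []
    for entry in entries:
        tokens.extend(shlex.split(entry))  # also drops trailing spaces
    return tokens


def last_value(tokens, flag):
    """Return the argument of the last occurrence of flag, or None."""
    value = None
    for i, token in enumerate(tokens[:-1]):
        if token == flag:
            value = tokens[i + 1]
    return value


grep_options = options_for("grep")
print(" ".join(grep_options))
print(last_value(grep_options, "-P"))
```

Rules without their own entry, such as count, receive only the __default__ settings.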