Snakemake UGE profile

The repository provides a Snakemake profile for running jobs on a Univa Grid Engine (UGE). It is heavily based on the excellent snakemake-lsf profile.

After installation and set-up of this profile (described in detail below), snakemake can be run on a UGE with the simple command:

For snakemake versions less or equal to v7.30:

snakemake --profile uge [snakemake options]

For snakemake versions greater or equal to v8.0:

snakemake --executor cluster-sync --profile uge [snakemake options]

The profile takes care of job submission and status checks. Rule specific parameters can be provided in a separate .yaml file provided in the working directory (see Examples).

Note: For pipelines consisting of many jobs with short excution times (less than or a few minutes), we recommend running snakemake in single node mode i.e. a single multicore UGE interactive job on one node. In these pipelines, the overall run time could be dominated by UGE queueing/processing time. For pipelines with longer job runtimes and/or very different memory/cpu requirements per rule, using the profile described in this repository is recommended.

IMPORTANT: With snakemake version 8, a new CLI for interaction with a high performance compute system was introduced (as described in these release-notes). For this profile to work with snakemake >= 8, the following executor-plugin needs to be installed in the environment from which snakemake is called:

pip install snakemake-executor-plugin-cluster-sync

Profile

To download and set up this profile on your cluster, create a profiles' directory for snakemake:

mkdir -p "${HOME}/.config/snakemake"

Then use cookiecutter to create the profile in the config directory:

cookiecutter --output-dir "${HOME}/.config/snakemake"  "gh:meyer-lab-cshl/snakemake-uge"

The latter command will prompt you to set default parameters described in the next two subsections. Each parameter has default settings and simply pressing enter at the prompt will choose the default setting of that parameter for the profile.

Submission parameters

Parameter explanations as retrieved from snakemake --help.

latency_wait

Default: 45

This sets the default --latency-wait/--output-wait/-w parameter in snakemake.

  --latency-wait SECONDS, --output-wait SECONDS, -w SECONDS
                      Wait given seconds if an output file of a job is not
                      present after the job finished. This helps if your
                      filesystem suffers from latency (default 120).

use_conda

Default: True
Valid options: False, True

This sets the default --use-conda parameter in snakemake.

 --use-conda           If defined in the rule, run job in a conda
                      environment. If this flag is not set, the conda
                      directive is ignored.

use_singularity

Default: False
Valid options: False, True

This sets the default --use-singularity parameter in snakemake.

  --use-singularity     If defined in the rule, run job within a singularity
                       container. If this flag is not set, the singularity
                       directive is ignored.

keep_going

Default: True
Valid options: False, True

This sets the default --keep-going parameter in snakemake.
```
--keep-going        Go on with independent jobs if a job fails.
```

restart_times

Default: 0

This sets the default --restart-times parameter in snakemake.

  --restart-times RESTART_TIMES
                      Number of times to restart failing jobs (defaults to
                      0).

jobs

Default: 500

This sets the default --cores/--jobs/-j parameter in snakemake.

  --cores [N], --jobs [N], -j [N]
                      Use at most N cores in parallel. If N is omitted or
                      'all', the limit is set to the number of available
                      cores.

In the context of a cluster, -j denotes the number of jobs submitted to the cluster at the same time¹.

default_mem_mb

Default: 1024

This sets the default memory, in megabytes, for a rule being submitted to the cluster without mem_mb set under resources.

See below for how to overwrite this in a rule.
default_threads

Default: 1

This sets the default number of threads for a rule being submitted to the cluster without the threads variable set.

See below for how to overwrite this in a rule.

NOTE: The submission script takes care of converting the threads and memory specified in MegaByte per rule into a memory request "per thread" in GigaByte.
default_cluster_logdir

Default: "cluster_logs"

This sets the directory under which cluster log files are written. The path is relative to the working directory of the pipeline. If it does not exist, it will be created.
default_queue

Default: None

The default queue on the cluster to submit jobs to. If left unset, then the default on your cluster will be used. The qsub parameter that this controls is [-q][qsub-q].
profile_name

Default: uge

The name to use for this profile. The directory for the profile is created as this name i.e. $HOME/.config/snakemake/<profile_name>. This is also the value you pass to snakemake --profile <profile_name>.
print_shell_commands

Default: False Valid options: False, True

This sets the default --printshellcmds/-p parameter in snakemake.
```
  --printshellcmds, -p  Print out the shell commands that will be executed.
```

Status check parameters

The status check parameters dsecribed here should not be changed unless discussed with IT. The compute cluster is a shared resource and running workflow managers like snakemake submitting large amounts of jobs and high frequency status checks can slow down the compute environment for everyone. Should issues occur with this profile and job status checks by snakemake, for instance snakemake.exceptions.WorkflowError: Failed to obtain job status. errors, it is recommended to set log_status_checks to True to track the issues.

log_status_checks

Default: False

When set, status check tries and exceptions are printed to stderr. Recommended to set to True for issues with status checks by snakemake, e.g. snakemake.exceptions.WorkflowError: Failed to obtain job status. errors.

Usage

Once set up is complete, this will allow you to run snakemake with the cluster profile using the --profile flag. For profile name uge, you can run:

snakemake --profile uge [snakemake options]

and pass any other valid snakemake options.

Standard rule-specific cluster resource settings

The following resources can be specified within a rule in the Snakemake file:

threads: <INT> the number of threads needed for the job. If not specified, will default to the amount you set when initialising the profile. As stated in the snakemake manual, it should be noted that the specified threads have to be seen as a maximum. When Snakemake is executed with fewer cores, the number of threads will be adjusted.
resources:
- mem_mb = <INT>: the memory required for the rule, in megabytes. If not specified, will default to the amount you set when initialising the profile. For details on memory specification see the snakemake documentation on resources.

NOTE: these settings within the snakemake rules will override the profile defaults.

Non-standard rule-specific cluster resource settings

Since the deprecation of cluster configuration files the ability to specify per-rule cluster settings is snakemake-profile-specific.

Per-rule configuration must be placed in a file called <profile_name>.yaml and must be located in the working directory for the pipeline. If you set workdir manually within your workflow, the config file has to be in there.

Common parameters that can be provided to the cluster configuration (for details check man qsub):

runtime: the maximum amount of time the job will be allowed to run for
```
  -l h_rt={runtime_hr}:{runtime_min}:00
```
queue: override the default queue for this job.
```
  -q QUEUENAME
```
output: override the default name of stdout logfile
```
  -o path/to/file/for/output_stream
```
error: override the default name of stderr logfile
```
  -o path/to/file/for/output_stream
```
jobname: override the default name of the job
```
  -N JOBNAME
```
project: Specifies the project to which this job is assigned to
```
  -P PROJECTNAME
```

NOTE: these settings are highly specific to the UGE cluster system and this profile and are not guaranteed to be valid on non-UGE cluster systems.

All settings are given with the rule name as the key, and the additional cluster settings as a list (sequence), with the UGE-specific flag followed by its argument (if applicable).

NOTE: Directory paths should not be used as wildcards. If a directory is used as a wildcard, any "/" will be replaced with "-".

Examples

Snakefile

rule grep:
    input: "input.txt"
    output: "output.txt"
    shell:
        "grep 'icecream' {input} > {output}"

rule count:
    input: "output.txt"
    output: "output_count.txt"
    shell:
        "wc -l {input} > {output}"

uge.yaml

__default__:
  - "-P standard "
  - "l h_rt=01:00:00"

grep:
  - "-P icecream "
  - "-N icecream.search"

In this example, we specify a default (__default__) project (-P) and runtime limit (-l h_rt=01:00:00 ) that will apply to all rules. We then override the project and, additionally, specify a new job name for the rule grep. This will lead to a submission command, for grep that looks something like

$ qsub [options] -P standard -l h_rt=01:00:00 -P icecream -N icecream.search ...

Although -P is provided twice, UGE uses the last instance.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Snakemake UGE profile

Table of Contents

Install

Dependencies

Profile

Submission parameters

Status check parameters

Usage

Standard rule-specific cluster resource settings

Non-standard rule-specific cluster resource settings

Examples

Files

README.md

Latest commit

History

README.md

File metadata and controls

Snakemake UGE profile

Table of Contents

Install

Dependencies

Profile

Submission parameters

Status check parameters

Usage

Standard rule-specific cluster resource settings

Non-standard rule-specific cluster resource settings

Examples