Home
snakemake library for various applications, with a focus on bioinformatics and next-generation sequencing.
Comparison with biomake
snakemakelib is basically a port of the rules in biomake to snakemake. The design principles are similar in that my aim is to compile a library of rules that can be reused and configured via a simple configuration interface.
Use the rules at your own risk, and make sure you understand them before running any commands. I take no responsibility if you happen to run snakemake clean in an inappropriate location, removing precious data in the process. You have been warned!
The snakemake rules contain general recipes for commonly used applications and bioinformatics programs. The use cases reflect the needs I've had and are by no means comprehensive. Nevertheless, many commands are so commonly used that the recipes may be of general interest.
Clone the repository https://github.com/percyfal/snakemakelib to an appropriate location:
git clone https://github.com/percyfal/snakemakelib /path/to/snakemakelib
snakemakelib requires snakemake version >= 3.1, which supports the global config variable.
The intended usage is that the user first creates a Snakefile for use with a particular dataset/problem. Thereafter, include statements are used to include rules of interest. Here, we include the rules for the aligner bwa:
#-*- snakemake -*-
# Snakefile example
# Add path to snakemakelib, unless installed in a virtualenv or similar
sys.path.append('/path/to/snakemakelib')
# Include settings and utilities
include: "/path/to/snakemakelib/rules/settings.rules"
include: "/path/to/snakemakelib/rules/utils.rules"
# Include rules for bwa
include: "/path/to/snakemakelib/rules/bio/ngs/align/bwa.rules"
snakemake includes options to view tasks:
snakemake -l
The bwa_mem rule comes from bwa.rules. In addition, the rules file utils.rules included above defines convenience rules for viewing more detailed rule information. For instance, the rule rule_ll shows the following:
snakemake rule_ll
This rule prints the docstring and the definitions of the input, output, and shell parameters. In the example above, we see that the output looks like {prefix}.bam, where {prefix} is a wildcard that matches a given pattern in the input. To see what the inputs look like in this example, run
snakemake test.bam
As its name implies, Snakemake works like GNU Make in that one seeks to build a target output, in this case test.bam. Had the files test_R1_001.fastq and test_R2_001.fastq been present, the rule bwa_mem would have run the command defined in the shell section.
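To make the wildcard idea concrete, here is a minimal Python sketch of how a target such as test.bam could be matched against an output pattern and the wildcard value reused to derive input names. This is not Snakemake's implementation. The function match_wildcards is hypothetical, and the input patterns are an assumption inferred from the file names mentioned above.

```python
import re


def match_wildcards(pattern, target):
    """Return a dict of wildcard values if target matches pattern, else None.

    Sketch only: assumes each wildcard name occurs once in the pattern.
    """
    # re.split with a capturing group yields literal text at even indices
    # and wildcard names at odd indices.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)          # literal text, e.g. ".bam"
        else:
            regex += "(?P<%s>.+)" % part      # named group for the wildcard
    m = re.fullmatch(regex, target)
    return m.groupdict() if m else None


output_pattern = "{prefix}.bam"
# Assumed input patterns, inferred from the fastq file names in the text
input_patterns = ["{prefix}_R1_001.fastq", "{prefix}_R2_001.fastq"]

wildcards = match_wildcards(output_pattern, "test.bam")
inputs = [p.format(**wildcards) for p in input_patterns]
print(wildcards)
print(inputs)
```

Requesting test.bam binds prefix to test, so the rule would look for the two fastq files named in the paragraph above.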
The implementation of the configuration interface is still very much work in progress and is likely to undergo substantial changes!
The purpose of snakemakelib is to build a library of rules that can be reused without actually writing them anew. The motivation is that only parameters, e.g. program options, inputs and outputs, of a rule change from time to time, but the rule execution is identical. Therefore, my aim is to provide a very simplistic configuration interface in which the rule parameters can be modified with simple strings.
To begin with, each rule file consists of rules and an accompanying default configuration. The latter ensures that all rules have sensible defaults set, regardless of whether the user decides to modify them or not. In principle, a rule file has two parts:
- default configuration
- rules
The configuration is a modified dict object, with at most three levels:
- namespace
- section/parameter
- parameter
The namespace is an identifier for the rules file, and should be named path.to.rules, where path and to are directory names relative to the rules root path. The section/parameter is either a parameter related to the program, or a subprogram which in turn can have parameters assigned to it. The configuration default in bwa.rules is
config_default = BaseConfig({
    'bio.ngs.align.bwa' : BaseConfig({
        'cmd' : "bwa",
        'ref' : sml_config['bio.ngs.settings']['db']['ref'],
        'threads' : sml_config['bio.ngs.settings']['threads'],
        'options' : "-M",
        'mem' : BaseConfig({
            'options' : "",
        }),
    }),
})
The modified dict is a snakemakelib.config.BaseConfig object that does simple type checking and ensures that the user can only modify keys that have already been defined. The namespace is bio.ngs.align.bwa, reflecting the fact that the rules file is located in the folder rules/bio/ngs/align and is named bwa.rules.
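The behaviour described here (simple type checking, and modification restricted to existing keys) can be illustrated with a small dict subclass. This is a sketch under assumptions and not the actual BaseConfig code. The class name RestrictedConfig and the threads value are hypothetical placeholders.

```python
class RestrictedConfig(dict):
    """Hypothetical stand-in for BaseConfig, not the snakemakelib class.

    Keys are fixed at construction time; later assignments must target
    an existing key and keep the type of the current value.
    """

    def __setitem__(self, key, value):
        if key not in self:
            raise KeyError("unknown configuration key: %r" % (key,))
        current = self[key]
        if current is not None and not isinstance(value, type(current)):
            raise TypeError(
                "%r expects %s, got %s"
                % (key, type(current).__name__, type(value).__name__)
            )
        super().__setitem__(key, value)

    def update(self, other):
        # dict.update bypasses __setitem__, so route updates through it
        for key, value in other.items():
            self[key] = value


# 'threads' is a placeholder value chosen for illustration
bwa = RestrictedConfig({"cmd": "bwa", "options": "-M", "threads": 8})

bwa["options"] = "-M -t 4"          # allowed: key exists, type matches

try:
    bwa["optionz"] = "-k 10"        # rejected: key was never defined
except KeyError as err:
    print("rejected:", err)

try:
    bwa["threads"] = "eight"        # rejected: str where int is expected
except TypeError as err:
    print("rejected:", err)
```

Rejecting unknown keys means a misspelled option name fails loudly instead of being silently ignored.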
sml_config is a global BaseConfig object that stores all loaded rule configurations. Incidentally, this example shows another key idea of the configuration, namely that some options inherit from rules files higher up in the file hierarchy. The rules file rules/bio/ngs/settings.rules contains a generic configuration that is common to all ngs rules. This implementation makes it possible to override settings for specific programs, like for instance the ref parameter above.
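The inheritance idea can be sketched with plain dicts: program-level defaults read shared values from a settings dict, and a program-specific override leaves the shared settings untouched. The names ngs_settings and bwa_defaults, the file paths, and the thread count are hypothetical placeholders and not snakemakelib defaults.

```python
# Hypothetical shared settings, standing in for the 'bio.ngs.settings' namespace.
# The reference path and thread count are placeholders.
ngs_settings = {"db": {"ref": "/path/to/reference.fa"}, "threads": 8}


def bwa_defaults(settings):
    """Build program-level defaults that inherit shared values from settings."""
    return {
        "cmd": "bwa",
        "ref": settings["db"]["ref"],       # inherited from shared settings
        "threads": settings["threads"],     # inherited from shared settings
        "options": "-M",
        "mem": {"options": ""},
    }


bwa = bwa_defaults(ngs_settings)
inherited_ref = bwa["ref"]

# Override the reference for bwa only; the shared settings stay as they were
bwa["ref"] = "/path/to/bwa_index/reference.fa"

print(inherited_ref)
print(bwa["ref"])
print(ngs_settings["db"]["ref"])
```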
utils.rules defines a rule conf that can be used to view the current configuration of included files:
snakemake conf
The output is sectioned by namespace, i.e. by rules file.
Furthermore, Snakemake defines its own global configuration variable config that can be accessed via the command line. At the end of the file rules/bio/ngs/settings.rules, three Snakemake config options have been added that are useful in the context of ngs:
# Add configuration variable to snakemake global config object
config['lanes'] = []
config['samples'] = []
config['flowcells'] = []
A user can modify the configuration by defining a BaseConfig object and updating the sml_config object mentioned in the previous section. This is done in the Snakefile that uses include statements to include rules files, and must be done before any include statement. The reason is that when a rules file is included, the default configuration values are compared to the existing sml_config. If the user has defined custom configurations, these will take precedence over the default values. If no custom configuration exists, the default values are applied.
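The precedence rule can be sketched as a recursive merge in which custom values win and defaults fill in everything the user left unset. This is a sketch with plain dicts, not the snakemakelib implementation. The function merge_config is hypothetical, and the example values are taken from the bwa defaults shown earlier.

```python
import copy


def merge_config(defaults, custom):
    """Return a new dict: defaults overlaid with custom, custom values winning."""
    merged = copy.deepcopy(defaults)
    for key, value in custom.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them level by level
            merged[key] = merge_config(merged[key], value)
        else:
            # Leaf value (or a section with no default): the custom value wins
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {
    "bio.ngs.align.bwa": {
        "cmd": "bwa",
        "options": "-M",
        "mem": {"options": ""},
    }
}
# User configuration defined before the rules file is included
custom = {"bio.ngs.align.bwa": {"mem": {"options": "-k 10 -w 50"}}}

merged = merge_config(defaults, custom)
print(merged["bio.ngs.align.bwa"]["mem"]["options"])
print(merged["bio.ngs.align.bwa"]["options"])
```

Only the mem options change; cmd and the top-level options keep their default values, and neither input dict is modified.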
As an example, imagine we want to change the options to -k 10 -w 50 for bwa mem in the example Snakefile above. The modified Snakefile would then look as follows:
#-*- snakemake -*-
# Snakefile example
# Add path to snakemakelib, unless installed in a virtualenv or similar
sys.path.append('/path/to/snakemakelib')
# Import config-related stuff
from snakemakelib.config import init_sml_config, update_sml_config, BaseConfig
# Custom configuration; must mirror the structure of the default configuration
my_config = BaseConfig({
    'bio.ngs.align.bwa' : BaseConfig({
        'mem' : BaseConfig({
            'options' : "-k 10 -w 50",
        }),
    })
})
# Initialize configuration (before any include statement)
init_sml_config(my_config)
# Include settings and utilities
include: "/path/to/snakemakelib/rules/settings.rules"
include: "/path/to/snakemakelib/rules/utils.rules"
# Include rules for bwa
include: "/path/to/snakemakelib/rules/bio/ngs/align/bwa.rules"
Currently, it is necessary to use exactly the same structure as that of the default configuration for the relevant sections.
TO BE IMPLEMENTED
Alternatively, the user configuration can be loaded from a YAML file. The user configuration in the previous section would simply look like this:
bio.ngs.align.bwa:
  mem:
    options: "-k 10 -w 50"