nf-core/configs: KAUST Configuration
The purpose of this custom configuration is to streamline the execution of nf-core pipelines on the KAUST Ibex cluster.
We have a wiki page dedicated to the Bioinformatics team at KAUST to help users: Bioinformatics Workflows.
The recommended way to activate Nextflow, which is needed to run nf-core workflows on Ibex, is to use the module system:
# Log in to the desired cluster
ssh <USER>@ilogin.ibex.kaust.edu.sa
# Activate the module; you can also choose a specific version, e.g. `Nextflow/24.04.4`.
module load nextflow
Launch the pipeline with -profile kaust (one hyphen) to run the workflows using the KAUST profile. This will download and launch the kaust.config, which has been pre-configured with a setup suitable for the KAUST servers. It enables Nextflow to manage the pipeline jobs via the Slurm job scheduler and uses Singularity to run the tasks.
Using the KAUST profile, Docker images containing the required software will be downloaded and, if needed, converted to Singularity images before the pipeline is executed. To avoid multiple users downloading the same images, we provide a Singularity libraryDir that points to images already present in our central container library. Images missing from the library will be downloaded to the user's directory, as defined by cacheDir.
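In Nextflow terms, this caching behaviour corresponds to settings in the singularity configuration scope. A sketch of what such a profile might contain (the paths below are illustrative placeholders, not the actual KAUST locations):

```groovy
// Sketch of a Singularity caching setup like the one the kaust
// profile provides -- paths are placeholders, not real locations.
singularity {
    enabled    = true
    autoMounts = true
    // Read-only central library shared by all users:
    libraryDir = '/path/to/central/container/library'
    // Per-user cache for images missing from the library:
    cacheDir   = "$HOME/.singularity/cache"
}
```

Nextflow checks libraryDir first; only images not found there are pulled into the user's cacheDir.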
Additionally, institute-specific pipeline profiles exist for:
- mag
- rnaseq
We provide a collection of reference genomes, enabling users to run workflows seamlessly without needing to download the files themselves. To use this resource, pass the genome name via the --genome parameter.
The KAUST profile makes running the nf-core workflows as simple as:
# Load Nextflow and Singularity modules
module purge
module load nextflow
module load singularity
# Launch nf-core pipeline with the kaust profile, e.g. for analyzing human data:
nextflow run nf-core/<PIPELINE> -profile kaust -r <PIPELINE_VERSION> --genome GRCh38.p14 --samplesheet input.csv [...]
Here input.csv contains information about the samples and the paths to their data files.
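The exact samplesheet columns depend on the pipeline you run; as an illustration, an nf-core/rnaseq-style input.csv looks roughly like this (sample names and paths are made up):

```csv
sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,/path/to/control_R1.fastq.gz,/path/to/control_R2.fastq.gz,auto
TREATED_REP1,/path/to/treated_R1.fastq.gz,/path/to/treated_R2.fastq.gz,auto
```

Check the usage documentation of the specific pipeline for its required columns before preparing your sheet.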
Remember to use -bg to launch Nextflow in the background, so that the pipeline doesn't exit if you leave your terminal session. Alternatively, you can launch a tmux or screen session to run the commands above. Another good option is to run it as an independent sbatch job, as explained here.
Please let us know if there are particular processes that continuously fail, so that we can adjust the defaults in the corresponding pipeline profile.