class: middle, center
#### Your Host: Nathan Weeks

- What are (Singularity) containers?
- Finding container images
- Using Singularity on Cannon
- Building Singularity container images
- Common pitfalls
- Useful tips & tricks
- Log into Cannon (either via SSH, or a web browser using the FASRC VDI portal; FASRC VPN connection required)
- Launch an interactive job on a compute node:

```
salloc -p test,shared -t 2:00:00 --mem=4g
```

**NOTE:** the `singularity` command is not available on Cannon login nodes
A set of one or more processes (running programs) that share (at least) a root file system ("/", usually provided by a container image) different from that of processes running outside the container on the same host operating system kernel (typically Linux).
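To see the difference, compare the host's OS release file with the one inside a container. A minimal illustration (using the `singularity exec` command introduced later, and the NCBI BLAST image from the exercises below):

```
# On the host: shows the host operating system
cat /etc/os-release

# Inside a container: a different root file system, so a different OS release
singularity exec /n/singularity_images/informatics/ncbi-blast/ncbi-blast:2.10.0.sif \
    cat /etc/os-release
```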
- An open-source container platform for Linux
  - https://github.com/hpcng/singularity
  - Old beta for macOS (don't bother)
- Started in 2015 by Greg Kurtzer from Lawrence Berkeley National Laboratory; now commercially developed/supported by Sylabs
- Provides software tooling for both building container images and creating containers in which commands are run
- Used:
  - Primarily for HPC clusters
    - Cannon: FASRC VDI portal (Open OnDemand) to run JupyterLab, RStudio Server, and other interactive apps
  - Workflow management systems, such as Snakemake, Nextflow, and Galaxy
  - Open Science Grid
- A Singularity image is a file containing a (compressed, read-only SquashFS) file system, usually containing:
  - A base (Linux) operating system (e.g., Ubuntu, CentOS, Alpine)
  - Target software (e.g., NCBI BLAST+) and all software dependencies
- Typical file extension: .sif (Singularity Image Format); older images (pre-Singularity 3.x) may use ".simg"
- Since it's a file, it can be copied to other hosts (e.g., with `scp`), or archived/shared (e.g., alongside code and research data)
- A Singularity container runs user processes in a software environment reflecting the (read-only) file system within the Singularity image file, plus select (generally writable) directories from the host that are bind-mounted onto this file system
  - On Cannon, this includes `/n/` (network) file systems, like `/n/holyscratch01` and `/n/home*`
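Additional host directories can be bind-mounted with the `--bind` (`-B`) option; a minimal sketch (the host path and image name here are hypothetical):

```
# Bind-mount a hypothetical host directory onto /data in the container
singularity exec --bind /n/holylfs/LABS/mylab:/data image.sif ls /data
```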
- Save time
  - (Re)creating a complex software environment on a different host can be difficult
- Portability
  - A container image can run on another (Linux) host, or be easily shared with other users on the same system (e.g., on Cannon with other members of the same lab)
- Reproducibility
  - Guarantees the software environment (including exact versions) is recorded & reproduced faithfully
Example: Trinity de novo transcriptome assembler

- Difficult to install; many software dependencies
- Bespoke environment modules for a couple of older versions exist on Cannon (see `module-query trinityrnaseq`)
- "Official" container image encapsulates Trinity & dependencies (see the Dockerfile container image "recipe")
- Can be used on Cannon
`singularity exec` is the most common mechanism to run a command in a Singularity container, both for batch and interactive jobs:

```
singularity exec [...options...] singularity_image.sif command [...command arguments...]
```

e.g.:

```
$ image=/n/singularity_images/informatics/braker2/braker2_2.1.6.sif
$ singularity exec ${image} which braker.pl
/usr/local/bin/braker.pl
$ singularity exec ${image} braker.pl --version
braker.pl version 2.1.6
```

**NOTE:** It is good practice to use the `singularity exec --cleanenv` option; this will be discussed in more detail in part 2
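For batch jobs, the same `singularity exec` invocation goes in the job script; a minimal sketch (partition, time, and memory values are illustrative):

```
#!/bin/bash
#SBATCH -p shared
#SBATCH -t 1:00:00
#SBATCH --mem=4g

# Run the containerized command exactly as in an interactive session;
# --cleanenv keeps host environment variables out of the container
image=/n/singularity_images/informatics/braker2/braker2_2.1.6.sif
singularity exec --cleanenv ${image} braker.pl --version
```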
`singularity shell` is used to get a "shell" in the container (almost like logging into a virtual machine) for interactive exploration and (short) interactive work:

```
$ singularity shell /n/singularity_images/informatics/maker/maker:3.01.03-repbase.sif
Singularity> type maker
maker is /usr/local/bin/maker
Singularity> cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 8 (jessie)"
NAME="Debian GNU/Linux"
VERSION_ID="8"
VERSION="8 (jessie)"
ID=debian
HOME_URL="http://www.debian.org/"
SUPPORT_URL="http://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
```

The shell prompt changes to `Singularity>` to indicate the shell is in a container
- Start a shell in an NCBI BLAST container using the `singularity shell` command:

```
singularity shell /n/singularity_images/informatics/ncbi-blast/ncbi-blast:2.10.0.sif
```

- Answer the following questions:
  - What is the base operating system of the container image? (hint: `cat /etc/os-release`)
  - Where is the `blastn` executable located? (hint: `type blastn`, `which blastn`, or `command -v blastn`)
- Container images are typically hosted in a container registry
  - Hosting service for container repositories, analogous to git repositories
- Many container registries have bad (or no) search interfaces, and may not be your first stop when looking for container images
  - Notable possible exception: NVIDIA GPU Accelerated Container Registry (NGC) for GPU-accelerated container images
- (Update: now defunct?) Singularity Hub (https://singularity-hub.org/)
  - The first Singularity container registry
  - Required excessive privileges to your GitHub account to be able to search / use
- Sylabs Cloud Library (https://cloud.sylabs.io/library)
  - The `singularity search` command can be used to search, but is not too useful (except for custom-built images)
Singularity can build SIF images from Docker / OCI images in other container registries. The most popular:
- DockerHub
  - https://hub.docker.com/
  - The original and most popular (public) container registry
  - Image "pull" limits:
    - 100 anonymous image pulls per 6 hours per public IP address
    - 200 image pulls per 6 hours per (free) Docker account (see `singularity remote login` to authenticate; sketch below)
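Authenticating to Docker Hub from Singularity (3.7+) should look something like the following (the username is a placeholder; you will be prompted for a password or access token):

```
# Log in to Docker Hub to get the higher per-account pull limit
singularity remote login --username myuser docker://docker.io
```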
- Quay
  - https://quay.io/search
  - Caveat: looks like search will require Red Hat SSO on July 1, 2021...
- GitHub Container Registry
  - https://docs.github.com/en/packages/guides/about-github-container-registry
  - Newish, but looks promising for continuous integration / automated builds of GitHub-hosted projects
- GitLab Container Registry
  - https://docs.gitlab.com/ee/user/packages/container_registry/
  - More mature than GitHub Container Registry; images hosted per-project / repository
- NVIDIA GPU Accelerated Container Registry (NGC)
  - https://ngc.nvidia.com/
  - singularity `--nv` option to use the host GPU in a container (see Singularity GPU Support (NVIDIA CUDA & AMD ROCm)); a sketch follows below
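A minimal sketch of GPU usage (assumes a GPU node and a CUDA-enabled image; the image filename is hypothetical):

```
# --nv bind-mounts the host's NVIDIA driver libraries/devices into the container
singularity exec --nv pytorch_21.03-py3.sif nvidia-smi
```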
Mainly for paying customers:

- Azure Container Registry
  - https://azure.microsoft.com/en-us/services/container-registry/
  - Hosts some "native" Singularity images
- Amazon Elastic Container Registry (ECR)
- Oracle Container Registry
- Bioconda is a bioinformatics-focused channel of software packages for the conda package manager
  - See earlier conda tutorial
- BioContainers provides container images for (mostly) Bioconda packages (including dependencies)
- Can search for BioContainers images:
  - BioContainers registry
  - Bioconda Package Index
    - Click on a package name > "container" link > tag > Fetch Tag (download icon)
Many container registries assume Docker, and suggest syntax like:

```
docker pull registry/user/repository:tag
```

To adapt to Singularity, replace with:

```
singularity pull docker://registry/user/repository:tag
```

e.g.:

```
singularity pull --disable-cache docker://quay.io/biocontainers/samtools:1.12--h9aed4be_1
```

- The `--disable-cache` option prevents image layers from being cached in `${HOME}/.singularity/cache`
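If you do let Singularity cache layers (i.e., omit `--disable-cache`), the cache can be inspected and cleared with the `singularity cache` subcommands (Singularity 3.x):

```
# List cached images/layers and their disk usage
singularity cache list

# Remove everything from the cache
singularity cache clean
```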
Prefer "official" container images provided by the project; e.g.:

- Check docs, or for the existence of a Dockerfile in the git repo
- Example: Trinity

```
curl -O https://data.broadinstitute.org/Trinity/TRINITY_SINGULARITY/\
trinityrnaseq.v2.12.0.simg
```

- FAS Informatics [Best Practices for De Novo Transcriptome Assembly with Trinity](https://informatics.fas.harvard.edu/best-practices-for-de-novo-transcriptome-assembly-with-trinity.html) illustrates optimized use on Cannon
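A quick check that the downloaded image works (assuming the `Trinity` executable is on the container's PATH):

```
# Print the containerized Trinity version (executable name is an assumption)
singularity exec trinityrnaseq.v2.12.0.simg Trinity --version
```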
- QIIME 2

```
singularity pull --disable-cache docker://quay.io/qiime2/core:2021.2
```

- MultiQC
  - https://github.com/ewels/MultiQC
  - Links to: https://hub.docker.com/r/ewels/multiqc

```
singularity pull --disable-cache docker://ewels/multiqc:1.10.1
```
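`singularity pull` names the resulting file after the repository and tag by default, so a quick check of the MultiQC image might look like:

```
# Run MultiQC from the pulled image (filename follows singularity pull's default naming)
singularity exec --cleanenv multiqc_1.10.1.sif multiqc --version
```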
- Existence of a Dockerfile doesn't mean an image is available in a container registry (like Docker Hub)
  - e.g., Augustus provides a Dockerfile, but the image needs to be built (one possible workflow is sketched below)
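One possible workflow on a machine where Docker is available (not Cannon compute nodes; the image name and tag are placeholders):

```
# Build the Docker image from the project's Dockerfile...
docker build -t augustus:local .

# ...then convert it to a SIF via the local Docker daemon
singularity build augustus.sif docker-daemon://augustus:local
```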
- CernVM-FS (CVMFS): https://cernvm.cern.ch/fs/
- Singularity images of BioContainers (maintained by the Galaxy Project) available at:

```
/cvmfs/singularity.galaxyproject.org/FIRST_LETTER/SECOND_LETTER/\
PACKAGE_NAME:VERSION--CONDA_BUILD
```

e.g.:
```
$ singularity exec --cleanenv \
    /cvmfs/singularity.galaxyproject.org/b/l/blast:2.11.0--pl526he19e7b1_0 \
    blastn -version
WARNING: Skipping mount /var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
blastn: 2.11.0+
Package: blast 2.11.0, build Mar 12 2021 10:19:58
```

The `/etc/resolv.conf` warning is a known issue: bioconda/bioconda-recipes#11583

- https://docs.rc.fas.harvard.edu/kb/singularity-on-the-cluster/#BioContainers
- Delay (up to a minute or so) when auto-mounting /cvmfs/singularity.galaxyproject.org on a given compute node
- Further delay when fetching a container image
- Don't use for a large number of jobs
- Copy frequently-used containers to high-performance shared storage (see the example below)
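For example (the destination directory is a placeholder for your lab's storage):

```
# Copy a frequently-used image out of CVMFS onto faster shared storage
cp /cvmfs/singularity.galaxyproject.org/b/l/blast:2.11.0--pl526he19e7b1_0 \
   /n/holyscratch01/mylab/containers/blast_2.11.0.sif
```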
- Choose a BioContainers image, using your favorite interface:

```
ls /cvmfs/singularity.galaxyproject.org/FIRST_LETTER/SECOND_LETTER/
```

- Find the container image in CVMFS, and execute a command in the container:

```
singularity exec --cleanenv \
    /cvmfs/singularity.galaxyproject.org/b/l/blast:2.11.0--pl526he19e7b1_0 \
    blastn -version
```

- Copy & paste the `singularity exec` command into the chat
Processes in a Singularity container behave the same as processes on the host outside of a container with respect to I/O (including stdin/stdout/stderr). As such, they can be used with (Unix) pipes:
```
singularity pull --disable-cache docker://quay.io/biocontainers/samtools:1.12--h9aed4be_1
singularity pull --disable-cache docker://quay.io/biocontainers/bwa-mem2:2.2.1--h9a82719_1

singularity exec --cleanenv bwa-mem2_2.2.1--h9a82719_1.sif bwa-mem2 index reference.fa.gz

singularity exec --cleanenv bwa-mem2_2.2.1--h9a82719_1.sif \
    bwa-mem2 mem reference.fa.gz left.fastq.gz right.fastq.gz |
singularity exec --cleanenv samtools_1.12--h9aed4be_1.sif \
    samtools sort --output-fmt bam > output.bam
```
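When embedding such a pipeline in a batch script, consider `set -o pipefail` so a failure in the `bwa-mem2` stage isn't masked by the exit status of `samtools sort`; a sketch:

```
#!/bin/bash
# Fail the whole pipeline if any stage fails, not just the last one
set -o pipefail

singularity exec --cleanenv bwa-mem2_2.2.1--h9a82719_1.sif \
    bwa-mem2 mem reference.fa.gz left.fastq.gz right.fastq.gz |
singularity exec --cleanenv samtools_1.12--h9aed4be_1.sif \
    samtools sort --output-fmt bam > output.bam
```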
For a more detailed tutorial on Unix pipes, see harvardinformatics/bioinformatics-coffee-hour/unix-pipes
- Singularity User Guide
- Software Carpentries - Introduction to Singularity (alpha)
- Creating and running software containers with Singularity (originally NIH)
- Singularity Containers for Bioinformatics (Pawsey Supercomputing Centre)
- BioContainers Registry: searching for bioinformatics tools, packages and containers