
How to manually build a container image for a tool

Kjetil Klepper edited this page Apr 12, 2024 · 87 revisions

⚠️ NOTE: Before you start building your own container for a tool, check whether a suitable container already exists in a public repository, such as BioContainers (mirrored in the singularity.galaxyproject.org CVMFS repository).

Building a container image with Planemo

If the tool's requirements are available in Conda but the container building step fails in Galaxy, you can try to build the container manually. The most convenient way to do this is with the help of Planemo. If you have access to Docker on your own machine, you can install Planemo there, but Planemo has also been installed in the home directory of the SysAdmin user on test.usegalaxy.no (note that it may fail here if you run out of disk space):

ssh [email protected]

cd planemo
source .venv/bin/activate

Obtain the XML tool wrapper file for the tool, and execute the following command to create a Docker image based on the <requirements> listed in the wrapper. (Note: the wrappers of all tools installed on UseGalaxy.no from Tool Sheds can be found under /srv/galaxy/var/shed_tools/.)

planemo mull <toolwrapper.xml>

Note: You may have to run this command (and other Docker commands below) as "sudo". If you are sysadmin on "test.usegalaxy.no", you also need to specify the full path when referring to planemo since it will not be in your PATH variable: sudo /home/sysadmin/planemo/.venv/bin/planemo mull <toolwrapper.xml>.
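To locate candidate wrapper files under the shed tools directory mentioned above, a search like the following may help. This is a minimal sketch: the function name find_wrappers is made up for illustration, and it simply lists XML files that contain a <requirements> element.

```shell
# List XML files below a directory that contain a <requirements> element.
# find_wrappers is a hypothetical helper name, not part of Galaxy or Planemo.
find_wrappers() {
    find "$1" -type f -name '*.xml' -exec grep -l '<requirements>' {} +
}

# Usage (not run here):
#   find_wrappers /srv/galaxy/var/shed_tools/
```

Each path that is printed can then be passed to planemo mull.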

If some of the required packages are in non-standard Conda channels, such as HCC, you can use the --conda_channels option to list the channels explicitly:

planemo mull --conda_channels "HCC,iuc,bioconda,conda-forge,defaults" <toolwrapper.xml>

The planemo mull command will create the Docker image and place it somewhere on your machine. You can find the name ("IMAGE ID") of the image by running:

docker image list

Afterwards, you must convert it into a Singularity image, as described below.

Tips

  • If the container is not built properly, try specifying the --conda_channels in a different order (or using fewer channels). Sometimes a package may be broken in one channel but work in another.
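The tip above can be automated with a small loop that tries several channel orders in turn. This is only a sketch: the function name mull_with_fallback and the PLANEMO variable are assumptions made for illustration, and the channel lists in the usage comment are examples rather than recommendations.

```shell
# Try "planemo mull" with several channel orders until one succeeds.
# PLANEMO may point to the full path of the planemo executable (assumption:
# it defaults to "planemo" on the PATH).
PLANEMO="${PLANEMO:-planemo}"

mull_with_fallback() {
    wrapper="$1"
    shift
    for channels in "$@"; do
        echo "Trying channel order: $channels" >&2
        # Send planemo's own output to stderr so stdout only carries the result.
        if "$PLANEMO" mull --conda_channels "$channels" "$wrapper" >&2; then
            echo "$channels"    # print the channel order that worked
            return 0
        fi
    done
    echo "No channel order produced an image" >&2
    return 1
}

# Usage (not run here):
#   mull_with_fallback toolwrapper.xml \
#       "iuc,bioconda,conda-forge,defaults" \
#       "conda-forge,bioconda,defaults"
```

The function prints the first channel order for which the build succeeded and returns a non-zero status if none of them worked.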

Converting a Docker image into a Singularity image

If the Docker image exists in a public repository, you can pull it and convert it into a Singularity image with a single command:

singularity build <singularityImageFilename> docker://<url-to-docker-image>

If you only have the Docker image on your local machine, you first have to export it to a tarball file (you can find the image ID with docker image list):

docker save <imageID> -o dockerimagefile.tar

Then you can run the following command to convert it to a Singularity image file:

singularity build <singularityImageFilename> docker-archive://dockerimagefile.tar

Give the Singularity image file the name expected by Galaxy, and place it in the directory /srv/galaxy/containers/singularity/mulled/. ⚠️ WARNING: If the name of the file is not in a format expected by Galaxy, it could cause all tools to malfunction! (Ref: issue #82). Remember to also change the owner and group of the file to "galaxy" (with the "chown" and "chgrp" commands) and to adjust the file access rights, if necessary.
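A quick way to confirm the ownership of an image file is sketched below. The function name owned_by_galaxy is hypothetical, and the permission change in the usage comment is an assumption (the "galaxy" user presumably needs read access), not a documented requirement.

```shell
# Succeed only if the given file is owned by user "galaxy" and group "galaxy".
# owned_by_galaxy is a hypothetical helper name used for illustration.
owned_by_galaxy() {
    owner_group="$(ls -ld "$1" | awk '{ print $3 ":" $4 }')"
    [ "$owner_group" = "galaxy:galaxy" ]
}

# Typical fix-up if the check fails (not run here; <image> is a placeholder):
#   sudo chown galaxy /srv/galaxy/containers/singularity/mulled/<image>
#   sudo chgrp galaxy /srv/galaxy/containers/singularity/mulled/<image>
#   sudo chmod a+r    /srv/galaxy/containers/singularity/mulled/<image>   # assumption
```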

You can manually check that the container works by running the tool command inside it. You can find the tool command from the tool wrapper.

singularity exec /path/to/container/image <command>

Fixing a tool that requires a Docker image

Sometimes a tool can fail because it lists a Docker container as its requirement, and we are currently not able to run Docker containers directly. E.g.:

<requirements>
    <container type="docker">labsyspharm/basic-illumination:1.0.3</container>
</requirements>

The easiest (albeit a bit hackish) way to fix this is to convert the Docker image into a Singularity image (as described above), and then modify the installed tool wrapper directly to use the Singularity image instead. To do this, find the location of the wrapper file and change the type property of the <container> element from "docker" to "singularity". Then replace the Docker URL within the container element with the full path to the local Singularity image file. It does not really matter what you name the Singularity file in this case, but it is advisable to use the same name as the original image. (NB! If you place the image file in the /srv/galaxy/containers/singularity/mulled/ directory, the filename must be in a format expected by Galaxy, or else all tools will malfunction!). You need to restart Galaxy for the changes to take effect.

<requirements>
    <container type="singularity">/srv/galaxy/containers/singularity/basic-illumination:1.0.3</container>
</requirements>
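The wrapper edit described above can also be scripted. The following is a minimal sketch that assumes the <container> element sits on a single line, as in the example; the function name use_singularity_image is made up for illustration. It keeps a backup copy of the wrapper next to the original.

```shell
# Point a Docker <container> requirement at a local Singularity image instead.
# Arguments: <wrapper.xml> <docker-image-name> <path-to-singularity-image>
# Assumes the whole <container> element is on one line.
use_singularity_image() {
    wrapper="$1"
    docker_name="$2"
    sif_path="$3"
    cp "$wrapper" "$wrapper.bak"    # keep a backup of the installed wrapper
    # "|" is used as the sed delimiter because image names and paths contain "/".
    sed "s|<container type=\"docker\">$docker_name</container>|<container type=\"singularity\">$sif_path</container>|" \
        "$wrapper.bak" > "$wrapper"
}

# Usage (not run here; the wrapper path is a placeholder):
#   use_singularity_image /path/to/wrapper.xml \
#       "labsyspharm/basic-illumination:1.0.3" \
#       "/srv/galaxy/containers/singularity/basic-illumination:1.0.3"
```

Remember that Galaxy still has to be restarted afterwards for the change to take effect.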

Installing additional packages into an existing container

If you have a pre-built container for a specific tool but need to install additional packages into the container as well, you can do that with Conda. These containers do not normally include Conda already, and it might be difficult to download the Conda install script inside the containers. So, the easiest way to install Conda is to download the install script to your local machine first, import it into the container from your machine and then run the installation script.

First download the Miniconda install script and rename it to "miniconda.sh" for simplicity (this example is on a Linux machine).

wget https://repo.anaconda.com/miniconda/Miniconda3-py38_4.12.0-Linux-x86_64.sh
mv Miniconda3-py38_4.12.0-Linux-x86_64.sh miniconda.sh

Next, create a Singularity definition file named e.g. "image.def" and specify which existing image you want to base your new image on. The example below uses the container for the "Salmon" tool. Add the name of the Conda installation script to the %files section of the definition file. This will import the script into the container at the beginning of the build process. In the %post section, execute the installation script first, and then proceed to install the Conda packages as usual with conda install.

Bootstrap: docker
From: combinelab/salmon:latest

%files
    miniconda.sh

%post
    bash miniconda.sh -b -p /opt/conda
    /opt/conda/bin/conda install -y -c bioconda -c defaults -c conda-forge seqtk==1.3
    /opt/conda/bin/conda install -y -c bioconda -c defaults -c conda-forge samtools==1.16.1
    /opt/conda/bin/conda install -y -c bioconda -c defaults -c conda-forge vpolo==0.2.0
    /opt/conda/bin/conda install -y -c bioconda -c defaults -c conda-forge pandas==1.5.2
    /opt/conda/bin/conda install -y -c bioconda -c defaults -c conda-forge graphviz==3.0.0
    /opt/conda/bin/conda install -y -c bioconda -c defaults -c conda-forge scipy==1.9.3

Build the container by running

sudo singularity build ContainerName.sif image.def

Building a container from a Conda environment

If you are able to create a functioning Conda environment to run the tool on one machine, you can create a Singularity container from that environment. There are two ways to do this: either you can make a list of all the packages in the environment and re-create that with Conda when building the container, or you can wrap up all your existing files installed by Conda in a tar-file and unpack those inside the container.

Building the container from a list of Conda packages

Assuming you have a Conda environment called "your_env" with all the necessary packages installed, you can activate this and export a specification of this environment to a YAML file.

conda activate your_env
conda env export > environment.yml

Next, create a Singularity definition file named e.g. "image.def" in the same directory as "environment.yml" with the following contents:

Bootstrap: docker

From: continuumio/miniconda3

%files
    environment.yml

%post
    /opt/conda/bin/conda env create -f environment.yml

%runscript
    exec /opt/conda/envs/$(head -n 1 environment.yml | cut -f 2 -d ' ')/bin/"$@"

Note: depending on the version of Singularity you're using, you may need to replace $(head -n 1 environment.yml | cut -f 2 -d ' ') in the last line (which tries to find the environment name from the YAML file) with the actual name of your environment, e.g.: exec /opt/conda/envs/your_env/bin/"$@"
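If you prefer to hard-code the environment name, you can look it up on the host before writing the definition file. A small sketch, assuming the exported YAML file contains a "name:" line as produced by conda env export; the function name conda_env_name is hypothetical.

```shell
# Print the environment name recorded in an exported environment.yml.
# conda_env_name is a hypothetical helper name used for illustration.
conda_env_name() {
    sed -n 's/^name:[[:space:]]*//p' "$1" | head -n 1
}

# Usage (not run here):
#   conda_env_name environment.yml
```

Unlike the head/cut pipeline, this does not rely on the name being on the very first line of the file.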

Then build the container by running

sudo singularity build ContainerName.sif image.def

Building the container from existing files

You can use the tool conda-pack (available from the "conda-forge" Conda channel) to containerize existing Conda environments without re-creating them from an "environment.yml" file. This is particularly useful when the environment doesn't resolve anymore, or when packages have been installed into the environment without Conda, e.g. using R's install.packages.

First pack the environment (here named "your_env")

conda-pack -n your_env -o packed_environment.tar.gz

Next, create a Singularity definition file named e.g. "image.def" in the same directory as the "packed_environment.tar.gz" file with the following contents:

Bootstrap: docker

From: continuumio/miniconda3

%files
    packed_environment.tar.gz /packed_environment.tar.gz

%post
    tar xvzf /packed_environment.tar.gz -C /opt/conda
    conda-unpack
    rm /packed_environment.tar.gz

Then build the container by running

sudo singularity build ContainerName.sif image.def

Troubleshooting

Sometimes a tool can fail because it relies on some dependencies (typically shared libraries) that are not included in the container after all. The advantage of building the container by packing existing files (see above) is that you can make direct changes to the filesystem in the environment to fix such problems, either before packing it or after unpacking it.

For instance, when building the container for "gnuplot-py" by packing an existing Conda installation of this tool, it complained about gnuplot: error while loading shared libraries: libjpeg.so.8. The container already included the libjpeg.so.9 library, which worked just as well, so the problem was simply solved by making a symlink named libjpeg.so.8 that pointed to libjpeg.so.9 after the environment was unpacked in the container. This was achieved by adding an extra line at the end of the %post block in the definition file before building the image:

%post
    tar xvzf /packed_environment.tar.gz -C /opt/conda
    conda-unpack
    rm /packed_environment.tar.gz
    ln -s /opt/conda/lib/libjpeg.so.9 /opt/conda/lib/libjpeg.so.8

Note that this symlink could just as well have been created in the filesystem of the original Conda environment before packing it into a tarball, and the extra post-processing line in the definition file would then not be needed. But in this example the environment had already been packed by the time the error was discovered, so the symlink was created after unpacking the files in the container instead.
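To find out which shared libraries are missing before a tool fails at run-time, you can inspect the output of ldd. The filter below is a sketch under the assumption that ldd is available inside the container; the function name missing_libs and the paths in the usage comment are placeholders.

```shell
# Read "ldd" output on stdin and print the names of unresolved libraries.
# missing_libs is a hypothetical helper name used for illustration.
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Usage (not run here; image and binary paths are placeholders):
#   singularity exec ContainerName.sif ldd /opt/conda/bin/gnuplot | missing_libs
```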

Building a container from scratch

If the required packages cannot be found in any Conda channel, you will have to manually build a Singularity container from scratch. This is actually fairly simple. All you need to do is to write a definition file, which describes how the container image is to be built, and then run the following command to build the image based on this file:

sudo singularity build <image_name.sif> <definition_file.def>

Tip: During the build process, Singularity first writes the entire filesystem of the container to a temporary directory, so you may need a lot of disk space to run this command. If you run out of space (for instance on "test.usegalaxy.no"), you can relocate this temporary directory to a partition with more space by setting the SINGULARITY_TMPDIR environment variable, e.g. export SINGULARITY_TMPDIR=/data/part0/singularity_build/. Afterwards, run the sudo command with the extra -E option to preserve this environment: sudo -E singularity build ....
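Before relocating the temporary directory, it can be useful to check how much space the candidate partition actually has. A minimal sketch; the function name free_kib is made up for illustration, and how much space is enough depends on the image being built.

```shell
# Print the free space (in KiB) of the filesystem that holds a directory.
# free_kib is a hypothetical helper name used for illustration.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Usage (not run here):
#   free_kib /data/part0/singularity_build/
#   export SINGULARITY_TMPDIR=/data/part0/singularity_build/
#   sudo -E singularity build <image_name.sif> <definition_file.def>
```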

After you have built the container, you can either execute a pre-defined "run script" that exists within the container with the command:

singularity run <image_name.sif> [optional arguments to pass on to the run script]

Or you can execute an arbitrary command within the container with:

singularity exec <image_name.sif> <command>

Singularity definition files

A Singularity image definition file consists of a HEADER followed by several SECTIONS. Here are some examples.

Header

The HEADER describes the core operating system to build within the container, such as which Linux distribution to use. This is done with a bootstrap agent, which comes in many different flavours, each with their own options. The image can, for instance, be based on pre-existing container images (either Singularity or Docker) which can then be customized. The Bootstrap keyword must be the first entry in the header.

One way to build a container with the Debian distribution is to base it on a Docker image with the following header:

Bootstrap: docker
From: debian

Or you can base your container on an existing container in the Container Library which runs a specific version of Ubuntu:

Bootstrap: library
From: ubuntu:18.04

Sections

Following the header in the definition file come several sections, which have predefined names prefixed with a percentage sign, e.g. %post. The sections can be listed in any order, but they will be applied at specific times during the build process or at run-time. The most useful sections are described below. See the official documentation for a more comprehensive list. Indentation is not necessary within the section, but it can improve layout and make the definition file more readable. Lines starting with a # sign are comments.

%files

The %files section is used to copy files and folders into the container at the beginning of the build process. Each entry in this section has the form <source_path> <dest_path>, where the source path refers to an existing file on the host machine (where you are building the container) and the destination path is the location on the filesystem within the container that the file should be copied to. Missing directories in the destination path will be created as needed. To copy an entire directory, use a wildcard to refer to all the files within the directory (i.e. place a /* after the source path). If you have trouble copying a directory this way, you can create a tarball of the whole directory, copy this as a single file and then later "untar" the file inside the container during the %post step.

%files
   <source_path> <dest_path>
   <source_folder>/* <dest_folder>
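The tarball workaround mentioned above could look like this on the host side. It is a sketch only: the function name pack_dir and the file names in the comments are placeholders.

```shell
# Pack a directory into a single tarball that can be listed in %files.
# The archive contains the directory itself (not just its contents).
# pack_dir is a hypothetical helper name used for illustration.
pack_dir() {
    src_dir="$1"
    tarball="$2"
    tar -czf "$tarball" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Usage on the host (not run here):
#   pack_dir /path/to/mydata mydata.tar.gz
# Matching lines for the %post section of the definition file:
#   tar -xzf /mydata.tar.gz -C /opt
#   rm /mydata.tar.gz
```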

%post

After all the "%files" have been copied into the container, you can run various shell commands in the %post section. These are executed with /bin/sh. This section can be used to e.g. install additional OS-packages using a package manager such as apt or yum (depending on the container's OS), fetch other files from the internet with wget, uncompress tarballs that you have copied into the container and do other kinds of file manipulation. You can also compile programs from source, if necessary, for instance by running Makefiles to build and install executables to their final location in the container filesystem.

The section is run with elevated privileges, so you don't need to prefix commands with "sudo".

%post
    yum -y install tar
    yum -y install gzip
    
    wget http://eddylab.org/software/hmmer/hmmer-2.3.2.tar.gz
    tar -xzf hmmer-2.3.2.tar.gz
    
    cd hmmer-2.3.2
    ./configure
    make
    make install

%environment

This section can be used to set environment variables that will be available when the container is run. You can for instance modify the PATH variable to include directories containing tools that you installed during the %post step. Note that these will not be available at build-time, however. While the container is being built, this section is written to a file in the container's meta-data directory, but it is not sourced at this point. The file is only sourced when the container is run, before the execution of the %runscript, %startscript or %test.

%environment
    export PATH="/my/new/tool/bin/:$PATH"
    export LISTEN_PORT=12345
    export LC_ALL=C  

%test

This section runs /bin/sh commands that you can use to verify that the container has been built properly, for instance by calling the name of the tool that you just installed into the container to see if it runs correctly (and can be found on the PATH). This section is executed automatically at the end of the build phase, but it can also be run manually after the container has been built with the command singularity test <image_name.sif>.

%test
    hmmsearch -h
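Since the %test commands are ordinary /bin/sh commands, a small helper can make failures easier to read, for example when a tool is simply missing from the PATH. The function name require_tool is made up for illustration.

```shell
# Fail with a clear message if a tool cannot be found on the PATH.
# require_tool is a hypothetical helper name used for illustration.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "ERROR: '$1' was not found on the PATH" >&2
        return 1
    fi
}

# Possible use inside a %test section (not run here):
#   require_tool hmmsearch && hmmsearch -h
```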

%runscript

The lines in the %runscript section are saved to a file during the build phase and later executed by /bin/sh when the container is started with the command singularity run <image_name>.sif. Additional arguments after this command will be passed on to the runscript. A similar %startscript section is run when a container is started with singularity instance start, but neither of these sections is really needed for Galaxy tool containers, since these are always executed with the singularity exec command.

%runscript
    # NOTE: $NOW is not set in this file; it is assumed to have been exported at build time
    echo "Container was created $NOW"
    echo "Arguments received: $*"
    exec echo "$@"
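The runscript example uses both $* and "$@" to refer to its arguments. The difference matters when an argument contains spaces, as the following plain shell sketch illustrates (the function names are made up for the demonstration):

```shell
# Show that "$@" preserves argument boundaries while an unquoted $* does not.
count_args()       { echo "$#"; }
forward_quoted()   { count_args "$@"; }
forward_unquoted() { count_args $*; }

forward_quoted   "two words" three    # prints 2
forward_unquoted "two words" three    # prints 3
```

This is why "$@" is the safer choice when a runscript passes its arguments on to another program.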