Add Dorado 0.8.0 Dockerfile and README
fraser-combe committed Sep 20, 2024
1 parent ea55b66 commit 3f643ce
Showing 2 changed files with 160 additions and 0 deletions.
63 changes: 63 additions & 0 deletions dorado/0.8.0/Dockerfile
@@ -0,0 +1,63 @@
# Use NVIDIA CUDA image as the base image
FROM nvidia/cuda:12.2.0-devel-ubuntu20.04 AS app

ARG DORADO_VER=0.8.0

# Metadata
LABEL base.image="nvidia/cuda:12.2.0-devel-ubuntu20.04"
LABEL dockerfile.version="1"
LABEL software="dorado ${DORADO_VER}"
LABEL software.version="${DORADO_VER}"
LABEL description="A tool for basecalling Fast5/Pod5 files from Oxford Nanopore sequencing"
LABEL website="https://github.com/nanoporetech/dorado"
LABEL license="https://github.com/nanoporetech/dorado/blob/master/LICENSE"
LABEL original.website="https://nanoporetech.github.io/dorado/"
LABEL maintainer="Fraser Combe"
LABEL maintainer.email="[email protected]"

# Set working directory
WORKDIR /usr/src/app

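# Install dependencies (ca-certificates lets wget verify HTTPS downloads),
# then remove the apt package lists to keep the layer small
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    ca-certificates \
    wget \
    && rm -rf /var/lib/apt/lists/*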

# Download and extract Dorado package
RUN wget https://cdn.oxfordnanoportal.com/software/analysis/dorado-${DORADO_VER}-linux-x64.tar.gz \
    && tar -xzvf dorado-${DORADO_VER}-linux-x64.tar.gz -C /opt \
    && rm dorado-${DORADO_VER}-linux-x64.tar.gz

# Set environment variables for Dorado binary
ENV PATH="/opt/dorado-${DORADO_VER}-linux-x64/bin:${PATH}"

# Download basecalling models
RUN mkdir /dorado_models && \
    cd /dorado_models && \
    dorado download --model all

# Download the specific Pod5 test file
RUN wget -O /usr/src/app/dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5 \
https://github.com/nanoporetech/dorado/raw/release-v0.7/tests/data/pod5/dna_r10.4.1_e8.2_260bps/\
dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5

# Default command
CMD ["dorado"]

# -----------------------------
# Test Stage
# -----------------------------
FROM app AS test

# Set working directory
WORKDIR /usr/src/app

# Run test command (using CPU mode)
RUN dorado basecaller \
    --device cpu \
    /dorado_models/[email protected] \
    dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5 \
    --emit-moves --max-reads 10 > basecalled.sam

# Verify the output file exists and is not empty
RUN test -s basecalled.sam
97 changes: 97 additions & 0 deletions dorado/0.8.0/README.md
@@ -0,0 +1,97 @@
# Dorado Docker Image

This Dockerfile sets up an environment for running **Dorado**, a tool for basecalling Fast5/Pod5 files from Oxford Nanopore sequencing.

## Table of Contents

- [Introduction](#introduction)
- [Requirements](#requirements)
- [Building the Docker Image](#building-the-docker-image)
- [Running the Docker Container](#running-the-docker-container)
- [Testing the Docker Image](#testing-the-docker-image)
  - [Basecalling Test](#basecalling-test)
  - [Verifying the Output](#verifying-the-output)
- [Additional Notes](#additional-notes)
- [License](#license)

## Introduction

This Docker image includes:

- **Dorado**: Version **0.8.0**, a tool for basecalling Oxford Nanopore sequencing data.
- **NVIDIA CUDA**: Version **12.2.0**, for GPU acceleration (requires NVIDIA GPU).
- **Pre-downloaded basecalling models**: All models are downloaded during the build.
- **Sample Pod5 test file**: Included for testing the basecalling process.

## Requirements

- **Docker**: Installed on your system.
- **NVIDIA GPU and Drivers**: Installed and configured.
- **NVIDIA Container Toolkit**: To enable GPU support in Docker containers.
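
The requirements above can be checked on the host before building. The following is a minimal sketch, not part of the image: the helper names `have_cmd` and `preflight` are hypothetical, and it assumes `nvidia-smi` was installed along with the NVIDIA driver.

```bash
# Return success if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report which host-side prerequisites are present.
# Returns non-zero if any of them is missing.
preflight() {
  missing=0
  for tool in docker nvidia-smi; do
    if have_cmd "$tool"; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (on the host). The second command exercises the NVIDIA Container
# Toolkit by running nvidia-smi inside the CUDA base image used here:
#   preflight && docker run --rm --gpus all nvidia/cuda:12.2.0-devel-ubuntu20.04 nvidia-smi
```

If the `docker run` step prints the same GPU table as `nvidia-smi` on the host, GPU passthrough is likely configured correctly.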

## Building the Docker Image

From the directory containing the Dockerfile (`dorado/0.8.0`), **build the Docker image** using the following command:

```bash
docker build -t dorado-image .
```

## Running the Docker Container

To run the Dorado tool within the Docker container, use the following command:

```bash
docker run --gpus all -it dorado-image dorado --help
```

This command will display the help information for Dorado, confirming that it's installed correctly.

## Testing the Docker Image

To test that Dorado is working correctly, perform a basecalling operation using the provided sample Pod5 file and basecalling models.

### Basecalling Test

Run the following command:

```bash
docker run --gpus all -v "$(pwd)":/data -it dorado-image bash -c "\
dorado basecaller /dorado_models/[email protected] \
/usr/src/app/dna_r10.4.1_e8.2_260bps-FLO_PRO114-SQK_NBD114_96_260-4000.pod5 \
--emit-moves > /data/basecalled.sam"
```

**Explanation:**

- `--gpus all`: Enables GPU support.
- `-v "$(pwd)":/data`: Mounts the current directory to `/data` inside the container. Mounting it over `/usr/src/app` instead would hide the sample Pod5 file that the image stores there.
- `bash -c "..."`: Runs the basecalling command inside the container.
- `> /data/basecalled.sam`: Redirects the output to `basecalled.sam` in your current directory.

### Verifying the Output

Check the output file to ensure basecalling was successful:

```bash
less basecalled.sam
```

You should see SAM-formatted basecalling results.
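
For a scripted check instead of a visual one, the sketch below counts read records in the output. It relies on the SAM convention that header lines start with `@` and every other non-empty line is one record. The function names `count_sam_records` and `check_sam` are hypothetical helpers, not part of Dorado.

```bash
# Count read records in a SAM file: non-empty lines that are not
# header lines (header lines start with "@").
count_sam_records() {
  awk '!/^@/ && NF { n++ } END { print n + 0 }' "$1"
}

# Fail if the file is missing, empty, or contains only a header.
check_sam() {
  sam="$1"
  if [ ! -s "$sam" ]; then
    echo "FAIL: $sam is missing or empty" >&2
    return 1
  fi
  n="$(count_sam_records "$sam")"
  if [ "$n" -eq 0 ]; then
    echo "FAIL: $sam has no read records" >&2
    return 1
  fi
  echo "OK: $sam contains $n read record(s)"
}

# Usage:
#   check_sam basecalled.sam
```

This is stricter than the `test -s` check in the Dockerfile's test stage, which passes for a file that contains only header lines.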

## Additional Notes

- **Basecalling Models**: All models are downloaded to `/dorado_models` during the build process.
- **Sample Data**: The sample Pod5 file is downloaded to `/usr/src/app` during the build.
- **Internal Testing**: An internal test stage is included in the Dockerfile to verify installation.
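
Because the Dockerfile has two stages (`app` and `test`), `docker build` without `--target` builds the last stage, `test`. The sketch below shows one way to build each stage explicitly. The `run` and `build_dorado` helpers, the `DRY_RUN` switch, and the `:test` tag are hypothetical names used for illustration. By default the commands are only printed.

```bash
IMAGE="${IMAGE:-dorado-image}"   # tag used in this README; override as needed

# Print the command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

build_dorado() {
  # Runtime image: stop at the "app" stage and skip the test stage.
  run docker build --target app -t "$IMAGE" . || return 1
  # Test stage: runs the CPU basecalling check from the Dockerfile.
  # The build fails if basecalled.sam is missing or empty.
  run docker build --target test -t "${IMAGE}:test" .
}

# Usage:
#   build_dorado              # print the two commands
#   DRY_RUN=0 build_dorado    # actually build both stages
```

Building the `test` target separately keeps the generated `basecalled.sam` out of the image that is tagged for everyday use.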

## License

Dorado is licensed under [Oxford Nanopore Technologies' License](https://github.com/nanoporetech/dorado/blob/master/LICENSE).


---

**Note**: Please ensure that you have the necessary NVIDIA drivers and the NVIDIA Container Toolkit installed to utilize GPU acceleration.

---
