30 change to unicycler v050 (#45)
* update version information

* remove old spades binary for new unicycler version

* update dockerfile to use new conda recipe

* add testing profiles

* working conda installation

* update multiqc report

* fixed docker image

* increase asked cpus

* Update nextflow_schema.json

* decrease expected genome size

* fix long reads assemblers installation

* split long reads samplesheets

* flye requires 10.GB RAM

* update workflows

* easimon does not work on macOS

* update java

* small update for tests

* canu needs 4 cpus

* update genome size value

* split samplesheet in ont & pacbio

* split profiles in ont & pacbio (hybrid)

* delete unused files

* new docker image has new python version

* docker image fixed

* skip flye that requires much memory

* fixed pacbio execution

* revert to ubuntu

* canu also is too much for git actions

* remove unnecessary files

* modify pilon polish script to run on number of desired rounds

* fix output namings

* fix outputs

* pacbio profile working

* update pilon info in docs

* update changelog

* upload yml with version of installed tools

* update docs
fmalmeida authored May 18, 2023
1 parent 07ca6ac commit 8ea57fc
Showing 365 changed files with 778 additions and 119,124 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test_pr_hybrid_docker.yml
@@ -1,7 +1,7 @@
 name: Testing hybrid / docker from PR
 on:
 pull_request:
-branches: [ master, dev ]
+branches: [ master ]
 types: [ opened, synchronize, reopened ]

 jobs:
7 changes: 6 additions & 1 deletion .github/workflows/test_pr_hybrid_singularity.yml
@@ -1,7 +1,7 @@
 name: Testing hybrid / singularity from PR
 on:
 pull_request:
-branches: [ master, dev ]
+branches: [ master ]
 types: [ opened, synchronize, reopened ]

 jobs:
@@ -14,6 +14,11 @@ jobs:
 - name: Check out pipeline code
 uses: actions/checkout@v2

+- name: Install Singularity
+uses: eWaterCycle/setup-singularity@v7
+with:
+singularity-version: 3.8.3
+
 - name: Install Nextflow
 env:
 CAPSULE_LOG: none
5 changes: 5 additions & 0 deletions .github/workflows/test_pr_illumina_singularity.yml
@@ -14,6 +14,11 @@ jobs:
 - name: Check out pipeline code
 uses: actions/checkout@v2

+- name: Install Singularity
+uses: eWaterCycle/setup-singularity@v7
+with:
+singularity-version: 3.8.3
+
 - name: Install Nextflow
 env:
 CAPSULE_LOG: none
@@ -1,4 +1,4 @@
-name: Testing long-reads / docker from PR
+name: Testing long-reads / docker (ONT) from PR
 on:
 pull_request:
 branches: [ master, dev ]
@@ -26,12 +26,12 @@ jobs:
 sudo rm -rf /usr/local/lib/android # will release about 10 GB if you don't need Android
 sudo rm -rf /usr/share/dotnet # will release about 20GB if you don't need .NET
-- name: Run tests for long-reads
+- name: Run tests for long-reads (ont)
 run: |
-nextflow run main.nf -profile docker,test,lreads
+nextflow run main.nf -profile docker,test,lreads,ont --max_memory '6.GB' --max_cpus 2 --skip_flye --skip_canu
 rm -r work .nextflow*
 - name: View results
 run: |
 sudo apt-get install -y tree
-tree lreads_test
+tree lreads_test_ont
@@ -1,4 +1,4 @@
-name: Testing long-reads / singularity from PR
+name: Testing long-reads / singularity (ONT) from PR
 on:
 pull_request:
 branches: [ master, dev ]
@@ -7,13 +7,24 @@ on:
 jobs:
 run_nextflow:
 name: Run pipeline for the upcoming PR
-runs-on: ubuntu-latest
+runs-on: macos-latest

 steps:

+- name: 'Set up latest Oracle JDK 17'
+uses: oracle-actions/setup-java@v1
+with:
+website: oracle.com
+release: 17
+
 - name: Check out pipeline code
 uses: actions/checkout@v2

+- name: Install Singularity
+uses: eWaterCycle/setup-singularity@v7
+with:
+singularity-version: 3.8.3
+
 - name: Install Nextflow
 env:
 CAPSULE_LOG: none
@@ -26,12 +37,12 @@ jobs:
 sudo rm -rf /usr/local/lib/android # will release about 10 GB if you don't need Android
 sudo rm -rf /usr/share/dotnet # will release about 20GB if you don't need .NET
-- name: Run tests for long-reads
+- name: Run tests for long-reads (ont)
 run: |
-nextflow run main.nf -profile singularity,test,lreads
+nextflow run main.nf -profile singularity,test,lreads,ont
 rm -r work .nextflow*
 - name: View results
 run: |
 sudo apt-get install -y tree
-tree lreads_test
+tree lreads_test_ont
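
To try the long-reads tests outside of CI, the `nextflow run` command from the workflow steps above could be wrapped in a small helper. This is only a sketch: the function name `mpgap_test_cmd` is hypothetical and not part of the repository, while the profile names, resource caps and `--skip_*` flags are copied from the docker (ONT) step.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): prints the test command
# used in the workflows above for a given container engine and technology.
mpgap_test_cmd() {
    local engine="$1"   # docker or singularity
    local tech="$2"     # ont or pacbio
    # Resource caps and skipped assemblers mirror the docker (ONT) CI step.
    echo "nextflow run main.nf -profile ${engine},test,lreads,${tech}" \
         "--max_memory '6.GB' --max_cpus 2 --skip_flye --skip_canu"
}

# Print the command; pipe the output to `bash` to actually run it.
mpgap_test_cmd docker ont
```

Printing the command rather than executing it keeps the helper safe to call on machines where Nextflow or the container engine is not installed.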
2 changes: 1 addition & 1 deletion .zenodo.json
@@ -2,7 +2,7 @@
 "description": "<p>MpGAP is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. It is an easy to use pipeline that adopts well known software for _de novo_ genome assembly of Illumina, Pacbio and Oxford Nanopore sequencing data through illumina only, long reads only or hybrid modes.</p>",
 "license": "other-open",
 "title": "fmalmeida/MpGAP: A generic multi-platform genome assembly pipeline",
-"version": "v3.1.4",
+"version": "v3.2",
 "upload_type": "software",
 "creators": [
 {
32 changes: 19 additions & 13 deletions Dockerfile
@@ -1,28 +1,34 @@
-FROM nfcore/base
+FROM mambaorg/micromamba
 LABEL authors="Felipe Almeida" \
 description="Docker image containing all software requirements for the fmalmeida/mpgap pipeline"

 # Install the conda environment
-RUN conda install -y -c conda-forge mamba
 COPY environment.yml /
-RUN mamba env create --quiet -f /environment.yml && mamba clean -a
+RUN micromamba env create --quiet -f /environment.yml --yes && micromamba clean -a

 # Add conda installation dir to PATH (instead of doing 'conda activate')
-ENV PATH /opt/conda/envs/mpgap-3.1/bin:$PATH
+ENV PATH /opt/conda/envs/mpgap-3.2/bin:$PATH

 # Dump the details of the installed packages to a file for posterity
-RUN conda env export --name mpgap-3.1 > mpgap-3.1.yml
+RUN micromamba env export --name mpgap-3.2 > mpgap-3.2.yml

+# check problematic installation
+RUN medaka --help
+
 # download busco dbs
 ENV CONDA_PREFIX=/opt/conda
-RUN mkdir -p $CONDA_PREFIX/envs/mpgap-3.1/lib/python3.6/site-packages/quast_libs/busco/
-RUN wget -O $CONDA_PREFIX/envs/mpgap-3.1/lib/python3.6/site-packages/quast_libs/busco/bacteria.tar.gz https://busco.ezlab.org/v2/datasets/bacteria_odb9.tar.gz
-RUN wget -O $CONDA_PREFIX/envs/mpgap-3.1/lib/python3.6/site-packages/quast_libs/busco/eukaryota.tar.gz https://busco.ezlab.org/v2/datasets/eukaryota_odb9.tar.gz
-RUN wget -O $CONDA_PREFIX/envs/mpgap-3.1/lib/python3.6/site-packages/quast_libs/busco/fungi.tar.gz https://busco.ezlab.org/v2/datasets/fungi_odb9.tar.gz
-RUN mkdir -p $CONDA_PREFIX/envs/mpgap-3.1/lib/python3.6/site-packages/quast_libs/augustus3.2.3 && wget -O $CONDA_PREFIX/envs/mpgap-3.1/lib/python3.6/site-packages/quast_libs/augustus3.2.3/augustus.tar.gz http://bioinf.uni-greifswald.de/augustus/binaries/old/augustus-3.2.3.tar.gz
+RUN mkdir -p $CONDA_PREFIX/envs/mpgap-3.2/lib/python3.8/site-packages/quast_libs/busco/
+RUN wget -O $CONDA_PREFIX/envs/mpgap-3.2/lib/python3.8/site-packages/quast_libs/busco/bacteria.tar.gz https://busco.ezlab.org/v2/datasets/bacteria_odb9.tar.gz
+RUN wget -O $CONDA_PREFIX/envs/mpgap-3.2/lib/python3.8/site-packages/quast_libs/busco/eukaryota.tar.gz https://busco.ezlab.org/v2/datasets/eukaryota_odb9.tar.gz
+RUN wget -O $CONDA_PREFIX/envs/mpgap-3.2/lib/python3.8/site-packages/quast_libs/busco/fungi.tar.gz https://busco.ezlab.org/v2/datasets/fungi_odb9.tar.gz
+RUN mkdir -p $CONDA_PREFIX/envs/mpgap-3.2/lib/python3.8/site-packages/quast_libs/augustus3.2.3 && wget -O $CONDA_PREFIX/envs/mpgap-3.2/lib/python3.8/site-packages/quast_libs/augustus3.2.3/augustus.tar.gz http://bioinf.uni-greifswald.de/augustus/binaries/old/augustus-3.2.3.tar.gz

 # fix permissions
-RUN chmod -R 777 $CONDA_PREFIX/envs/mpgap-3.1/lib/python3.6/site-packages/quast_libs/busco
-RUN chmod -R 777 $CONDA_PREFIX/envs/mpgap-3.1/lib/python3.6/site-packages/quast_libs/augustus3.2.3
-RUN chmod -R 777 $CONDA_PREFIX/envs/mpgap-3.1/lib/python3.6/site-packages/medaka
+RUN chmod -R 777 $CONDA_PREFIX/envs/mpgap-3.2/lib/python3.8/site-packages/quast_libs/busco
+RUN chmod -R 777 $CONDA_PREFIX/envs/mpgap-3.2/lib/python3.8/site-packages/quast_libs/augustus3.2.3
+RUN chmod -R 777 $CONDA_PREFIX/envs/mpgap-3.2/lib/python3.8/site-packages/medaka

+# install ps
+USER root
+RUN apt-get update && apt-get install -y procps && rm -rf /var/lib/apt/lists/*
+USER mambauser
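
The three BUSCO download lines in the Dockerfile differ only in the dataset name, so the same step could be expressed as a loop. The sketch below is one possible way to do that, not what the Dockerfile does: the function name `busco_downloads` is hypothetical, while the environment path and URLs are copied from the diff above.

```bash
#!/usr/bin/env bash
# Sketch only: prints "destination URL" pairs for the three BUSCO datasets
# that the Dockerfile downloads one by one. The function name is hypothetical.
busco_downloads() {
    local prefix="${1:-/opt/conda}"
    local dest="${prefix}/envs/mpgap-3.2/lib/python3.8/site-packages/quast_libs/busco"
    local db
    for db in bacteria eukaryota fungi; do
        echo "${dest}/${db}.tar.gz https://busco.ezlab.org/v2/datasets/${db}_odb9.tar.gz"
    done
}

# To perform the downloads, feed each pair to wget:
#   busco_downloads "$CONDA_PREFIX" | while read -r out url; do wget -O "$out" "$url"; done
busco_downloads /opt/conda
```

Keeping the path in a single variable means a future environment rename (for example from `mpgap-3.1` to `mpgap-3.2`) or Python version bump touches one line instead of every `wget` call.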
37 changes: 26 additions & 11 deletions README.md
@@ -77,7 +77,7 @@ Therefore, feedbacks are very well welcomed. If you believe that your use case i

 ```bash
 # for docker
-docker pull fmalmeida/mpgap:v3.1
+docker pull fmalmeida/mpgap:v3.2
 # run
 nextflow run fmalmeida/mpgap -profile docker [options]
@@ -91,7 +91,7 @@ Therefore, feedbacks are very well welcomed. If you believe that your use case i
 # read more at https://www.nextflow.io/docs/latest/singularity.html#singularity-docker-hub
 export NXF_SINGULARITY_LIBRARYDIR=MY_SINGULARITY_IMAGES # your singularity storage dir
 export NXF_SINGULARITY_CACHEDIR=MY_SINGULARITY_CACHE # your singularity cache dir
-singularity pull --dir $NXF_SINGULARITY_LIBRARYDIR fmalmeida-mpgap-v3.1.img docker://fmalmeida/mpgap:v3.1
+singularity pull --dir $NXF_SINGULARITY_LIBRARYDIR fmalmeida-mpgap-v3.2.img docker://fmalmeida/mpgap:v3.2
 # run
 nextflow run fmalmeida/mpgap -profile singularity [options]
@@ -120,8 +120,25 @@ Therefore, feedbacks are very well welcomed. If you believe that your use case i

 :fire: Please read the documentation below on [selecting between conda, docker or singularity](https://github.com/fmalmeida/mpgap/tree/master#selecting-between-profiles) profiles, since the tools will be made available differently depending on the profile desired.

+## Quickstart
+
+A few testing datasets have been made available so that users can quickly try-out the features available in the pipeline:
+
+```bash
+# short-reads
+nextflow run fmalmeida/mpgap -profile test,sreads,<docker/singularity>
+# long-reads
+nextflow run fmalmeida/mpgap -profile test,lreads,<ont/pacbio>,<docker/singularity>
+# hybrid
+nextflow run fmalmeida/mpgap -profile test,hybrid,<ont/pacbio>,<docker/singularity>
+```
+
+## Documentation
+
+<a href="https://mpgap.readthedocs.io/en/latest/index.html"><strong>Complete online documentation. »</strong></a>
+
 ### Selecting between profiles

 Nextflow profiles are a set of "sensible defaults" for the resource requirements of each of the steps in the workflow, that can be enabled with the command line flag `-profile`. You can learn more about nextflow profiles at:
@@ -171,22 +188,22 @@ Also, since in quast 5.0.2 the automatic download of its busco databases is brok

 ```bash
 # create the directory
-mkdir -p $CONDA_PREFIX/envs/mpgap-3.1/lib/python3.6/site-packages/quast_libs/busco/
+mkdir -p $CONDA_PREFIX/envs/mpgap-3.2/lib/python3.8/site-packages/quast_libs/busco/
 # bacteria db
-wget -O $CONDA_PREFIX/envs/mpgap-3.1/lib/python3.6/site-packages/quast_libs/busco/bacteria.tar.gz https://busco.ezlab.org/v2/datasets/bacteria_odb9.tar.gz
+wget -O $CONDA_PREFIX/envs/mpgap-3.2/lib/python3.8/site-packages/quast_libs/busco/bacteria.tar.gz https://busco.ezlab.org/v2/datasets/bacteria_odb9.tar.gz
 # eukaryota db
-wget -O $CONDA_PREFIX/envs/mpgap-3.1/lib/python3.6/site-packages/quast_libs/busco/eukaryota.tar.gz https://busco.ezlab.org/v2/datasets/eukaryota_odb9.tar.gz
+wget -O $CONDA_PREFIX/envs/mpgap-3.2/lib/python3.8/site-packages/quast_libs/busco/eukaryota.tar.gz https://busco.ezlab.org/v2/datasets/eukaryota_odb9.tar.gz
 # fungi db
-wget -O $CONDA_PREFIX/envs/mpgap-3.1/lib/python3.6/site-packages/quast_libs/busco/fungi.tar.gz https://busco.ezlab.org/v2/datasets/fungi_odb9.tar.gz
-chmod -R 777 $CONDA_PREFIX/envs/mpgap-3.1/lib/python3.6/site-packages/quast_libs/busco
+wget -O $CONDA_PREFIX/envs/mpgap-3.2/lib/python3.8/site-packages/quast_libs/busco/fungi.tar.gz https://busco.ezlab.org/v2/datasets/fungi_odb9.tar.gz
+chmod -R 777 $CONDA_PREFIX/envs/mpgap-3.2/lib/python3.8/site-packages/quast_libs/busco
 # get augustus database with
 # must be executed in the end because its links for bacteria, fungi and eukaryota are broken
 # it is only working for augustus
-conda activate mpgap-3.1 && quast-download-busco
+conda activate mpgap-3.2 && quast-download-busco
 ```

 ### Explanation of hybrid strategies
@@ -201,9 +218,7 @@ It uses the hybrid assembly modes from Unicycler, Haslr and/or SPAdes.

 #### Strategy 2

-It produces a long reads only assembly and polishes (correct errors) it with short reads using Pilon.
-
-> If polishing with Illumina paired end reads pilon will be executed with [Unicycler-polish program](https://github.com/rrwick/Unicycler/blob/main/docs/unicycler-polish.md), taking advantage of its ability to perform multiple rounds of polishing until changes are minimal.
+It produces a long reads only assembly and polishes (correct errors) it with short reads using Pilon. By default, it runs 4 rounds of polishing (params.pilon_polish_rounds).

 #### Example:

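
The README change for Strategy 2 says that Pilon now runs a fixed number of rounds (4 by default, via `params.pilon_polish_rounds`) instead of relying on Unicycler-polish. The pipeline's actual polishing script is not shown in this part of the diff, so the following is only a sketch of what iterative polishing generally looks like: each round maps the short reads to the current assembly, runs Pilon, and feeds the polished FASTA into the next round. The function name, the file names and the choice of `bwa` plus `samtools` for mapping are assumptions.

```bash
#!/usr/bin/env bash
# Sketch only: prints the commands for N rounds of Pilon polishing.
# The mapping tools (bwa, samtools), the file names and the function name
# are assumptions; the pipeline's own script may differ.
pilon_rounds_plan() {
    local assembly="$1" r1="$2" r2="$3" rounds="${4:-4}"
    local current="$assembly" i
    for i in $(seq 1 "$rounds"); do
        echo "bwa index ${current}"
        echo "bwa mem ${current} ${r1} ${r2} | samtools sort -o round_${i}.bam -"
        echo "samtools index round_${i}.bam"
        echo "pilon --genome ${current} --frags round_${i}.bam --output round_${i}"
        current="round_${i}.fasta"   # Pilon writes <output>.fasta
    done
}

# Show the plan for the default of 4 rounds.
pilon_rounds_plan assembly.fasta reads_1.fastq.gz reads_2.fastq.gz 4
```

A fixed round count makes run time predictable, whereas polishing "until changes are minimal" can take a variable number of iterations.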
17 changes: 0 additions & 17 deletions assets/hybrid_test.yml

This file was deleted.

9 changes: 9 additions & 0 deletions assets/hybrid_test_ont.yml
@@ -0,0 +1,9 @@
+samplesheet:
+
+- id: ont_hybrid
+nanopore: https://github.com/fmalmeida/test_datasets/raw/main/ecoli_ont_15X.fastq.gz
+genome_size: 1m
+illumina:
+- https://github.com/fmalmeida/test_datasets/raw/main/ecoli_illumina_15X_1.fastq.gz
+- https://github.com/fmalmeida/test_datasets/raw/main/ecoli_illumina_15X_2.fastq.gz
+hybrid_strategy: both
9 changes: 9 additions & 0 deletions assets/hybrid_test_pacbio.yml
@@ -0,0 +1,9 @@
+samplesheet:
+
+- id: pacbio_hybrid
+pacbio: https://github.com/fmalmeida/test_datasets/raw/main/ecoli_pacbio_15X.fastq.gz
+genome_size: 1m
+illumina:
+- https://github.com/fmalmeida/test_datasets/raw/main/ecoli_illumina_15X_1.fastq.gz
+- https://github.com/fmalmeida/test_datasets/raw/main/ecoli_illumina_15X_2.fastq.gz
+hybrid_strategy: both
7 changes: 0 additions & 7 deletions assets/lreads_test.yml

This file was deleted.

4 changes: 4 additions & 0 deletions assets/lreads_test_ont.yml
@@ -0,0 +1,4 @@
+samplesheet:
+- id: ont_only
+nanopore: https://github.com/fmalmeida/test_datasets/raw/main/ecoli_ont_15X.fastq.gz
+genome_size: 1m
4 changes: 4 additions & 0 deletions assets/lreads_test_pacbio.yml
@@ -0,0 +1,4 @@
+samplesheet:
+- id: pacbio_only
+pacbio: https://github.com/fmalmeida/test_datasets/raw/main/ecoli_pacbio_15X.fastq.gz
+genome_size: 0.2m
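
The new long-reads samplesheets share one shape: a sample `id`, one reads key (`nanopore` or `pacbio`) and a `genome_size`. A samplesheet of that shape could be generated with a small shell function, sketched below. The function name, the output file name and the two-space YAML indentation are assumptions, since the indentation of the original files is not visible in this view of the diff.

```bash
#!/usr/bin/env bash
# Sketch only: writes a long-reads samplesheet using the keys shown above.
# The function name, output path and indentation are assumptions.
write_lreads_samplesheet() {
    local out="$1" id="$2" tech="$3" reads="$4" genome_size="$5"
    cat > "$out" <<EOF
samplesheet:
  - id: ${id}
    ${tech}: ${reads}
    genome_size: ${genome_size}
EOF
}

write_lreads_samplesheet my_samplesheet.yml ont_only nanopore \
    https://github.com/fmalmeida/test_datasets/raw/main/ecoli_ont_15X.fastq.gz 1m
cat my_samplesheet.yml
```

Passing `pacbio` as the third argument produces the PacBio variant, mirroring the split of the old `lreads_test.yml` into one file per technology.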