add busco 5.8.0 #1095

Merged
merged 3 commits into from
Nov 5, 2024
2 changes: 1 addition & 1 deletion README.md
@@ -131,7 +131,7 @@ To learn more about the docker pull rate limits and the open source software pro
| [blast+](https://hub.docker.com/r/staphb/blast/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/blast)](https://hub.docker.com/r/staphb/blast) | <ul><li>[2.13.0](blast/2.13.0/)</li><li>[2.14.0](blast/2.14.0/)</li><li>[2.14.1](blast/2.14.1/)</li><li>[2.15.0](blast/2.15.0/)</li><li>[2.16.0](./blast/2.16.0/)</li></ul> | https://www.ncbi.nlm.nih.gov/books/NBK279690/ |
| [bowtie2](https://hub.docker.com/r/staphb/bowtie2/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bowtie2)](https://hub.docker.com/r/staphb/bowtie2) | <ul><li>[2.4.4](./bowtie2/2.4.4/)</li><li>[2.4.5](./bowtie2/2.4.5/)</li><li>[2.5.1](./bowtie2/2.5.1/)</li><li>[2.5.3](./bowtie2/2.5.3/)</li><li>[2.5.4](./bowtie2/2.5.4/)</li></ul> | http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml <br/>https://github.com/BenLangmead/bowtie2 |
| [Bracken](https://hub.docker.com/r/staphb/bracken/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bracken)](https://hub.docker.com/r/staphb/bracken) | <ul><li>[2.9](./bracken/2.9)</li></ul> | https://ccb.jhu.edu/software/bracken/index.shtml?t=manual <br/>https://github.com/jenniferlu717/Bracken |
-| [BUSCO](https://hub.docker.com/r/staphb/busco/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/busco)](https://hub.docker.com/r/staphb/busco) | <ul><li>[5.4.7](./busco/5.4.7/)</li><li>[5.6.1](./busco/5.6.1/)</li><li>[5.6.1-prok-bacteria_odb10_2024-01-08](./busco/5.6.1-prok-bacteria_odb10_2024-01-08/)</li><li>[5.7.1](./busco/5.7.1/)</li><li>[5.7.1-prok-bacteria_odb10_2024-01-08](./busco/5.7.1-prok-bacteria_odb10_2024-01-08/)</li></ul> | https://busco.ezlab.org/busco_userguide.html <br/>https://gitlab.com/ezlab/busco |
+| [BUSCO](https://hub.docker.com/r/staphb/busco/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/busco)](https://hub.docker.com/r/staphb/busco) | <ul><li>[5.4.7](./busco/5.4.7/)</li><li>[5.6.1](./busco/5.6.1/)</li><li>[5.6.1-prok-bacteria_odb10_2024-01-08](./busco/5.6.1-prok-bacteria_odb10_2024-01-08/)</li><li>[5.7.1](./busco/5.7.1/)</li><li>[5.7.1-prok-bacteria_odb10_2024-01-08](./busco/5.7.1-prok-bacteria_odb10_2024-01-08/)</li><li>[5.8.0](./busco/5.8.0/)</li><li>[5.8.0-prok-bacteria_odb10_2024-01-08](./busco/5.8.0-prok-bacteria_odb10_2024-01-08/)</li></ul> | https://busco.ezlab.org/busco_userguide.html <br/>https://gitlab.com/ezlab/busco |
| [BWA](https://hub.docker.com/r/staphb/bwa) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bwa)](https://hub.docker.com/r/staphb/bwa) | <ul><li>0.7.17</li><li>[0.7.18](./bwa/0.7.18/)</li></ul> | https://github.com/lh3/bwa |
| [Canu](https://hub.docker.com/r/staphb/canu) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/canu?)](https://hub.docker.com/r/staphb/canu)| <ul><li>2.0</li><li>2.1.1</li><li>2.2</li></ul> | https://canu.readthedocs.io/en/latest/ <BR/> https://github.com/marbl/canu |
| [Canu-Racon](https://hub.docker.com/r/staphb/canu-racon/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/canu-racon)](https://hub.docker.com/r/staphb/canu-racon) | <ul><li>1.7.1 (Canu), 1.3.1 (Racon), 2.13 (minimap2)</li><li>1.9 (Canu), 1.4.3 (Racon), 2.17 (minimap2)</li><li>1.9i (Canu), 1.4.3 (Racon), 2.17 (minimap2), (+racon_preprocess.py)</li><li>2.0 (Canu), 1.4.3 (Racon), 2.17 (minimap2)</li></ul> | https://canu.readthedocs.io/en/latest/ <br/> https://github.com/lbcb-sci/racon <br/> https://github.com/isovic/racon (ARCHIVED) <br/> https://lh3.github.io/minimap2/ |
77 changes: 77 additions & 0 deletions busco/5.8.0-prok-bacteria_odb10_2024-01-08/Dockerfile
@@ -0,0 +1,77 @@
ARG BUSCO_VER="5.8.0"

FROM ubuntu:jammy AS app

ARG BUSCO_VER
ARG BBMAP_VER="39.10"
ARG SEPP_VER="4.5.5"
ARG DEBIAN_FRONTEND=noninteractive

LABEL base.image="ubuntu:jammy"
LABEL dockerfile.version="1"
LABEL software="BUSCO"
LABEL software.version="${BUSCO_VER}"
LABEL description="Slim version of BUSCO for prokaryotes only"
LABEL website="https://busco.ezlab.org/"
LABEL license="https://gitlab.com/ezlab/busco/-/raw/master/LICENSE"
LABEL maintainer="Kutluhan Incekara"
LABEL maintainer.email="[email protected]"

# install dependencies
RUN apt-get update && apt-get install --no-install-recommends -y \
wget \
python3-pip \
python3-setuptools \
python3-requests \
python3-pandas \
hmmer \
prodigal \
lbzip2 \
openjdk-8-jre-headless \
&& rm -rf /var/lib/apt/lists/* && apt-get autoclean \
&& ln -s /usr/bin/python3 /usr/bin/python

# BioPython via pip (python3-biopython installs 1.73, which causes a Python error in this version)
RUN pip install --no-cache-dir biopython

# bbtools
RUN wget -q https://sourceforge.net/projects/bbmap/files/BBMap_${BBMAP_VER}.tar.gz &&\
tar -xvf BBMap_${BBMAP_VER}.tar.gz && rm BBMap_${BBMAP_VER}.tar.gz &&\
mv /bbmap/* /usr/local/bin/

# sepp
RUN wget https://github.com/smirarab/sepp/archive/refs/tags/v${SEPP_VER}.tar.gz &&\
tar -xvf v${SEPP_VER}.tar.gz && rm v${SEPP_VER}.tar.gz &&\
cd sepp-${SEPP_VER} &&\
python setup.py config -c && python setup.py install

# busco
RUN wget -q https://gitlab.com/ezlab/busco/-/archive/${BUSCO_VER}/busco-${BUSCO_VER}.tar.gz &&\
tar -xvf busco-${BUSCO_VER}.tar.gz && \
rm busco-${BUSCO_VER}.tar.gz &&\
cd busco-${BUSCO_VER} && \
python3 setup.py install

# download bacteria_odb10
RUN busco --download bacteria_odb10

ENV LC_ALL=C

WORKDIR /data

CMD busco -h

## Tests ##
FROM app AS test

RUN busco -h

# offline test
RUN wget -q https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/010/941/835/GCA_010941835.1_PDT000052640.3/GCA_010941835.1_PDT000052640.3_genomic.fna.gz && \
gzip -d GCA_010941835.1_PDT000052640.3_genomic.fna.gz && \
busco --offline -l /busco_downloads/lineages/bacteria_odb10 -m genome -i GCA_010941835.1_PDT000052640.3_genomic.fna -o offline --cpu 4 && \
head offline/short_summary*.txt

# auto-lineage-prok
RUN busco -m genome -i GCA_010941835.1_PDT000052640.3_genomic.fna -o auto --cpu 4 --auto-lineage-prok && \
head auto/short_summary*.txt
24 changes: 24 additions & 0 deletions busco/5.8.0-prok-bacteria_odb10_2024-01-08/README.md
@@ -0,0 +1,24 @@
# Assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs (BUSCO) container

Main tool: [BUSCO](https://gitlab.com/ezlab/busco/)

Additional tools:
- BBTools 39.10
- HMMER 3.3.2
- Prodigal 2.6.3
- SEPP 4.5.5
- Python 3.10.12
- BioPython 1.83
- Perl 5.34.0
- OpenJDK 1.8.0_422

Full documentation: https://busco.ezlab.org/busco_userguide.html

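This BUSCO Docker image provides basic functionality for prokaryotes only: it leaves out the eukaryote-specific tools (AUGUSTUS, MetaEuk, Miniprot) and BLAST+ found in the full 5.8.0 image. It contains the bacteria_odb10 lineage dataset at `/busco_downloads/lineages/bacteria_odb10` for offline use.
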
## Example Usage
```bash
# offline usage with bacteria lineage
busco --offline -i assembly.fasta -l /busco_downloads/lineages/bacteria_odb10 -o output -m genome
# auto lineage selection
busco -i assembly.fasta -o output -m genome --auto-lineage-prok
```
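Because the image's working directory is `/data`, a host folder mounted there lets BUSCO read the assembly and write its output folder back to the host. The wrapper below is a minimal sketch of that pattern: `busco_prok_offline` is a hypothetical helper name, and the image tag `staphb/busco:5.8.0-prok-bacteria_odb10_2024-01-08` is an assumption based on this folder's name, so check the tags published on Docker Hub before relying on it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumed image tag: mirrors this folder's name; verify it on Docker Hub.
BUSCO_PROK_IMAGE="staphb/busco:5.8.0-prok-bacteria_odb10_2024-01-08"

# Hypothetical wrapper: run offline BUSCO on an assembly in the current
# directory. The directory is mounted at /data (the image's WORKDIR), so
# relative input and output names resolve inside the container.
busco_prok_offline() {
    local assembly="$1" out="$2"
    docker run --rm -v "$PWD":/data "$BUSCO_PROK_IMAGE" \
        busco --offline -i "$assembly" \
        -l /busco_downloads/lineages/bacteria_odb10 \
        -o "$out" -m genome
}
```

For example, running `busco_prok_offline assembly.fasta output` from the folder that holds `assembly.fasta` should leave the results in `./output` on the host.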
103 changes: 103 additions & 0 deletions busco/5.8.0/Dockerfile
@@ -0,0 +1,103 @@
ARG BUSCO_VER="5.8.0"

FROM ubuntu:focal AS app

ARG BUSCO_VER
ARG BBMAP_VER="39.10"
ARG BLAST_VER="2.16.0"
ARG MINIPROT_VER="0.13"
ARG SEPP_VER="4.5.5"
ARG METAEUK_VER="7-bba0d80"
ARG DEBIAN_FRONTEND=noninteractive

LABEL base.image="ubuntu:focal"
LABEL dockerfile.version="1"
LABEL software="BUSCO"
LABEL software.version="${BUSCO_VER}"
LABEL description="Assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs"
LABEL website="https://busco.ezlab.org/"
LABEL license="https://gitlab.com/ezlab/busco/-/raw/master/LICENSE"
LABEL maintainer="Kutluhan Incekara"
LABEL maintainer.email="[email protected]"

# install dependencies
RUN apt-get update && apt-get install --no-install-recommends -y \
wget \
python3-pip \
python3-pandas \
python3-setuptools \
python3-requests \
hmmer \
prodigal \
augustus \
r-cran-ggplot2 \
gcc-x86-64-linux-gnu \
openjdk-8-jre-headless \
libjenkins-json-java \
libgoogle-gson-java \
libjson-java \
lbzip2 \
&& rm -rf /var/lib/apt/lists/* && apt-get autoclean \
&& ln -s /usr/bin/python3 /usr/bin/python

# install other necessary tools
# BioPython via pip (python3-biopython installs 1.73, which causes a Python error in this version)
RUN pip install --no-cache-dir biopython
# blast
RUN wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/${BLAST_VER}/ncbi-blast-${BLAST_VER}+-x64-linux.tar.gz &&\
tar -xvf ncbi-blast-${BLAST_VER}+-x64-linux.tar.gz && rm ncbi-blast-${BLAST_VER}+-x64-linux.tar.gz
# sepp
RUN wget https://github.com/smirarab/sepp/archive/refs/tags/v${SEPP_VER}.tar.gz &&\
tar -xvf v${SEPP_VER}.tar.gz && rm v${SEPP_VER}.tar.gz &&\
cd sepp-${SEPP_VER} &&\
python setup.py config -c && python setup.py install
# bbtools
RUN wget https://sourceforge.net/projects/bbmap/files/BBMap_${BBMAP_VER}.tar.gz &&\
tar -xvf BBMap_${BBMAP_VER}.tar.gz && rm BBMap_${BBMAP_VER}.tar.gz &&\
mv /bbmap/* /usr/local/bin/
# metaeuk
RUN wget https://github.com/soedinglab/metaeuk/releases/download/${METAEUK_VER}/metaeuk-linux-sse41.tar.gz &&\
tar -xvf metaeuk-linux-sse41.tar.gz && rm metaeuk-linux-sse41.tar.gz &&\
mv /metaeuk/bin/* /usr/local/bin/
# miniprot
RUN wget https://github.com/lh3/miniprot/releases/download/v${MINIPROT_VER}/miniprot-${MINIPROT_VER}_x64-linux.tar.bz2 &&\
tar -C /usr/local/bin/ --strip-components=1 --no-same-owner -xvf miniprot-${MINIPROT_VER}_x64-linux.tar.bz2 miniprot-${MINIPROT_VER}_x64-linux/miniprot &&\
rm miniprot-${MINIPROT_VER}_x64-linux.tar.bz2

# and finally busco
RUN wget https://gitlab.com/ezlab/busco/-/archive/${BUSCO_VER}/busco-${BUSCO_VER}.tar.gz &&\
tar -xvf busco-${BUSCO_VER}.tar.gz && \
rm busco-${BUSCO_VER}.tar.gz &&\
cd busco-${BUSCO_VER} && \
python3 setup.py install

ENV AUGUSTUS_CONFIG_PATH="/usr/share/augustus/config/" \
PATH="${PATH}:/ncbi-blast-${BLAST_VER}+/bin:/usr/share/augustus/scripts:/busco-${BUSCO_VER}/scripts" \
LC_ALL=C

WORKDIR /data

CMD busco -h && generate_plot.py -h

## Tests ##
FROM app AS test

ARG BUSCO_VER

RUN busco -h && generate_plot.py -h

# run tests for bacteria and eukaryota
RUN busco -i /busco-${BUSCO_VER}/test_data/bacteria/genome.fna -c 8 -m geno -f --out test_bacteria
RUN busco -i /busco-${BUSCO_VER}/test_data/eukaryota/genome.fna -c 8 -m geno -f --out test_eukaryota
RUN busco -i /busco-${BUSCO_VER}/test_data/eukaryota/genome.fna -l eukaryota_odb10 -c 8 -m geno -f --out test_eukaryota_augustus --augustus

# generate plot
RUN mkdir my_summaries &&\
find . -name "short_summary.*.txt" -exec cp {} my_summaries \; &&\
generate_plot.py -wd my_summaries

# using actual data (Salmonella genome)
RUN wget -q https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/010/941/835/GCA_010941835.1_PDT000052640.3/GCA_010941835.1_PDT000052640.3_genomic.fna.gz && \
gzip -d GCA_010941835.1_PDT000052640.3_genomic.fna.gz && \
busco -m genome -i GCA_010941835.1_PDT000052640.3_genomic.fna -o busco_GCA_010941835.1 --cpu 4 --auto-lineage-prok && \
head busco_GCA_010941835.1/short_summary*.txt
95 changes: 95 additions & 0 deletions busco/5.8.0/README.md
@@ -0,0 +1,95 @@
# Assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs (BUSCO) container

Main tool: [BUSCO](https://gitlab.com/ezlab/busco/)

Additional tools:
- BBTools 39.10
- HMMER 3.3
- Prodigal 2.6.3
- BLAST+ 2.16.0
- AUGUSTUS 3.3.3
- MetaEuk (Release 7-bba0d80)
- SEPP 4.5.5
- Python 3.8.10
- BioPython 1.83
- R 3.6.3
- Perl 5.30.0
- OpenJDK 1.8.0_392
- Miniprot 0.13

Full documentation: https://busco.ezlab.org/busco_userguide.html

This fully functional BUSCO Docker image supports all program options; the additional tools listed above are installed to satisfy the requirements of those options. The image does not contain any lineage datasets. When a dataset name is passed with `-l`, BUSCO downloads it automatically at run time; if a full path is given instead, this automated dataset management is disabled. The usage options are listed below; please refer to the BUSCO manual for further information.

## Example Usage
### Specific lineage
```bash
busco -i assembly.fasta -l bacteria_odb10 -o output -m genome
```
or
```bash
busco -i assembly.fasta -l /path/to/folder/bacteria_odb10 -o output -m genome
```
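When the same lineage is reused across many runs, or a run has no network access, one option is to fetch the dataset once with `busco --download` (the command the prokaryote image uses to bundle bacteria_odb10) and then pass its local path together with `--offline`. A minimal sketch, assuming BUSCO's default download folder `./busco_downloads` and genome mode; `run_busco_offline` is a hypothetical helper, not part of BUSCO.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: download a lineage once with `busco --download`,
# then run BUSCO offline against the local copy. busco_downloads/ is the
# assumed default download folder, created in the working directory.
run_busco_offline() {
    local assembly="$1" lineage="$2" out="$3"
    local lineage_dir="busco_downloads/lineages/$lineage"
    # Skip the download if the lineage is already present locally.
    if [ ! -d "$lineage_dir" ]; then
        busco --download "$lineage"
    fi
    # Passing a path to -l disables BUSCO's automatic dataset management.
    busco --offline -i "$assembly" -l "$lineage_dir" -o "$out" -m genome
}
```

Inside this container the working directory is `/data`, so the downloaded lineage lands in `/data/busco_downloads` and is kept between runs only if `/data` is a mounted host folder.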
### Auto lineage selection
```bash
busco -i assembly.fasta -o output -m genome --auto-lineage-prok
```
### Additional options
```text
-i FASTA FILE, --in FASTA FILE
Input sequence file in FASTA format. Can be an assembled genome or transcriptome (DNA), or protein sequences from an annotated gene set.
-o OUTPUT, --out OUTPUT
Give your analysis run a recognisable short name. Output folders and files will be labelled with this name. WARNING: do not provide a path
-m MODE, --mode MODE Specify which BUSCO analysis mode to run.
There are three valid modes:
- geno or genome, for genome assemblies (DNA)
- tran or transcriptome, for transcriptome assemblies (DNA)
- prot or proteins, for annotated gene sets (protein)
-l LINEAGE, --lineage_dataset LINEAGE
Specify the name of the BUSCO lineage to be used.
--auto-lineage Run auto-lineage to find optimum lineage path
--auto-lineage-prok Run auto-lineage just on non-eukaryote trees to find optimum lineage path
--auto-lineage-euk Run auto-placement just on eukaryote tree to find optimum lineage path
-c N, --cpu N Specify the number (N=integer) of threads/cores to use.
-f, --force Force rewriting of existing files. Must be used when output files with the provided name already exist.
-r, --restart Continue a run that had already partially completed.
-q, --quiet Disable the info logs, displays only errors
--out_path OUTPUT_PATH
Optional location for results folder, excluding results folder name. Default is current working directory.
--download_path DOWNLOAD_PATH
Specify local filepath for storing BUSCO dataset downloads
--datasets_version DATASETS_VERSION
Specify the version of BUSCO datasets, e.g. odb10
--download_base_url DOWNLOAD_BASE_URL
Set the url to the remote BUSCO dataset location
--update-data Download and replace with last versions all lineages datasets and files necessary to their automated selection
--offline To indicate that BUSCO cannot attempt to download files
--metaeuk_parameters METAEUK_PARAMETERS
Pass additional arguments to Metaeuk for the first run. All arguments should be contained within a single pair of quotation marks, separated by commas. E.g. "--param1=1,--param2=2"
--metaeuk_rerun_parameters METAEUK_RERUN_PARAMETERS
Pass additional arguments to Metaeuk for the second run. All arguments should be contained within a single pair of quotation marks, separated by commas. E.g. "--param1=1,--param2=2"
-e N, --evalue N E-value cutoff for BLAST searches. Allowed formats, 0.001 or 1e-03 (Default: 1e-03)
--limit REGION_LIMIT How many candidate regions (contig or transcript) to consider per BUSCO (default: 3)
--augustus Use augustus gene predictor for eukaryote runs
--augustus_parameters AUGUSTUS_PARAMETERS
Pass additional arguments to Augustus. All arguments should be contained within a single pair of quotation marks, separated by commas. E.g. "--param1=1,--param2=2"
--augustus_species AUGUSTUS_SPECIES
Specify a species for Augustus training.
--long Optimization Augustus self-training mode (Default: Off); adds considerably to the run time, but can improve results for some non-model organisms
--config CONFIG_FILE Provide a config file
-v, --version Show this version and exit
-h, --help Show this help message and exit
--list-datasets Print the list of available BUSCO datasets
```
### Plot
Example usage of plotting script:
```bash
# collect short summaries
mkdir my_summaries
cp SPEC1/short_summary.generic.lineage1_odb10.SPEC1.txt my_summaries/.
cp SPEC2/short_summary.generic.lineage2_odb10.SPEC2.txt my_summaries/.
cp SPEC3/short_summary.specific.lineage2_odb10.SPEC3.txt my_summaries/.
cp SPEC4/short_summary.generic.lineage3_odb10.SPEC4.txt my_summaries/.
cp SPEC5/short_summary.generic.lineage4_odb10.SPEC5.txt my_summaries/.
# plot via script
generate_plot.py -wd my_summaries
```
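Instead of copying each summary by hand, the summaries can be gathered with `find`, much as this image's Dockerfile test does before calling `generate_plot.py`. A minimal sketch, assuming all BUSCO output folders sit under one search root; `collect_summaries` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copy every BUSCO short summary found below a search
# root into one folder that can be passed to generate_plot.py -wd.
collect_summaries() {
    local root="${1%/}"      # search root, without a trailing slash
    local dest="$root/$2"    # destination folder, created inside the root
    mkdir -p "$dest"
    # Prune the destination so find never copies a summary onto itself,
    # e.g. when the helper is run a second time.
    find "$root" -path "$dest" -prune -o \
        -type f -name 'short_summary.*.txt' -exec cp {} "$dest"/ \;
}
```

For example, `collect_summaries . my_summaries` followed by `generate_plot.py -wd my_summaries` gathers and plots every run below the current directory. Pruning the destination keeps repeated runs from failing on self-copies.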