blobtools_ref

Young edited this page Feb 9, 2024 · 8 revisions

Downloading a blast database for Blobtools

Containers and GitHub repositories are poor places to store blast databases because of their size. The optional Blobtools subworkflow therefore requires you to download a database yourself, which takes some familiarity with the command line.

Downloading a blast database is not particularly complicated, but it does require some patience and a good internet connection. Blast databases can be downloaded with a web browser (such as Chrome) from NCBI's blast database site: ftp.ncbi.nlm.nih.gov/blast/db/. The most common databases are nt (nucleotide) and those curated by RefSeq (their names generally start with ref).

ref_prok_rep_genomes

UPHL downloads the ref_prok_rep_genomes database with the commands found in Grandeur/bin/download_blast.sh:

mkdir blast_db 
cd blast_db
for i in 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
do
    wget --continue --show-progress "https://ftp.ncbi.nlm.nih.gov/blast/db/v5/ref_prok_rep_genomes.$i.tar.gz"
    tar -xvf ref_prok_rep_genomes.$i.tar.gz
    rm ref_prok_rep_genomes.$i.tar.gz
done
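The loop above hard-codes every volume suffix, which makes it easy to drop one by accident. The following is a minimal sketch of the same download written as functions. The helper names (volume_ids, volume_url, download_db) are hypothetical and are not part of Grandeur, and the last volume number is an assumption that should be checked against the NCBI listing before running, since it may change between releases.

```shell
# Sketch only: volume_ids, volume_url and download_db are hypothetical helper names.
BASE_URL="https://ftp.ncbi.nlm.nih.gov/blast/db/v5"

# Print zero-padded volume suffixes from 00 up to and including LAST.
volume_ids() {
    local last="$1" i=0
    while [ "$i" -le "$last" ]; do
        printf '%02d\n' "$i"
        i=$((i + 1))
    done
}

# Build the download URL for one volume of a database.
volume_url() {
    local db_name="$1" id="$2"
    printf '%s/%s.%s.tar.gz\n' "$BASE_URL" "$db_name" "$id"
}

# Download, extract, and clean up each volume in the current directory.
# Stops at the first failed download or extraction so a bad archive is not deleted.
download_db() {
    local db_name="$1" last="$2" id archive
    for id in $(volume_ids "$last"); do
        archive="$db_name.$id.tar.gz"
        wget --continue --show-progress "$(volume_url "$db_name" "$id")" || return 1
        tar -xvf "$archive" || return 1
        rm "$archive"
    done
}
```

With these functions defined, the download above would be started with: mkdir -p blast_db && cd blast_db && download_db ref_prok_rep_genomes 25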

Then set the params.blast_db parameter to your new directory on the command line or in a config file.

params.blast_db = '/path/to/blast_db'

And be sure to set the corresponding database type on the command line or in a config file.

params.blast_db_type = "ref_prok_rep_genomes"
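Because params.blast_db names a directory and params.blast_db_type names the file prefix inside it, a mismatch between the two is an easy mistake. As a rough sanity check, the sketch below tests whether a directory contains at least one file with the expected prefix. The function name has_blast_db is hypothetical, and the check only looks at file names; it does not validate the database itself.

```shell
# Sketch only: has_blast_db is a hypothetical helper, not part of Grandeur or BLAST+.
# Succeeds if DIR contains at least one file named PREFIX.<anything>.
has_blast_db() {
    local dir="$1" prefix="$2" f
    for f in "$dir"/"$prefix".*; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Example use (the path is a placeholder):
# has_blast_db /path/to/blast_db ref_prok_rep_genomes || echo "database files not found" >&2
```

If BLAST+ is installed, running blastdbcmd -db /path/to/blast_db/ref_prok_rep_genomes -info gives a more thorough check, since it opens the database rather than only looking at file names.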

RefSeq releases occur in the first two weeks of odd-numbered months: January, March, May, July, September, and November.

nt_prok

Another popular database is nt. This database used to include everything, but in 2023 NCBI separated out a prokaryotic version, nt_prok. Downloading it works much like the ref_prok_rep_genomes download.

mkdir blast_db 
cd blast_db
for i in 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
do
    wget --continue --show-progress "https://ftp.ncbi.nlm.nih.gov/blast/db/v5/nt_prok.$i.tar.gz"
    tar -xvf nt_prok.$i.tar.gz
    rm nt_prok.$i.tar.gz
done

And be sure to set the corresponding database type on the command line or in a config file.

params.blast_db_type = "nt_prok"

These databases can also be downloaded with update_blastdb.pl, a script distributed with BLAST+.

To list the prebuilt blast databases:

update_blastdb.pl --showall

To download one of them (nt_prok in this example):

update_blastdb.pl nt_prok
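The command above can be wrapped so that it fails early when BLAST+ is not installed and places the files in a chosen directory. This is a minimal sketch: the function name fetch_with_update_blastdb is hypothetical, and it assumes the --decompress option of update_blastdb.pl, which extracts the downloaded archives, is available in the installed BLAST+ version.

```shell
# Sketch only: fetch_with_update_blastdb is a hypothetical wrapper name.
# Downloads DB_NAME into DEST_DIR using the update_blastdb.pl script from BLAST+.
fetch_with_update_blastdb() {
    local db_name="$1" dest_dir="$2"
    # Fail early with a clear message if the script is not on PATH.
    if ! command -v update_blastdb.pl >/dev/null 2>&1; then
        echo "update_blastdb.pl not found on PATH; install BLAST+ first" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    # Run in a subshell so the caller's working directory is unchanged.
    # --decompress is assumed to extract the archives after download.
    (cd "$dest_dir" && update_blastdb.pl --decompress "$db_name")
}

# Example use (not run automatically):
# fetch_with_update_blastdb nt_prok blast_db
```

The resulting directory can then be used as params.blast_db, with params.blast_db_type set to the database name passed to the function.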

More information can be found here: https://www.ncbi.nlm.nih.gov/books/NBK569850/

You can also create your own custom blast database by following these instructions: https://www.ncbi.nlm.nih.gov/books/NBK569841/

A full list of pre-built blast databases can be found at https://ftp.ncbi.nlm.nih.gov/blast/db/v5/.
