Why is the nt dataset downloaded from https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/ (378 GB) larger than the one downloaded with the command `update_blastdb.pl --decompress nt` (151 GB)? Why are there differences between the two downloads? Could you provide details on the specific data that has been added or removed, and the reasons for these changes? I would greatly appreciate it.
nt.##.tar.gz: The nucleotide sequence database contains entries from traditional divisions of GenBank, EMBL and DDBJ. Sequences from bulk divisions, i.e., gss, sts, pat, est, htg, wgs, con, and environmental sequences are excluded. RefSeq genomic entries are also excluded.
nt.gz: The FASTA equivalent of the nt.##.tar.gz database files.
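To see how the uncompressed size of a FASTA file such as nt.gz splits between header lines, residues, and line breaks, a streaming tally along these lines could be used. This is a minimal sketch, not an NCBI tool: the function name `fasta_byte_breakdown` is hypothetical, and it simply counts raw bytes per line category, so nothing has to be decompressed to disk.

```python
import gzip
import sys
from typing import Dict, Iterable


def fasta_byte_breakdown(lines: Iterable[bytes]) -> Dict[str, int]:
    """Tally header, residue, and newline bytes in FASTA lines (as bytes).

    Works with any iterable of byte lines, e.g. an open gzip file object.
    """
    counts = {"records": 0, "header": 0, "residues": 0, "newlines": 0}
    for line in lines:
        if line.endswith(b"\n"):
            counts["newlines"] += 1
            line = line[:-1]
        if line.startswith(b">"):
            counts["records"] += 1
            counts["header"] += len(line)  # includes the leading '>'
        else:
            counts["residues"] += len(line.rstrip(b"\r"))
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python fasta_breakdown.py nt.gz
    # The gzip stream is read line by line, never fully decompressed to disk.
    with gzip.open(sys.argv[1], "rb") as handle:
        print(fasta_byte_breakdown(handle))
```

The three byte totals add up to the decompressed file size, which makes it easy to see how much of it is sequence versus per-record and per-line overhead.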
Search the page for "Getting the preformatted database files" for a description of the benefits of the files downloaded through update_blastdb.pl.
But here's the ultimate explanation for the file size discrepancy:
Preformatted database files remove the makeblastdb formatting step and save valuable processing time and disk space.
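If I understand correctly, the preformatted downloads are stored as compact binary BLAST databases rather than as plain-text FASTA. Concretely, BLAST nucleotide databases pack sequence residues at two bits per base (four bases per byte) and record ambiguous bases separately, whereas a FASTA file spends one byte per base plus header text and a newline on every line.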
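A rough way to see why the binary form is smaller: the sketch below packs an A/C/G/T sequence at four bases per byte and compares that with the bytes the same record takes as FASTA. It is an illustration under stated assumptions, not BLAST's actual on-disk format: the base-to-code mapping, the 80-column line width, and the names `pack_2bit`, `unpack_2bit`, and `fasta_record_bytes` are all hypothetical choices for this example.

```python
from typing import Tuple

# Illustrative 2-bit codes for the four unambiguous bases (an assumption for
# this sketch, not a claim about BLAST's exact on-disk layout).
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq: str) -> Tuple[bytes, int]:
    """Pack an A/C/G/T-only sequence at four bases per byte.

    Raises KeyError on ambiguity codes such as N; a real format has to
    record those separately.
    """
    packed = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        # The first base of each byte goes into the two most significant bits.
        packed[i // 4] |= CODES[base] << (6 - 2 * (i % 4))
    return bytes(packed), len(seq)


def unpack_2bit(packed: bytes, length: int) -> str:
    """Inverse of pack_2bit; `length` drops the padding in the last byte."""
    return "".join(
        BASES[(packed[i // 4] >> (6 - 2 * (i % 4))) & 0b11] for i in range(length)
    )


def fasta_record_bytes(header: str, seq: str, width: int = 80) -> int:
    """Bytes for one FASTA record: header line plus sequence wrapped at `width`."""
    n_seq_lines = (len(seq) + width - 1) // width
    return len(header) + 1 + len(seq) + n_seq_lines  # +1 for the header newline
```

For a made-up 1,000,000-base record with header `>x`, `fasta_record_bytes` gives 1,012,503 bytes, while `pack_2bit` needs 250,000 bytes, roughly a 4x difference for the sequence alone. The observed 378 GB vs 151 GB gap is smaller than that, presumably because the database also stores deflines, index files, and ambiguity information alongside the packed residues.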