Unable to download the database: kraken2-build --standard --db kraken2-standard-db/ --threads 42 #661

Closed
hebamuh68 opened this issue Nov 29, 2022 · 5 comments

Comments

@hebamuh68

Running this command:

kraken2-build --standard --db kraken2-standard-db/ --threads 42

gives me this error:

Downloading nucleotide gb accession to taxon map...rsync: getaddrinfo: ftp.ncbi.nlm.nih.gov 873: Temporary failure in name resolution
rsync error: error in socket IO (code 10) at clientserver.c(139) [Receiver=3.2.7]
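
The rsync message points at DNS rather than at kraken2 itself: getaddrinfo failed for ftp.ncbi.nlm.nih.gov before any connection was attempted. A minimal sketch of checking that step in isolation, assuming Python 3 is available. The `resolve_host` helper and its `resolver` parameter are hypothetical names introduced here for illustration and are not part of kraken2:

```python
import socket


def resolve_host(host, port=873, resolver=socket.getaddrinfo):
    """Return (True, addresses) if the name resolves, else (False, reason).

    `resolver` is injectable so the function can be exercised offline.
    Port 873 is the rsync port that appears in the error message.
    """
    try:
        infos = resolver(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Same class of failure that rsync reports as "getaddrinfo: ...".
        return False, f"name resolution failed: {exc}"
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address.
    addresses = sorted({info[4][0] for info in infos})
    return True, addresses
```

Calling `resolve_host("ftp.ncbi.nlm.nih.gov")` on the affected machine should show whether the hostname resolves at all. If it does not, the likely place to look is the machine's DNS configuration or network rather than the kraken2 scripts.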


@Somebodyatthdoor

Hi hebamuh68,

I have also been getting this error for the past few days. It does not seem to be a problem with kraken2 but with the NCBI server: when I tried to download the files outside of kraken2, I had the same issue, and I have tried on multiple machines. Sometimes the command works for a short time and then fails; sometimes it fails straight away. I think it may just be a case of waiting to see if NCBI fixes the problem.
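
For failures that come and go like this, one option while waiting is to rerun the command with increasing pauses between attempts. A rough sketch, assuming Python 3; `run_with_retries` and its parameters are hypothetical names chosen here, and the delays are arbitrary:

```python
import subprocess
import time


def run_with_retries(cmd, attempts=5, base_delay=30.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Run `cmd` until it exits with status 0 or `attempts` is exhausted.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between tries.
    Returns the number of attempts used; raises RuntimeError if all fail.
    `runner` and `sleep` are injectable so the logic can be tested offline.
    """
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Exponential backoff: double the pause after each failure.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"command failed after {attempts} attempts: {cmd}")
```

It could be called as `run_with_retries(["kraken2-build", "--standard", "--db", "kraken2-standard-db/", "--threads", "42"])`. Whether a rerun resumes or restarts the download is up to kraken2-build and is not something this sketch controls.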

Cheers,
Laura

@hebamuh68
Author

@Somebodyatthdoor

I asked around and there is an alternative called MetaPhlAn. I'm trying to install it now.

@dandaman

dandaman commented Dec 8, 2022

The issue is related to the use of FTP in two scripts, download_genomic_library.sh and rsync_from_ncbi.pl. You can patch them using these diffs:

diff --git a/scripts/download_genomic_library.sh b/scripts/download_genomic_library.sh
index ffd96d2..39bd7c7 100755
--- a/scripts/download_genomic_library.sh
+++ b/scripts/download_genomic_library.sh
@@ -14,7 +14,7 @@ set -e  # Stop on error

 LIBRARY_DIR="$KRAKEN2_DB_NAME/library"
 NCBI_SERVER="ftp.ncbi.nlm.nih.gov"
-FTP_SERVER="ftp://$NCBI_SERVER"
+FTP_SERVER="https://$NCBI_SERVER"
 RSYNC_SERVER="rsync://$NCBI_SERVER"
 THIS_DIR=$PWD
diff --git a/scripts/rsync_from_ncbi.pl b/scripts/rsync_from_ncbi.pl
index 446efc9..d92a625 100755
--- a/scripts/rsync_from_ncbi.pl
+++ b/scripts/rsync_from_ncbi.pl
@@ -43,7 +43,7 @@ while (<>) {
   my $full_path = $ftp_path . "/" . basename($ftp_path) . $suffix;
   # strip off server/leading dir name to allow --files-from= to work w/ rsync
   # also allows filenames to just start with "all/", which is nice
-  if (! ($full_path =~ s#^ftp://${qm_server}${qm_server_path}/##)) {
+  if (! ($full_path =~ s#^https://${qm_server}${qm_server_path}/##)) {
     die "$PROG: unexpected FTP path (new server?) for $ftp_path\n";
   }
   $manifest{$full_path} = $taxid;

See: #653
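
To make the second hunk concrete: the Perl substitution strips the server prefix from each path so that the remainder can be handed to rsync via --files-from. A rough Python equivalent is sketched below. Unlike the patched script, which accepts only https://, this version accepts either scheme. The server name is taken from the diff; the `/genomes` path prefix is an assumption about the script's server path, and `strip_server_prefix` is a hypothetical helper, not a function in kraken2.

```python
import re

SERVER = "ftp.ncbi.nlm.nih.gov"  # NCBI_SERVER in the diff above
SERVER_PATH = "/genomes"         # assumed value of the script's server path

# re.escape plays the role of Perl's quotemeta for the ${qm_...} variables.
_PREFIX = re.compile(
    r"^(?:ftp|https)://" + re.escape(SERVER) + re.escape(SERVER_PATH) + "/"
)


def strip_server_prefix(full_path):
    """Return the path relative to SERVER_PATH, or raise on an unknown prefix."""
    stripped, count = _PREFIX.subn("", full_path, count=1)
    if count == 0:
        # Mirrors the script's die "unexpected FTP path (new server?)".
        raise ValueError(f"unexpected FTP path (new server?) for {full_path}")
    return stripped
```

With these assumptions, a path beginning with `https://ftp.ncbi.nlm.nih.gov/genomes/all/` is reduced to one beginning with `all/`, which is the form the comment in the Perl script describes.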

@Somebodyatthdoor

Hi,

Unfortunately this isn't the solution for me, as the version of kraken2 I downloaded already has these changes implemented. The same problem occurs when I try to download the databases with wget, which made me think it may be a problem on NCBI's end. However, when I contacted NCBI, they said they had received no other complaints about the problem and that it was likely a firewall problem on my end. My IT department disagrees, as they replicated my problem on several different machines. For the moment, like @hebamuh68, I have switched to MetaPhlAn, though I much prefer the functionality of kraken2.

Thanks for the suggestions,
Laura
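
One way to gather evidence for the firewall question is to test each protocol's port separately from the same machine. The sketch below assumes Python 3 and the default ports for FTP, HTTPS, and rsync; `probe_ports` and the `PORTS` table are hypothetical names introduced for illustration:

```python
import socket

# Default ports for the three ways the thread tries to reach NCBI.
PORTS = {"ftp": 21, "https": 443, "rsync": 873}


def probe_ports(host, ports=PORTS, timeout=10.0,
                connect=socket.create_connection):
    """Try one TCP connection per protocol.

    Returns a dict mapping protocol name to "open" or to the error text.
    `connect` is injectable so the function can be tested without a network.
    """
    results = {}
    for name, port in ports.items():
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError as exc:
            # Covers DNS failures, refused connections, and timeouts alike.
            results[name] = f"{type(exc).__name__}: {exc}"
        else:
            conn.close()
            results[name] = "open"
    return results
```

Running `probe_ports("ftp.ncbi.nlm.nih.gov")` from an affected machine and from an unaffected network may help narrow things down: a port that is open on one network and blocked on the other suggests a local firewall or proxy, while a failure on both suggests a cause elsewhere.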

@hebamuh68
Author

@Somebodyatthdoor

I found an API called Toolchest that can run kraken2 in the cloud, and it works perfectly. Give it a try.
Good luck.


3 participants