Getting error "Error: Internal error (invalid zip archive). Please try again." Take 2 #360

Closed
corneliusroemer opened this issue May 15, 2024 · 27 comments

Comments

@corneliusroemer

Sadly the issue is still active, at least for taxons ebola-zaire and mpox.

See #356

New version of client (16.16.0) available at https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/linux-arm64/datasets.
Error: Internal error (invalid zip archive). Please try again

Originally posted by @corneliusroemer in #356 (comment)

@corneliusroemer changed the title from "Getting error Error: Internal error (invalid zip archive). Please try again" to "Getting error Error: Internal error (invalid zip archive). Please try again. Take 2" on May 15, 2024
@corneliusroemer changed the title from "Getting error Error: Internal error (invalid zip archive). Please try again. Take 2" to "Getting error "Error: Internal error (invalid zip archive). Please try again." Take 2" on May 15, 2024
@ericcox1
Collaborator

Thanks @corneliusroemer, we are continuing to look into this. Would you mind updating to 16.16.0 and, if the problem persists, running with --debug and reporting the phid? This will help us to better understand what went wrong.

Best,
Eric

@joverlee521

joverlee521 commented May 16, 2024

I am also seeing this error in our automated pipelines for zika, mpox, measles, and dengue, which are all scheduled to run at 9AM PDT. If I rerun the workflow at a later time, the error goes away. Does the time coincide with the datasets updates?
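
Editorial note: since the error message itself says "Please try again" and reruns at a later time reportedly succeed, an automated pipeline could wrap the download in a retry loop. The following is only a minimal sketch in POSIX shell, not something the maintainers recommended in this thread; the function name `retry`, the attempt count, and the delays are all illustrative assumptions.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it exits 0 or MAX_ATTEMPTS is reached.
# The delay doubles after each failed attempt (simple exponential backoff).
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Success: stop immediately and report success to the caller.
        if "$@"; then
            return 0
        fi
        # Out of attempts: report failure on stderr and give up.
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
}

# Example with a command quoted in this thread (4 attempts and a 60 s
# initial delay are arbitrary choices):
# retry 4 60 datasets download virus genome taxon 11320 --include genome,biosample
```

A retry loop only papers over transient failures; it would not help if the failure is deterministic for a given query.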

@corneliusroemer
Author

corneliusroemer commented May 17, 2024

@ericcox1 Yes, I'm getting the error with 16.16.0 as well. An example run is: Ncbi-Phid: 1D715361FD2DDA414583C0181D715361FD2DDA414583C018 (this exact run may actually have succeeded; I can't tell, because running with --debug flooded my terminal with binary text). I'll try to provoke an error again.

Is it possible that some part of the server struggles with the number of requests it's getting? As part of a project, I'm doing dataset downloads via CLI for a few taxa around every 3 minutes (it's run as part of CI). It's done with API key and the allowed rate is 10 requests per second so we should be far away from that limit but it might still be that no one else hitherto has sent requests so frequently.

@AngieHinrichs

I've been getting the same error (Error: Internal error (invalid zip archive). Please try again) repeatedly for the past several days while trying to get influenza A genomes with this command:

datasets download virus genome taxon 11320 --include genome,biosample --debug >& datasets.log

Here is the gzipped --debug output: datasets.log.gz

The download proceeds for a varying amount of time (roughly 2 to 39 minutes) and downloads a varying amount of data (I haven't kept track, but I noticed different numbers of GB) before exiting with the error.

I'm using datasets version: 16.17.0

@AngieHinrichs

Earlier today, this command succeeded for me:

datasets download virus genome taxon "Alphainfluenzavirus influenzae" --filename all_alphainfluenza.zip

-- it's the first example command on https://www.ncbi.nlm.nih.gov/datasets/docs/v2/how-tos/virus/get-influenza-genomes/ . In 87 minutes, it downloaded a 555MB (530MiB) file that includes data_report.jsonl and genome.fna, but not biosample.jsonl.

Unfortunately the command above with --include genome,biosample has failed twice this afternoon, both times making it to 67.3MB before getting the invalid zip archive error.
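
Editorial note: when a download stops partway and a file is left on disk, one rough way to tell whether it was cut short is to look for the ZIP end-of-central-directory signature (`PK\005\006`) near the end of the file, since a complete archive ends with that record. The sketch below is an assumption-laden illustration, not the validation the datasets client performs; the function name `looks_complete_zip` is hypothetical, and the check can be fooled if the signature bytes happen to occur inside the data.

```shell
#!/bin/sh
# looks_complete_zip FILE
# Hypothetical helper: returns 0 if FILE is non-empty and the ZIP
# end-of-central-directory signature occurs in its last 65557 bytes
# (22-byte record plus the maximum 65535-byte archive comment).
# This is a truncation heuristic only, not a full integrity check.
looks_complete_zip() {
    [ -s "$1" ] || return 1
    # Build the 4-byte signature "PK" 0x05 0x06 with octal escapes.
    eocd=$(printf 'PK\005\006')
    # -a treats binary input as text; -F matches the bytes literally.
    tail -c 65557 "$1" | LC_ALL=C grep -a -q -F -e "$eocd"
}

# Example (ncbi_dataset.zip is the default output name seen in this thread):
# looks_complete_zip ncbi_dataset.zip || echo "archive looks truncated" >&2
```

Where the `unzip` tool is installed, `unzip -t FILE` performs a much more thorough test of every entry.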

@olearyna
Contributor

Hi AngieHinrichs,

Thanks for opening the issue. We're looking into it.

Nuala

@olearyna
Contributor

@AngieHinrichs,

Can you run this again with the --debug flag and send us the PHID? - thanks!

@AngieHinrichs

OK, I am kicking off this command (there's no --no-progress-bar option, so adding a grep -v) and will send PHID and log. Thanks!

time datasets download virus genome taxon 11320 --include genome,biosample --debug |& grep -v ^$'\033' > datasets.log
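
Editorial note: once the --debug output has been captured in a file, the PHID can be pulled out with standard text tools instead of scrolling through the log. This is a minimal sketch that assumes the log contains lines of the form `Ncbi-Phid: <value>`, as quoted elsewhere in this thread; the exact log format is not documented here, and the function name `last_phid` is hypothetical.

```shell
#!/bin/sh
# last_phid LOGFILE
# Hypothetical helper: prints the value of the last "Ncbi-Phid: <value>"
# occurrence in LOGFILE. Assumes the value consists of letters, digits,
# and dots; prints nothing if no such occurrence exists.
last_phid() {
    # -a: treat binary content as text; -i: ignore header-name case;
    # -o: print only the matching part of each line.
    grep -a -i -o 'ncbi-phid: *[A-Za-z0-9.]*' "$1" \
        | tail -n 1 \
        | sed 's/^[^:]*: *//'
}

# Example (redirect both stdout and stderr into the log first):
# datasets download virus genome taxon 11320 --include genome,biosample --debug > datasets.log 2>&1
# last_phid datasets.log
```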

@AngieHinrichs

OK, PHID is 2F4065564DC261B8F1FA965F. Log attached.
datasets.2024-05-24.log.gz

@olearyna
Contributor

Hi AngieHinrichs,

We need to take a deeper look at the issue. We'll post here when we have a fix.

Nuala

@AngieHinrichs

Thanks @olearyna!

@carolinasisco

Hi,

Any good news on this? I have had the same error since Monday; I thought something was wrong with my code until I read this post.

@olearyna
Contributor

Hi carolinasisco,

We are actively working on a fix and aim to have it released within the week. We apologize for any inconvenience this may have caused. Thanks for the patience!

Nuala

@olearyna
Contributor

Hi carolinasisco and AngieHinrichs,

We have released a fix in the latest version (v16.18.1) of the command line tool that we believe addresses the reported issues. Please test this update and let us know if you encounter any further errors.

Thanks
Nuala

@AngieHinrichs

Thanks @olearyna, I'll try it out right away!

@AngieHinrichs

It worked and it was much faster than before! Thanks again!

@olearyna
Contributor

Great! I'll close this issue.

@carolinasisco

Hi, it did not work for me. Any suggestions?
I got the same error.

@corneliusroemer
Author

Thanks so much @olearyna and @ericcox1!
I just upgraded to 16.18.1 and the first run looks promising: none of the 4 taxon downloads failed. 🎉

I will comment as soon as I see failures again.

@carolinasisco are you sure you're using version 16.18.1?

I think it would help the devs if you could run with --debug then and share the PHID 😀

@olearyna
Contributor

Hi @carolinasisco,

Yes, if you are still having issues with the latest version, can you run with --debug and share the PHID? Thanks for the suggestion, corneliusroemer!

@carolinasisco

Hi @olearyna

I updated through conda --update, and the version showing is 16.18.1. This is my command (I ran it with --debug as suggested):

datasets download gene accession --inputfile ~/Desktop/wp_1_50 --filename wp150 --include gene,protein --debug
The error is:

Error: Download error: http2: server sent GOAWAY and closed the connection; LastDownloading: ncbi_dataset.zip 4.62MB error
Find attached the screen capture with the phid.

[attached screenshot: phid]

Thanks!

@olearyna
Contributor

olearyna commented Jun 2, 2024

Hi carolinasisco,

Thanks for the information! I think this is a separate issue from the virus genome download. We'll look into it tomorrow.

Nuala

@carolinasisco

Hi, thank you. I'm trying to download a large set of sequences (nt and aa) from Pseudomonas.

@mverce

mverce commented Oct 17, 2024

Hi, I would like to add another example of this error, in hopes of it being helpful in finding a solution. I am using ncbi datasets version 16.31.0. I was trying to download Streptococcus genomic sequences using the following command:
datasets download genome taxon Streptococcus --include genome,gbff --reference

This results in the following outcome:
Collecting 125 genome records [================================================] 100% 125/125
Downloading: ncbi_dataset.zip 273MB done
Validating package files [==>---------------------------------------------] 9% 23/254
Error: Internal error (invalid zip archive). Please try again

Across several attempts, validation of the package files reached only 6-9% before failing.

I reran the command while including either genomes or gbff. When downloading genomes only (--include genome), the process finished successfully. When downloading gbff only (--include gbff) the process failed with the same Internal Error as mentioned above.

@ericcox1
Collaborator

Hi @mverce,

Thanks for your report.

I wasn't able to reproduce this error and we think you may have encountered a temporary problem.

If you don't mind trying this one more time, please add the --debug flag and report the Ncbi-phid value here so we can investigate further.

datasets download genome taxon Streptococcus --include gbff --reference --filename strep.zip --debug

Best,
Eric

@mverce

mverce commented Oct 18, 2024

Hi @ericcox1,

I have tried it again with the commands that were problematic yesterday, as well as with your exact command (incl. --filename strep.zip), but the problem persists. The last Ncbi-Phid from the debug output is: 1CA6C01E4134F3592F685054.6.1

Thanks and best regards,
Marko

@corneliusroemer
Author

I tried the same command that Eric listed and can't reproduce the error.
