Unable to download the Influenza A taxon assembly #432

Closed
anna-parker opened this issue Dec 5, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@anna-parker

anna-parker commented Dec 5, 2024

Describe the bug

Hi! I'm having trouble downloading all influenza A assemblies. My current download has been running for over 3 hours and has not finished yet. Previously, the download was interrupted twice after 2 hours with the message "Connection reset by peer". As of today I have been unable to download the dataset. I have been trying to download it using:

datasets download genome taxon 11320 --assembly-source genbank --filename genbank_assembly.zip

To Reproduce
Run

datasets download genome taxon 11320 --assembly-source genbank --filename genbank_assembly.zip

Expected behavior
I understand that this is a very large folder, so I do expect the download to take time. Is there perhaps a way to batch this so that the download does not fail after 2 hours?

For context, I am able to download all influenza A samples in ~10 min using:

datasets download virus genome taxon 11320 --filename data.zip

I am also able to download all assemblies from RefSeq, just not from GenBank:

datasets download genome taxon 11320 --assembly-source refseq --filename refseq_assembly.zip

Thank you

Thanks for your feedback--your bug reports help improve NCBI Datasets.

@anna-parker anna-parker added the bug Something isn't working label Dec 5, 2024
@mtntsuchiya
Contributor

Hi Anna,
Thank you for letting us know of this issue. I'm looking into it now and I'll let you know what I find.
Thanks,
Mirian

Mirian T. N. Tsuchiya, Ph.D.
Bioinformatics Data Wrangler (contractor)
NCBI Datasets (NCBI/NLM/NIH)
(she/her/hers)

@syntheticgio

We are able to replicate the problem @anna-parker, currently investigating what might be the underlying cause.

@anna-parker
Author

Thanks! I am connecting from Switzerland, so maybe it's a location issue? I tried downloading from home and from work today and saw similar results. My internet speed at home isn't great, but speed tests show I have 17.8 Mbps download.

@ericcox1
Collaborator

ericcox1 commented Jan 3, 2025

Hi @anna-parker,

Have you tried using rehydration? This should speed up the download.

For example, to download influenza A assemblies:

datasets download genome taxon 11320 --assembly-source genbank --dehydrated --filename genbank_assembly.zip
unzip genbank_assembly.zip -d genbank_assembly
datasets rehydrate --directory genbank_assembly

Best,
Eric

@anna-parker
Author

@ericcox1 thank you so much for this suggestion! I tested it now and it took me 1h27m to download but the download was successful!!! I'm so excited to get a closer look at the data!
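
Since the original failures were dropped connections ("Connection reset by peer"), one way to make the dehydrated workflow above more robust is to wrap each network step in a simple retry loop. The following is only a minimal sketch and was not posted by anyone in this thread: the `retry` and `fetch_dehydrated` helpers, the attempt counts, and the 60-second delay are hypothetical choices, and the sketch assumes that simply rerunning `datasets download` or `datasets rehydrate` after a failure is safe, which the thread does not confirm. The `datasets` and `unzip` invocations are the same ones suggested above.

```shell
#!/usr/bin/env bash

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: rerun COMMAND until it exits 0 or MAX_ATTEMPTS is reached.
retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# fetch_dehydrated ZIP_FILE TARGET_DIR
# Hypothetical wrapper around the three commands from the thread:
# dehydrated download, unzip, then rehydrate.
fetch_dehydrated() {
  local zip=$1 dir=$2
  # Step 1: download the dehydrated package for taxon 11320 (Influenza A).
  retry 3 60 datasets download genome taxon 11320 \
    --assembly-source genbank --dehydrated --filename "$zip" || return 1
  # Step 2: unpack it; -o overwrites files left over from an earlier attempt.
  unzip -o "$zip" -d "$dir" || return 1
  # Step 3: fetch the actual data files, retrying if the connection drops.
  retry 5 60 datasets rehydrate --directory "$dir"
}
```

To use it, source the script and call `fetch_dehydrated genbank_assembly.zip genbank_assembly`. Nothing runs until the function is called, so the retry limits can be adjusted first.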
