Sequence download fails (Too Many Requests) #55

Open
yari-iw opened this issue Aug 23, 2022 · 2 comments

yari-iw commented Aug 23, 2022

Hi,
I'm using homopolish (polish mode) in a pipeline, and I noticed that some of my results were not reproducible.
The logs helped me identify that the problem comes from the sequence download step:

command:
python3 homopolish.py polish -t 12 -a $assembly -s $homopolish_db -m R9.4.pkl -o .
logs:

...
[2022/08/23 14:04] INFO: Stage: Select closely-related genomes
TIME Select closely-related genomes: 0 MINS 3 SECS.
[2022/08/23 14:04] INFO: Stage: Download closely-related genomes
 INFO: 20 homologous sequence need to download:
Downloaded NZ_CP021908.1
Downloaded NZ_CP021906.1
Downloaded NZ_CP021906.1
429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NZ_CP021908.1&rettype=fasta
429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NZ_CP009362.1&rettype=fasta
429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NZ_CP011527.1&rettype=fasta
Downloaded NZ_CP035102.1
Downloaded NC_013322.1
Downloaded NZ_AP014943.1
429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NZ_CP028469.1&rettype=fasta
429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NZ_CP028471.1&rettype=fasta
429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_018965.1&rettype=fasta
Downloaded NC_019009.1
Downloaded NC_013340.1
Downloaded NZ_CP026065.1
429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_013292.1&rettype=fasta
429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NZ_CP029079.1&rettype=fasta
Downloaded NZ_CP029199.1
Downloaded NC_003140.1
Downloaded NC_021552.1
...

The error seems to occur because, by default, homopolish downloads sequences in batches of 3, which appears to trigger the 429 (Too Many Requests) responses for some clients.
Changing the variable max_pool_size in the download.py script from 3 to 1 lets all sequences download correctly.
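
Lowering the pool size avoids the 429 responses, but a retry with backoff would also keep a rate-limited request from silently dropping a sequence. NCBI's E-utilities documentation states a limit of 3 requests per second without an API key, so three concurrent workers can plausibly exceed it. The following is only a minimal sketch, not homopolish's actual code: `RateLimited`, `download_with_backoff`, and the injected `fetch` callable are hypothetical names used for illustration.

```python
import time


class RateLimited(Exception):
    """Hypothetical error a fetcher raises when the server answers HTTP 429."""


def download_with_backoff(fetch, accession, max_retries=4, base_delay=1.0,
                          sleep=time.sleep):
    """Call fetch(accession), retrying with exponential backoff on RateLimited.

    `fetch` is an assumed callable that returns the sequence text or raises
    RateLimited. If every attempt fails, the last RateLimited is re-raised so
    that a missing sequence cannot go unnoticed.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch(accession)
        except RateLimited:
            if attempt == max_retries:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Raising after the final attempt, rather than printing the error and continuing, means a run either uses all selected genomes or fails visibly, which is what reproducibility needs here.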

As this can be a problem for reproducibility (and can go unnoticed until proper testing is performed), would it be possible to add an option to manually set the number of requests, or to lower the number of requests by default?
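
To illustrate the kind of option I mean, here is a rough sketch. The `--max-pool-size` flag, `build_parser`, and `download_all` are hypothetical and do not exist in homopolish; the sketch only assumes that downloads run through a thread pool whose size could be taken from the command line.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor


def positive_int(value):
    """Parse a command-line value as an integer >= 1."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError("must be at least 1")
    return number


def build_parser():
    """Build a parser with a hypothetical option for the download pool size."""
    parser = argparse.ArgumentParser(description="hypothetical download options")
    parser.add_argument(
        "--max-pool-size",
        type=positive_int,
        default=1,  # conservative default: one request at a time
        help="number of concurrent download requests",
    )
    return parser


def download_all(accessions, fetch, max_pool_size):
    """Run fetch over all accessions with at most max_pool_size workers.

    Results are returned in the same order as the input accessions.
    """
    with ThreadPoolExecutor(max_workers=max_pool_size) as pool:
        return list(pool.map(fetch, accessions))
```

With a default of 1 the behaviour matches my manual change to max_pool_size, while users who have no rate-limit problems could still raise the value.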

I'm using the latest version of homopolish, cloned from GitHub earlier today (which I assume is v0.4), but the --version option reports: Homopolish VERSION: 0.3.4

@ythuang0522
Owner

Thank you for reporting this issue. We have seen the same errors before, but they were not easily reproducible from our servers. We suspected they might be due to firewall protection or NCBI's load policy. We will test again and will very likely lower the default number of download threads from 3 to 1 to comply with their protection policy. The option will be added at the same time. We also forgot to update the version number in the code; we will fix both together. Thanks again for your helpful feedback.

@yari-iw
Author

yari-iw commented Aug 24, 2022

Hi @ythuang0522, thank you for your quick answer.
