Hi,
I'm using homopolish (polish mode) in a pipeline and noticed that some of my results were not reproducible.
The logs helped me identify that the problem comes from the sequence download step:
...
[2022/08/23 14:04] INFO: Stage: Select closely-related genomes
TIME Select closely-related genomes: 0 MINS 3 SECS.
[2022/08/23 14:04] INFO: Stage: Download closely-related genomes
INFO: 20 homologous sequence need to download:
Downloaded NZ_CP021908.1
Downloaded NZ_CP021906.1
Downloaded NZ_CP021906.1
429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NZ_CP021908.1&rettype=fasta
429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NZ_CP009362.1&rettype=fasta
429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NZ_CP011527.1&rettype=fasta
Downloaded NZ_CP035102.1
Downloaded NC_013322.1
Downloaded NZ_AP014943.1
429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NZ_CP028469.1&rettype=fasta
429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NZ_CP028471.1&rettype=fasta
429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_018965.1&rettype=fasta
Downloaded NC_019009.1
Downloaded NC_013340.1
Downloaded NZ_CP026065.1
429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_013292.1&rettype=fasta
429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NZ_CP029079.1&rettype=fasta
Downloaded NZ_CP029199.1
Downloaded NC_003140.1
Downloaded NC_021552.1
...
The error arises because, by default, homopolish downloads sequences in batches of 3, which appears to trigger the "429 Too Many Requests" responses shown above for some clients.
Changing the variable max_pool_size (in the download.py script) from 3 to 1 lets all sequences download correctly.
Since this can hurt reproducibility (and can go unnoticed until proper testing is performed), would it be possible to add an option to set the number of concurrent requests manually, or to lower the default?
I'm using the latest version of homopolish, cloned from GitHub earlier today (which I suppose is v0.4), but the --version option reports: Homopolish VERSION: 0.3.4
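To illustrate the kind of change being requested, here is a minimal sketch of a downloader with a configurable pool size and a retry on HTTP 429. It is not homopolish's actual download.py: the function names (fetch_fasta, fetch_with_retry, download_all), the retry count, and the backoff delays are assumptions made for illustration. Only the efetch URL pattern and the max_pool_size name come from the logs and the script mentioned above.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# URL pattern taken from the 429 lines in the logs above.
EFETCH = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
          "?db=nuccore&id={acc}&rettype=fasta")


class TooManyRequests(Exception):
    """Raised when the server answers with HTTP 429."""


def fetch_fasta(acc):
    """Fetch one accession; translate HTTP 429 into TooManyRequests."""
    try:
        with urllib.request.urlopen(EFETCH.format(acc=acc), timeout=60) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 429:
            raise TooManyRequests(acc) from err
        raise


def fetch_with_retry(acc, fetch, retries=4, base_delay=1.0):
    """Retry on 429 with exponential backoff; re-raise after the last attempt."""
    for attempt in range(retries + 1):
        try:
            return fetch(acc)
        except TooManyRequests:
            if attempt == retries:
                raise
            time.sleep(base_delay * 2 ** attempt)


def download_all(accessions, fetch=fetch_fasta, max_pool_size=1,
                 retries=4, base_delay=1.0):
    """Download every accession with at most max_pool_size concurrent requests.

    Any accession that still fails after all retries raises, so a partial
    download cannot pass silently.
    """
    accessions = list(accessions)
    with ThreadPoolExecutor(max_workers=max_pool_size) as pool:
        results = pool.map(
            lambda acc: fetch_with_retry(acc, fetch, retries, base_delay),
            accessions)
        return dict(zip(accessions, results))
```

With max_pool_size=1 the requests are issued one at a time, which matches the workaround described above. Raising instead of logging and continuing is a deliberate choice here, so that an incomplete set of reference genomes stops the pipeline rather than changing the result.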
Thank you for reporting this issue. We have seen the same errors before, but they were not easily reproducible from our servers. We suspected they might be caused by firewall protection or NCBI's load policy. We will test again and will very likely lower the default number of download threads from 3 to 1 to comply with their policy, and then add the option. We also forgot to update the version number in the code and will fix both together. Thanks again for your helpful feedback.
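Until a fix lands, one way to catch incomplete downloads is to scan the homopolish log for the two line formats shown in the report. The sketch below assumes the log keeps exactly those formats ("Downloaded <accession>" and "429 Client Error: ... id=<accession>&..."); the function name summarize_downloads is hypothetical.

```python
import re

# Line formats copied from the log excerpt in the report.
DOWNLOADED = re.compile(r"^Downloaded (\S+)$", re.MULTILINE)
FAILED = re.compile(r"^429 Client Error: .*[?&]id=([^&\s]+)", re.MULTILINE)


def summarize_downloads(log_text):
    """Return (downloaded, failed) sets of accession IDs found in the log."""
    downloaded = set(DOWNLOADED.findall(log_text))
    failed = set(FAILED.findall(log_text))
    return downloaded, failed
```

A pipeline could call this on the captured log and stop if the failed set is not empty.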
Command used:
python3 homopolish.py polish -t 12 -a $assembly -s $homopolish_db -m R9.4.pkl -o .