Hi,
I compared the sequences in public-latest.all.fa to the nwk tree and the tsv metadata file, and there appears to be a discrepancy in the sample counts. The .fa file contains ~6.6 million sequences, while the tsv and the tree each have ~8.3 million. Is the fasta reduced to just unique sequences, or is there an issue preventing all ~8.3 million sequences from being written to the fasta? http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/
Thanks for this great resource!
I've been adding new public sequences from the daily build to the public MSA, but that approach misses quite a few sequences over time. Sometimes a new sequence is available from GISAID earlier than from a public repository like GenBank, so the GISAID version of the sequence is aligned to the reference and added to the tree. Later, when the public version becomes available, it is renamed in the tree instead of being aligned and added, so it never reaches the MSA. So I needed to round up 1.7 million missing sequences, align them and add them to the MSA.
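For anyone who wants to check the gap on their own download, here is a rough sketch of listing names that are in the metadata but not in the fasta. It is not the script used for the build. It assumes the files are already decompressed, that fasta header lines match the metadata names exactly, and that the name column in the tsv is called `strain`; the function names are made up for illustration.

```python
import csv
import io


def fasta_names(handle):
    """Yield the full header text (without '>') of each FASTA record."""
    for line in handle:
        if line.startswith(">"):
            yield line[1:].strip()


def tsv_names(handle, column="strain"):
    """Yield the sequence name from each metadata row.

    The column name "strain" is an assumption; adjust it to the file's header.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row[column]


def missing_from_fasta(fasta_handle, tsv_handle, column="strain"):
    """Return metadata names that have no record in the FASTA, in file order."""
    in_fasta = set(fasta_names(fasta_handle))
    return [name for name in tsv_names(tsv_handle, column) if name not in in_fasta]


# Tiny in-memory demo with made-up names; real files would be opened with open().
demo_fasta = io.StringIO(">a|1\nACGT\n>b|2\nAC\n")
demo_tsv = io.StringIO("strain\tdate\na|1\t2020\nb|2\t2020\nc|3\t2021\n")
demo_missing = missing_from_fasta(demo_fasta, demo_tsv)
```

Holding several million names in a set takes some memory, but it keeps the lookup simple.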