Issues with pangolin assignments for BA.3 and BA.2.13 amongst others #601

Closed
luenling opened this issue May 3, 2022 · 2 comments
Labels
duplicate This is an overlap or duplication of an existing proposal pangoLEARN Issues related to pangoLEARN

Comments


luenling commented May 3, 2022

Dear pango-designation team,

We ran into a few issues with GISAID, Covspectrum and outbreak.info recently, first while looking into a spike of BA.3 sequences in Austria and now while looking into sequences containing S:L452M.
I looked at 98 sequences from Austria classified as BA.3 by Pangolin, and Nextclade assigned all but two of them to BA.2.*. When checking the mutations in more detail, I found that while two sequences are genuine BA.3/BA.3.1, 90 of the remaining 96 carry one or more of orf1a:T842I, orf1a:L3027F, orf1a:L3021F and orf6:D61L, which I think are BA.2 markers that BA.3 should not have.
2 of these 90 also have S:del69/70, but as they also contain the BA.2 markers orf1a:T842I, orf1a:L3027F, orf1a:L3021F and orf6:D61L, I suspect they are contaminations or BA.1/BA.2 co-infections.
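A minimal sketch of the marker check described above, assuming each sequence's amino-acid mutations are already available as a set of strings written in the notation used in this issue (not necessarily the exact output format of Nextclade or any other tool). The marker list is the one given above; the function name `classify_ba3_call` and its result labels are hypothetical.

```python
# Hypothetical sketch: flag Pangolin "BA.3" calls that carry markers the
# issue associates with BA.2 rather than BA.3. Mutation labels follow the
# issue's notation, not a specific tool's output format (assumption).
BA2_ONLY_MARKERS = {"orf1a:T842I", "orf1a:L3027F", "orf1a:L3021F", "orf6:D61L"}
BA1_MARKER = "S:del69/70"


def classify_ba3_call(mutations: set[str]) -> str:
    """Classify a Pangolin BA.3 call from its set of amino-acid mutations."""
    if not mutations & BA2_ONLY_MARKERS:
        # Absence of BA.2 markers does not prove BA.3, it only removes one red flag.
        return "no BA.2 markers"
    if BA1_MARKER in mutations:
        # BA.2 markers plus a BA.1 deletion: possible contamination or co-infection.
        return "possible BA.1/BA.2 mixture"
    return "doubtful BA.3"


calls = {
    "seq_a": {"S:L452M"},
    "seq_b": {"orf1a:T842I", "orf1a:L3027F"},
    "seq_c": {"orf6:D61L", "S:del69/70"},
}
for name, muts in calls.items():
    print(name, classify_ba3_call(muts))
```

Checking for BA.1's S:del69/70 only after finding BA.2 markers mirrors the reasoning above: the deletion is only suspicious in sequences that already look like BA.2.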
The other problem we ran into was with BA.2.13, which according to issue 531 by @corneliusroemer should have 22916A (S:452M). With Cornelius's search (https://cov-spectrum.org/explore/World/AllSamples/Past6M/variants?nucMutations=22792T%2C22916A%2C23767G&pangoLineage=BA.2*&nucMutations1=22792T%2C22916A%2C23767G&pangoLineage1=BA.2*&) we find 399 sequences, while there are 13508 sequences assigned BA.2.13, of which only 592 contain 22916A (S:452M).
I am not sure whether this is easily fixable, but on both Covspectrum and outbreak.info (I assume they draw their lineage information from GISAID), BA.3 and BA.2.11 no longer show all defining markers at frequencies above 90%. BA.3.1 still seems to be assigned correctly and shows high defining-marker frequencies.
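The 90% check above amounts to a simple frequency comparison. A minimal sketch, assuming per-sequence lineage calls and nucleotide mutations are available as `(lineage, set_of_mutations)` pairs; the function names and the exact-match lineage filter (descendant sublineages are not included) are illustrative assumptions, while the 90% threshold and the BA.2.13 counts are the ones given in this issue.

```python
# Hypothetical sketch: how often does a lineage's defining mutation appear
# among sequences assigned to that lineage?
def marker_frequency(
    records: list[tuple[str, set[str]]], lineage: str, mutation: str
) -> tuple[int, int]:
    """Return (n_with_mutation, n_assigned) for records assigned exactly to `lineage`."""
    assigned = [muts for lin, muts in records if lin == lineage]
    n_with = sum(1 for muts in assigned if mutation in muts)
    return n_with, len(assigned)


def passes_threshold(n_with: int, n_total: int, threshold: float = 0.9) -> bool:
    """True if the marker is present in at least `threshold` of assigned sequences."""
    return n_total > 0 and n_with / n_total >= threshold


# Counts reported above: 592 of 13508 BA.2.13 sequences carry 22916A.
n_with, n_total = 592, 13508
print(f"{n_with / n_total:.1%}")  # 4.4%
print(passes_threshold(n_with, n_total))  # False
```

With the reported counts, 22916A is present in well under 10% of BA.2.13 assignments, far from the 90% one would expect for a defining marker.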

All the best and thank you for your great work,
Lukas

For the Austrian sequences, the EPI IDs of the two genuine BA.3 sequences are in the following file:
ba3_aut.txt

The 90 sequences that I think were wrongly assigned BA.3 are in this file:
ba3_doubtful_aut.txt

@corneliusroemer corneliusroemer added the pangoLEARN Issues related to pangoLEARN label May 3, 2022
@corneliusroemer (Contributor)

Hi @luenling, thanks for reporting this. I had a look at your "doubtful BA.3" sequences and almost all of them have a lot of missing nucleotides, on the order of 5-15k missing. I'm not sure to what extent pangoLEARN is intended to work in this situation.

I think both of your general points re BA.3 and BA.2.13 have already got their own issues.

For BA.3 here: #584 (apparently fixed yesterday)
For BA.2.13 here: #592 (still open)

So I'll close this one as a duplicate for now.

Do you know why you're getting so many Ns in your sequences? Is that always the case or only in a small fraction of sequences? It suggests a sequencing/assembly problem that would be worth resolving. Many pipelines filter out such incomplete sequences. They are still useful for genotyping, of course, but usually the more complete, the better.
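A minimal sketch of the kind of completeness filter mentioned above, assuming plain FASTA input and counting only `N` characters as missing. The function names are hypothetical, and the 3,000-N cutoff is an arbitrary illustrative value, not one used by any particular pipeline.

```python
# Hypothetical sketch: count missing bases (Ns) per sequence and drop
# sequences above an illustrative cutoff.
from typing import Iterable


def parse_fasta(lines: Iterable[str]) -> dict[str, str]:
    """Parse FASTA lines into {first header word: sequence}."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def missing_bases(seq: str) -> int:
    """Number of 'N' calls, case-insensitive (gaps are not counted)."""
    return seq.upper().count("N")


def filter_complete(seqs: dict[str, str], max_missing: int = 3000) -> dict[str, str]:
    """Keep only sequences with at most `max_missing` Ns."""
    return {name: s for name, s in seqs.items() if missing_bases(s) <= max_missing}


fasta = [">complete", "ACGT" * 10, ">patchy", "N" * 30, "ACGT"]
seqs = parse_fasta(fasta)
print({name: missing_bases(s) for name, s in seqs.items()})
print(list(filter_complete(seqs, max_missing=5)))
```

With 5,000 to 15,000 Ns per sequence, the "doubtful BA.3" sequences discussed here would fail a cutoff like this one.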

@corneliusroemer corneliusroemer added the duplicate This is an overlap or duplication of an existing proposal label May 3, 2022

luenling commented May 3, 2022

Dear @corneliusroemer ,
Sorry, I should have checked the issues before posting.
About the sequence quality: only three of the sequences are from us (CeMM). Most of the sequences are from Lifebrain, a commercial lab, and they might still be using an older ARTIC primer set. Our group was just asked about the unexpectedly high number of BA.3s coming from Austria, so I looked into it a bit. Lifebrain also submitted quite a few BA.4 sequences recently; these are also somewhat patchy, but they seem to be correctly assigned.
All the best and thanks again for your great work,
Lukas
