
Low quality/mutated BA.4/BA.5 sequences have disproportionately high likelihood to be classified as BA.2 #713

Closed
Sinickle opened this issue Jun 4, 2022 · 2 comments


@Sinickle

Sinickle commented Jun 4, 2022

Without having looked at the training set, I suspect it may contain more low-quality or divergent sequences labeled BA.2 than sequences labeled BA.4/BA.5.

This is already a problem for countries whose sequences have more dropout, but it also seems to cause BA.4/BA.5 sequences that carry a few additional mutations to be classified as BA.2.

Let's take Botswana as an example.

There are 29 samples with S:486V and S:452R in the last 3 months.
https://cov-spectrum.org/explore/Botswana/AllSamples/Past3M/variants?aaMutations=s%3A484a%2Cs%3A486v&pangoLineage1=ba.2*&

Only one of them is labeled as either BA.4 or BA.5.
Let's exclude the ones that have dropouts. Now we are down to 10:
https://cov-spectrum.org/explore/Botswana/AllSamples/Past3M/variants?aaMutations=s%3A452r%2Cs%3A486v%2Corf1a%3A116v%2Cn%3A418q%2Corf1a%3A41e&pangoLineage1=ba.2*&aaMutations2=s%3A452r%2Cs%3A486v&pangoLineage2=ba.2*&

Now, if we add a few more filters requiring that certain residues carry the wild-type amino acid (as they do in the wild type, BA.2, and BA.4/BA.5)...
https://cov-spectrum.org/explore/Botswana/AllSamples/Past3M/variants?aaMutations=s%3A452r%2Cs%3A486v%2Corf1a%3A116v%2Cn%3A418q%2Corf1a%3A41e%2Corf1a%3A1m%2Cn%3A19g%2Cs%3A3v%2Corf1b%3A1156m&pangoLineage1=ba.2*&aaMutations2=s%3A452r%2Cs%3A486v&pangoLineage2=ba.2*&
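The links above differ only in their mutation lists, so they can be rebuilt from plain Python lists when more positions need to be added. This is a minimal sketch: the parameter names (`aaMutations`, `pangoLineage1`, `aaMutations2`, `pangoLineage2`) are copied from the links, reading them as "two variants compared side by side" is an assumption, and the helper names are hypothetical, not part of any CoV-Spectrum API.

```python
from urllib.parse import quote

BASE = "https://cov-spectrum.org/explore/Botswana/AllSamples/Past3M/variants"


def encode_mutations(mutations):
    """Lower-case AA mutations, join with commas, percent-encode ':' and ','."""
    return quote(",".join(m.lower() for m in mutations), safe="")


def comparison_url(query, baseline, lineage="ba.2*", base=BASE):
    """Hypothetical helper: variant 1 = full filter set, variant 2 = baseline.

    The lineage filter is appended unencoded, as in the links above.
    """
    return (
        f"{base}?aaMutations={encode_mutations(query)}"
        f"&pangoLineage1={lineage}"
        f"&aaMutations2={encode_mutations(baseline)}"
        f"&pangoLineage2={lineage}&"
    )


baseline = ["S:452R", "S:486V"]
# Filters added in the second link to exclude sequences with dropouts.
no_dropout = baseline + ["ORF1a:116V", "N:418Q", "ORF1a:41E"]
# Filters added in the third link: residues required to match the wild type.
wild_type = no_dropout + ["ORF1a:1M", "N:19G", "S:3V", "ORF1b:1156M"]

print(comparison_url(wild_type, baseline))
```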

Now the only one left is the one labeled as BA.4/BA.5!

Again, without having looked into the training set, I suspect this could be because there is either more BA.2 than BA.4/BA.5 in the training set, or lower variance among the BA.4/BA.5 samples than among the BA.2 ones.
Given the current expectation that BA.4/BA.5 will become dominant, I think it makes sense to make the model less conservative about assigning these lineages.

@Sinickle
Author

Sinickle commented Jun 6, 2022

Closing this one, since I noticed I had basically created a duplicate of issue #645.

Sinickle closed this as completed Jun 6, 2022
@AngieHinrichs
Member

Thanks @Sinickle, the query makes for a nice-sized example for testing.

If you run pangolin in its default (usher) mode with the --skip-scorpio flag on the 29 Botswana sequences from your query, all of them are assigned to BA.4 or BA.5 except Botswana/R115B13_BHP_AAC84273/2022|EPI_ISL_12398917 and Botswana/R116B46_BHP_AAC84914/2022|EPI_ISL_12473529, which fail QC because they have too many Ns/ambiguous bases:

Botswana/R1113B88_BHP_AAC81541/2022|EPI_ISL_12236191|2022-04-12,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.19,Usher placements: BA.4(2/2)
Botswana/R1113B93_BHP_AAC81538/2022|EPI_ISL_12236193|2022-04-12,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.21,Usher placements: BA.4(2/2)
Botswana/R1113B31_BHP_1041393/2022|EPI_ISL_12243747|2022-04-14,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.07,Usher placements: BA.4(2/2)
Botswana/R1113B11_BHP_1042053/2022|EPI_ISL_12243742|2022-04-17,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.07,Usher placements: BA.4(1/1)
Botswana/R1113B25_BHP_122039864/2022|EPI_ISL_12243761|2022-04-14,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.07,Usher placements: BA.4(2/2)
Botswana/R1113B23_BHP_122037740/2022|EPI_ISL_12243759|2022-04-07,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.09,Usher placements: BA.4(1/1)
Botswana/R1113B12_BHP_2022003065/2022|EPI_ISL_12243752|2022-04-20,BA.5,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.08,Usher placements: BA.5(2/2)
Botswana/R1113B16_BHP_122040785/2022|EPI_ISL_12243754|2022-04-20,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.08,Usher placements: BA.4(1/1)
Botswana/R1113B05_BHP_1043216/2022|EPI_ISL_12243751|2022-04-20,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.07,Usher placements: BA.4(1/1)
Botswana/R1113B04_BHP_1043141/2022|EPI_ISL_12243750|2022-04-20,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.11,Usher placements: BA.4(1/1)
Botswana/R1113B14_BHP_122040814/2022|EPI_ISL_12243753|2022-04-20,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.08,Usher placements: BA.4(1/1)
Botswana/R114B02_BHP_1043102/2022|EPI_ISL_12243763|2022-04-10,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.12,Usher placements: BA.4(1/1)
Botswana/R1113B18_BHP_2722004317/2022|EPI_ISL_12243756|2022-04-19,BA.5,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.05,Usher placements: BA.5(1/1)
Botswana/R114B29_BHP_122036330/2022|EPI_ISL_12243764|2022-04-04,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.09,Usher placements: BA.4(1/1)
Botswana/R115B13_BHP_AAC84273/2022|EPI_ISL_12398917|2022-04-24,Unassigned,,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,fail,Ambiguous_content:0.35,
Botswana/R115B02_BHP_AAC82946/2022|EPI_ISL_12398785|2022-04-21,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.14,Usher placements: BA.4(4/4)
Botswana/R115B01_BHP_AAC82849/2022|EPI_ISL_12398784|2022-04-21,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.14,Usher placements: BA.4(4/4)
Botswana/R115B58_BHP_122038024/2022|EPI_ISL_12398916|2022-04-08,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.19,Usher placements: BA.4(2/2)
Botswana/R116B46_BHP_AAC84914/2022|EPI_ISL_12473529|2022-04-19,Unassigned,,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,fail,Ambiguous_content:0.39,
Botswana/R116B15_BHP_AAC84799/2022|EPI_ISL_12473520|2022-04-27,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.1,Usher placements: BA.4(2/2)
Botswana/R116B13_BHP_AAC84828/2022|EPI_ISL_12473518|2022-04-27,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.14,Usher placements: BA.4(4/4)
Botswana/R116B94_BHP_122038024/2022|EPI_ISL_12473530|2022-04-08,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.2,Usher placements: BA.4(2/2)
Botswana/R116B08_BHP_8PH600412/2022|EPI_ISL_12473515|2022-04-23,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.14,Usher placements: BA.4(4/4)
Botswana/R116B28_BHP_1045927/2022|EPI_ISL_12473523|2022-04-25,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.12,Usher placements: BA.4(1/1)
Botswana/R116B12_BHP_AAC84834/2022|EPI_ISL_12473517|2022-04-27,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.11,Usher placements: BA.4(2/2)
Botswana/R116B24_BHP_AAC84874/2022|EPI_ISL_12473521|2022-04-27,BA.5,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.12,Usher placements: BA.5(3/3)

Scorpio is still having a tough time distinguishing between BA.2, BA.4 and BA.5 with good sensitivity and specificity, and it often overrides the usher or pangoLEARN call with Unassigned. There has been a lot of discussion about that in cov-lineages/pangolin#449. In the meantime, if you can run pangolin in the default usher mode with --skip-scorpio, it should call BA.4 and BA.5 fairly accurately; please let us know if you find otherwise!
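A minimal sketch of that invocation, driven from Python so it can be scripted. It assumes pangolin 4.x is installed and on PATH; `botswana.fasta` is a hypothetical input file, the helper names are hypothetical, and `--analysis-mode usher` only makes the default engine explicit.

```python
import subprocess


def pangolin_command(fasta_path, outfile="lineage_report.csv"):
    """Build a pangolin 4.x command line: usher placement, no scorpio override."""
    return [
        "pangolin", fasta_path,
        "--analysis-mode", "usher",  # default engine in 4.x, stated explicitly
        "--skip-scorpio",            # keep scorpio from overriding the usher call
        "--outfile", outfile,
    ]


def run_pangolin(fasta_path, outfile="lineage_report.csv"):
    """Run pangolin; raises CalledProcessError if it exits non-zero."""
    subprocess.run(pangolin_command(fasta_path, outfile), check=True)


# Hypothetical input file; uncomment on a machine with pangolin installed.
# run_pangolin("botswana.fasta")
```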
