Low quality/mutated BA.4/BA.5 sequences have disproportionately high likelihood to be classified as BA.2 #713

Sinickle · 2022-06-04T22:07:01Z

Without actually having looked at the training set, I suspect that it contains more low-quality or divergent sequences classified as BA.2 than sequences classified as BA.4/BA.5.

This is an issue for countries with more dropout in their sequences, but it also seems to cause BA.4/BA.5 sequences that carry a few additional mutations to be classified as BA.2.

Let's take Botswana as an example.

There are 29 samples with S:486V and S:452R in the last 3 months.
https://cov-spectrum.org/explore/Botswana/AllSamples/Past3M/variants?aaMutations=s%3A484a%2Cs%3A486v&pangoLineage1=ba.2*&

Only one of them is labeled as either BA.4 or BA.5.
Let's exclude the ones that have a dropout...
Now we are down to 10.
https://cov-spectrum.org/explore/Botswana/AllSamples/Past3M/variants?aaMutations=s%3A452r%2Cs%3A486v%2Corf1a%3A116v%2Cn%3A418q%2Corf1a%3A41e&pangoLineage1=ba.2*&aaMutations2=s%3A452r%2Cs%3A486v&pangoLineage2=ba.2*&

Now if we add some extra constraints specifying that various residues are set to the wild-type amino acid (as they are in the wild type, BA.2, and BA.4/BA.5)...
https://cov-spectrum.org/explore/Botswana/AllSamples/Past3M/variants?aaMutations=s%3A452r%2Cs%3A486v%2Corf1a%3A116v%2Cn%3A418q%2Corf1a%3A41e%2Corf1a%3A1m%2Cn%3A19g%2Cs%3A3v%2Corf1b%3A1156m&pangoLineage1=ba.2*&aaMutations2=s%3A452r%2Cs%3A486v&pangoLineage2=ba.2*&

Now the only one left is the one labeled as BA.4/BA.5!
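As a rough sketch of how the filters above narrow the query, the snippet below rebuilds a cov-spectrum link from a list of amino-acid mutations and a lineage filter. It only mirrors the URL pattern visible in the links in this comment; the function name `cov_spectrum_url` is hypothetical, and cov-spectrum may accept or require other parameters.

```python
from urllib.parse import urlencode

# Base path copied from the links above (Botswana, all samples, past 3 months).
BASE = "https://cov-spectrum.org/explore/Botswana/AllSamples/Past3M/variants"


def cov_spectrum_url(aa_mutations, pango_lineage):
    """Build a single-filter query URL (hypothetical helper, not an official API)."""
    query = {
        "aaMutations": ",".join(aa_mutations),
        "pangoLineage1": pango_lineage,
    }
    # safe="*" keeps the lineage wildcard unescaped, as in the original links.
    return BASE + "?" + urlencode(query, safe="*")


# Start with the two spike mutations, then add residues to exclude dropout.
loose = cov_spectrum_url(["s:452r", "s:486v"], "ba.2*")
strict = cov_spectrum_url(
    ["s:452r", "s:486v", "orf1a:116v", "n:418q", "orf1a:41e"], "ba.2*"
)
print(loose)
print(strict)
```

Each extra residue in the list requires a called base at that position, which is why adding them filters out sequences with dropout.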

Again without having looked into the training set, I suspect this could be because either there is more BA.2 than BA.4/BA.5 in the training set, or there is lower variance in the BA.4/BA.5 samples than in the BA.2 samples.
Given the current expectation that BA.4/BA.5 will become a dominant lineage, I believe it makes sense to make the model less conservative with these designations.

Sinickle · 2022-06-06T13:56:59Z

Going to close this one after noticing I basically created a duplicate of issue #645.

AngieHinrichs · 2022-06-07T00:53:49Z

Thanks @Sinickle, the query makes for a nice-sized example for testing.

If you run pangolin in default (usher) mode, with the --skip-scorpio flag, for the 29 Botswana sequences from your query, then all of them are assigned to BA.4 or BA.5 except for Botswana/R115B13_BHP_AAC84273/2022|EPI_ISL_12398917 and Botswana/R116B46_BHP_AAC84914/2022|EPI_ISL_12473529 which fail QC because they have too many N's/ambiguous bases:

Botswana/R1113B88_BHP_AAC81541/2022|EPI_ISL_12236191|2022-04-12,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.19,Usher placements: BA.4(2/2)
Botswana/R1113B93_BHP_AAC81538/2022|EPI_ISL_12236193|2022-04-12,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.21,Usher placements: BA.4(2/2)
Botswana/R1113B31_BHP_1041393/2022|EPI_ISL_12243747|2022-04-14,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.07,Usher placements: BA.4(2/2)
Botswana/R1113B11_BHP_1042053/2022|EPI_ISL_12243742|2022-04-17,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.07,Usher placements: BA.4(1/1)
Botswana/R1113B25_BHP_122039864/2022|EPI_ISL_12243761|2022-04-14,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.07,Usher placements: BA.4(2/2)
Botswana/R1113B23_BHP_122037740/2022|EPI_ISL_12243759|2022-04-07,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.09,Usher placements: BA.4(1/1)
Botswana/R1113B12_BHP_2022003065/2022|EPI_ISL_12243752|2022-04-20,BA.5,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.08,Usher placements: BA.5(2/2)
Botswana/R1113B16_BHP_122040785/2022|EPI_ISL_12243754|2022-04-20,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.08,Usher placements: BA.4(1/1)
Botswana/R1113B05_BHP_1043216/2022|EPI_ISL_12243751|2022-04-20,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.07,Usher placements: BA.4(1/1)
Botswana/R1113B04_BHP_1043141/2022|EPI_ISL_12243750|2022-04-20,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.11,Usher placements: BA.4(1/1)
Botswana/R1113B14_BHP_122040814/2022|EPI_ISL_12243753|2022-04-20,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.08,Usher placements: BA.4(1/1)
Botswana/R114B02_BHP_1043102/2022|EPI_ISL_12243763|2022-04-10,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.12,Usher placements: BA.4(1/1)
Botswana/R1113B18_BHP_2722004317/2022|EPI_ISL_12243756|2022-04-19,BA.5,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.05,Usher placements: BA.5(1/1)
Botswana/R114B29_BHP_122036330/2022|EPI_ISL_12243764|2022-04-04,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.09,Usher placements: BA.4(1/1)
Botswana/R115B13_BHP_AAC84273/2022|EPI_ISL_12398917|2022-04-24,Unassigned,,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,fail,Ambiguous_content:0.35,
Botswana/R115B02_BHP_AAC82946/2022|EPI_ISL_12398785|2022-04-21,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.14,Usher placements: BA.4(4/4)
Botswana/R115B01_BHP_AAC82849/2022|EPI_ISL_12398784|2022-04-21,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.14,Usher placements: BA.4(4/4)
Botswana/R115B58_BHP_122038024/2022|EPI_ISL_12398916|2022-04-08,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.19,Usher placements: BA.4(2/2)
Botswana/R116B46_BHP_AAC84914/2022|EPI_ISL_12473529|2022-04-19,Unassigned,,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,fail,Ambiguous_content:0.39,
Botswana/R116B15_BHP_AAC84799/2022|EPI_ISL_12473520|2022-04-27,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.1,Usher placements: BA.4(2/2)
Botswana/R116B13_BHP_AAC84828/2022|EPI_ISL_12473518|2022-04-27,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.14,Usher placements: BA.4(4/4)
Botswana/R116B94_BHP_122038024/2022|EPI_ISL_12473530|2022-04-08,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.2,Usher placements: BA.4(2/2)
Botswana/R116B08_BHP_8PH600412/2022|EPI_ISL_12473515|2022-04-23,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.14,Usher placements: BA.4(4/4)
Botswana/R116B28_BHP_1045927/2022|EPI_ISL_12473523|2022-04-25,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.12,Usher placements: BA.4(1/1)
Botswana/R116B12_BHP_AAC84834/2022|EPI_ISL_12473517|2022-04-27,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.11,Usher placements: BA.4(2/2)
Botswana/R116B24_BHP_AAC84874/2022|EPI_ISL_12473521|2022-04-27,BA.5,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.12,Usher placements: BA.5(3/3)
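For anyone who wants to summarize a report like the one above, here is a minimal sketch that tallies lineage calls and lists QC failures. The pasted rows have no header, so the column names are an assumption based on the pangolin 4.x `lineage_report.csv` layout; only three of the rows above are embedded to keep the example short.

```python
import csv
from collections import Counter

# Assumed column order (pangolin 4.x lineage_report.csv); the rows above carry no header.
COLUMNS = [
    "taxon", "lineage", "conflict", "ambiguity_score",
    "scorpio_call", "scorpio_support", "scorpio_conflict", "scorpio_notes",
    "version", "pangolin_version", "scorpio_version", "constellation_version",
    "is_designated", "qc_status", "qc_notes", "note",
]

# Three rows copied from the report above.
REPORT_ROWS = """\
Botswana/R1113B88_BHP_AAC81541/2022|EPI_ISL_12236191|2022-04-12,BA.4,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.19,Usher placements: BA.4(2/2)
Botswana/R1113B12_BHP_2022003065/2022|EPI_ISL_12243752|2022-04-20,BA.5,0.0,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.08,Usher placements: BA.5(2/2)
Botswana/R115B13_BHP_AAC84273/2022|EPI_ISL_12398917|2022-04-24,Unassigned,,,,,,,PUSHER-v1.9,4.0.6,0.3.17,v0.1.10,False,fail,Ambiguous_content:0.35,
"""


def tally(report_text):
    """Count lineage calls and collect the taxa that failed QC."""
    reader = csv.DictReader(report_text.splitlines(), fieldnames=COLUMNS)
    lineages = Counter()
    failed = []
    for row in reader:
        lineages[row["lineage"]] += 1
        if row["qc_status"] == "fail":
            failed.append(row["taxon"])
    return lineages, failed


lineages, failed = tally(REPORT_ROWS)
print(dict(lineages))
print(failed)
```

To run it on a full report, read the file contents and drop the header line (if present) before passing the text to `tally`.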

Scorpio is still having a tough time distinguishing between BA.2, BA.4 and BA.5 with good sensitivity and specificity, and often overrides the usher or pangoLEARN call with Unassigned. There has been a lot of discussion about that in cov-lineages/pangolin#449. In the meantime, if you can run pangolin in the default usher mode with --skip-scorpio, it should call BA.4 and BA.5 fairly accurately. Please let us know if you find otherwise!
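A minimal sketch of scripting that suggestion from Python, assuming pangolin 4.x is installed and on the PATH. The input file name `botswana_29.fasta` and both helper names are hypothetical; `--skip-scorpio` and `--outfile` are the only pangolin options used, and the analysis mode is left at its default (usher).

```python
import shutil
import subprocess


def build_pangolin_command(fasta_path, outfile="lineage_report.csv"):
    """Return the argument list for a default-mode run without scorpio."""
    return ["pangolin", fasta_path, "--skip-scorpio", "--outfile", outfile]


def run_pangolin(fasta_path, outfile="lineage_report.csv"):
    """Run pangolin if it is available; raise a clear error otherwise."""
    if shutil.which("pangolin") is None:
        raise RuntimeError("pangolin was not found on PATH")
    subprocess.run(build_pangolin_command(fasta_path, outfile), check=True)


# Hypothetical input file holding the sequences from the query above.
command = build_pangolin_command("botswana_29.fasta")
print(" ".join(command))
```

Building the command separately from running it makes it easy to inspect or log exactly what will be executed before calling `run_pangolin`.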

Sinickle closed this as completed Jun 6, 2022

hoelzer mentioned this issue Jun 14, 2022

added skip-scorpio parameter fixes #234 replikation/poreCov#235

Merged
