Scorpio lineage replacement results in "Unassigned" lineage. #449

Closed
eddieimada opened this issue May 11, 2022 · 10 comments

@eddieimada

eddieimada commented May 11, 2022

Dear pangolin developers and users,

After updating to the latest version (4.0.6), I started to observe a high number of "Unassigned" samples that had been assigned lineages successfully in previous versions. These unassigned calls occur even when the samples pass the QC check.

I noticed that when Scorpio replaces the UShER lineage inference, the "lineage" field becomes "Unassigned". To confirm this, I disabled Scorpio and the lineage field was properly populated (see below).

With Scorpio:

taxon,lineage,conflict,ambiguity_score,scorpio_call,scorpio_support,scorpio_conflict,scorpio_notes,version,pangolin_version,scorpio_version,constellation_version,is_designated,qc_status,qc_notes,note
test1,Unassigned,0.0,,Omicron (Unassigned),0.79,0.06,scorpio call: Alt alleles 26; Ref alleles 2; Amb alleles 5; Oth alleles 0,PUSHER-v1.8,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.03,Usher placements: BA.2(3/3); scorpio replaced lineage inference BA.2

Without Scorpio:

taxon,lineage,conflict,ambiguity_score,scorpio_call,scorpio_support,scorpio_conflict,scorpio_notes,version,pangolin_version,scorpio_version,constellation_version,is_designated,qc_status,qc_notes,note
test1,BA.2,0.0,,,,,,PUSHER-v1.8,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.03,Usher placements: BA.2(3/3)
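
For anyone wanting to gauge how many of their own samples are affected, here is a minimal sketch that scans a pangolin report for passing rows whose lineage is "Unassigned" while the note column still records the replaced call. The regular expression is an assumption based only on the note wording in the row above; other pangolin versions may phrase it differently. The function name `find_overridden` is hypothetical.

```python
import csv
import io
import re

# Header plus the "With Scorpio" row, copied from the report above.
SAMPLE = (
    "taxon,lineage,conflict,ambiguity_score,scorpio_call,scorpio_support,"
    "scorpio_conflict,scorpio_notes,version,pangolin_version,scorpio_version,"
    "constellation_version,is_designated,qc_status,qc_notes,note\n"
    "test1,Unassigned,0.0,,Omicron (Unassigned),0.79,0.06,"
    "scorpio call: Alt alleles 26; Ref alleles 2; Amb alleles 5; Oth alleles 0,"
    "PUSHER-v1.8,4.0.6,0.3.17,v0.1.10,False,pass,Ambiguous_content:0.03,"
    "Usher placements: BA.2(3/3); scorpio replaced lineage inference BA.2\n"
)

# Assumed note wording, taken from the example row only.
REPLACED = re.compile(r"scorpio replaced lineage inference (\S+)")


def find_overridden(handle):
    """Yield (taxon, replaced_lineage) for QC-passing rows that ended up 'Unassigned'."""
    for row in csv.DictReader(handle):
        match = REPLACED.search(row.get("note") or "")
        if row["lineage"] == "Unassigned" and row["qc_status"] == "pass" and match:
            yield row["taxon"], match.group(1)


overridden = list(find_overridden(io.StringIO(SAMPLE)))
print(overridden)  # [('test1', 'BA.2')]
```

To run it on a real report, pass an open file handle instead of the `io.StringIO` sample.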

Is anyone else experiencing this?

Thank you.

Versions:
pangolin (v4.0.6)
pangolin-data (v1.8)
constellations (v0.1.10)
scorpio (v0.3.17)
pangolin-assignment (v1.8)

@jacaravas

jacaravas commented May 11, 2022

We observed something similar.

With the Constellations v0.1.10 update, we saw the number of "Unassigned" sequences in our database triple. A breakdown of how many sequences of each lineage went to "Unassigned" with the latest update is below. Can you verify that this is working as intended? The changes seem more far-reaching than the v0.1.10 update note suggests, and the increase in "Unassigned" sequences is substantial.

Also, could you clarify under what conditions Scorpio will override a lineage call with "Unassigned"? I was under the impression that Scorpio/Constellations were focused on refining VoC classifications, but it seems to be doing quite a bit more here.

Number changed to "Unassigned" with v0.1.10, followed by the lineage assigned with v0.1.9:
32056 BA.2
22535 BA.1
14452 BA.1.1
6740 BA.2.9
4451 BA.2.10
2510 BA.1.15
2302 BA.2.3
2176 BA.1.17.2
1816 BA.1.1.15
1174 BA.1.1.1
1143 BA.1.18
795 BA.1.17
713 BA.1.20
648 BA.2.12
554 BA.1.1.18
502 BA.2.23
449 BA.2.3.3
355 BA.2.12.1
308 BA.2.10.1
299 BA.2.7
208 BA.1.1.7
197 BA.1.15.1
193 BA.2.31
167 BA.2.1
127 BA.1.1.11
126 BA.1.19
115 BA.1.14
114 BA.1.1.14
98 BA.1.14.1
92 BA.1.21.1
84 BA.1.1.16
84 BA.1.9
76 BA.1.21
72 BA.1.16
72 BA.2.6
68 BA.2.17
68 BA.1.13
66 BA.2.22
62 BA.2.5
60 BA.2.3.2
58 BA.2.18
57 BA.2.8
49 BA.2.25
48 BA.2.33
46 BA.1.1.5
46 BA.2.32
46 BA.1.1.2
43 BA.2.2
41 BA.2.15
36 BA.1.8
36 BA.2.16
33 BA.1.17.1
32 BA.1.1.13
32 BA.1.13.1
32 BA.2.9.2
30 BA.1.1.10
29 BA.1.1.4
29 BA.2.21
27 BA.2.3.4
27 BA.2.26
23 BA.2.19
21 BA.1.6
18 BA.2.20
18 BA.2.14
17 BA.1.10
16 BA.1.1.3
14 BA.2.4
13 BA.2.27
12 BA.1.1.8
10 BA.1.5
9 BA.2.34
8 BA.2.25.1
8 BA.1.1.12
8 BA.1.14.2
7 B.1.617.2
7 BA.1.1.6
7 BA.1.15.2
7 BA.1.1.9
7 BA.2.13
6 BA.2.11
6 BA.1.7
6 BA.1.16.1
5 BA.2.9.1
5 BA.1.12
5 BA.1.22
4 BA.1.1.17
3 BA.2.3.1
2 BA.2.24
2 BA.2.29
2 BA.2.30
2 BA.1.2
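
A breakdown like the one above can be produced by comparing the lineage column of two runs on the same sequences. The following is a minimal sketch, not the method actually used for the table; the function name and the toy sample IDs and calls are hypothetical and only illustrate the counting.

```python
from collections import Counter


def newly_unassigned(old_calls, new_calls):
    """Count, per previous lineage, the taxa that became 'Unassigned' in the new run."""
    counts = Counter()
    for taxon, new_lineage in new_calls.items():
        old_lineage = old_calls.get(taxon)
        # Skip taxa that are missing from the old run or were already unassigned.
        if new_lineage == "Unassigned" and old_lineage not in (None, "Unassigned"):
            counts[old_lineage] += 1
    return counts


# Hypothetical toy data: taxon -> lineage for the old and new constellations versions.
old = {"s1": "BA.2", "s2": "BA.2", "s3": "BA.1.1", "s4": "BA.1"}
new = {"s1": "Unassigned", "s2": "Unassigned", "s3": "Unassigned", "s4": "BA.1"}

breakdown = newly_unassigned(old, new)
for lineage, n in breakdown.most_common():
    print(n, lineage)
```

In practice the two dictionaries would be filled from the `taxon` and `lineage` columns of each run's report.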

@aineniamh
Member

We removed the 'probable' constellation definition for BA.* sublineages. This was originally intended to avoid false negatives when Omicron was first spreading and the SNP profile of the sublineages was a lot more distinguishable.

At this stage, particularly when BA.2, BA.4 and BA.5 are so similar, we now need to prioritise avoiding false positives in the case of missing SNPs. If you check some of the more recent issues with constellations you'll see people reporting that probable definitions from scorpio had been leading to mis-calls and inappropriate overwriting of UShER assignments. This is why we've now removed probable definitions.

The sequences that will have switched to unassigned are those that don't meet the SNP thresholds defined within scorpio constellations, and previously may have been picked up by the 'probable' definitions but now cannot be.

@eddieimada
Author

eddieimada commented May 13, 2022

Hi Áine, thank you for the explanation.

Just to be clear: the "Unassigned" I was referring to was the lineage field, not the one from the scorpio_call field that changed from "Probable Omicron (XXXX)" to "Omicron (Unassigned)".

"probable definitions from scorpio had been leading to mis-calls and inappropriate overwriting of UShER assignments"

If I understood correctly, in cases where the scorpio call is "Unassigned", pangolin should fall back to the UShER call and not overwrite it? It seems that scorpio is still overwriting UShER assignments, but it is now overwriting them with "Unassigned". In the example I posted, UShER assigns BA.2 (3/3), but the lineage field shows "Unassigned".

I believe an unintended effect of this change is that pangolin still replaces the lineage field with the scorpio call, which is now "Unassigned".

Is this an intended effect of this change? If so, could you elaborate on why a UShER assignment should not be trusted if scorpio cannot pinpoint a lineage?

Thank you!
PS: closed the issue by mistake. Reopened again.

@KatSteinke

We've been seeing this for unambiguous UShER calls as well: the affected samples with no reported conflict almost outnumber the affected samples that do have conflict.

@donutbrew

donutbrew commented May 25, 2022

Hi team, is there any update on this issue, where the final lineage is (incorrectly?) overridden by scorpio? This affects a very large number of sequences. We're seeing the same phenomenon described by @eddieimada.

@molly-hetheringtonrauth

molly-hetheringtonrauth commented May 25, 2022

Wanted to post this here. At Colorado we are seeing the issue discussed above, where previously assigned lineages are now unassigned. However, we are only seeing this on our ONT runs using the V4.1 ARTIC primers, not on our Illumina runs using the V3 ARTIC primers. We have also noticed that the sequences being called as unassigned have low coverage (~10x read depth) between 22,475-22,775, which corresponds to AA 305-405 in the Spike protein. Also, when using the --skip-scorpio flag, it seems like most of these unassigned sequences get assigned to BA.2 or its sublineages and a handful get assigned to BA.4. We are prioritizing getting these sequences up on GISAID so we can provide GISAID IDs. Here are those GISAID IDs:
'EPI_ISL_12913466',
'EPI_ISL_12913474',
'EPI_ISL_12913481',
'EPI_ISL_12913488',
'EPI_ISL_12913586',
'EPI_ISL_12913587',
'EPI_ISL_12913486',
'EPI_ISL_12913650',
'EPI_ISL_12913644',
'EPI_ISL_12913646',
'EPI_ISL_12913647',
'EPI_ISL_12913648',
'EPI_ISL_12913649',
'EPI_ISL_12913628',
'EPI_ISL_12913634',
'EPI_ISL_12913620',
'EPI_ISL_12913617',
'EPI_ISL_12913618',
'EPI_ISL_12913619',
'EPI_ISL_12913622',
'EPI_ISL_12913624',
'EPI_ISL_12913623',
'EPI_ISL_12913625',
'EPI_ISL_12913611',
'EPI_ISL_12913614',
'EPI_ISL_12913600',
'EPI_ISL_12913598',
'EPI_ISL_12913564',
'EPI_ISL_12913560',
'EPI_ISL_12913445',
'EPI_ISL_12913430',
'EPI_ISL_12913550',
'EPI_ISL_12913552',
'EPI_ISL_12913553',
'EPI_ISL_12913556',
'EPI_ISL_12913555',
'EPI_ISL_12913542',
'EPI_ISL_12913541',
'EPI_ISL_12913417',
'EPI_ISL_12913535',
'EPI_ISL_12913534',
'EPI_ISL_12913518',
'EPI_ISL_12913536',
'EPI_ISL_12913537',
'EPI_ISL_12913522',
'EPI_ISL_12913507',
'EPI_ISL_12913514',
'EPI_ISL_12913511',
'EPI_ISL_12913516',
'EPI_ISL_12913517',
'EPI_ISL_12913396',
'EPI_ISL_12913491',
'EPI_ISL_12913377',
'EPI_ISL_12913370',
'EPI_ISL_12913493',
'EPI_ISL_12913494',
'EPI_ISL_12913485',
'EPI_ISL_12913596',
'EPI_ISL_12913484',
'EPI_ISL_12913483',
'EPI_ISL_12913487',
'EPI_ISL_12913489',
'EPI_ISL_12913368',
'EPI_ISL_12913369',
'EPI_ISL_12913594',
'EPI_ISL_12913591',
'EPI_ISL_12913590',
'EPI_ISL_12913592',
'EPI_ISL_12913593',
'EPI_ISL_12913595',
'EPI_ISL_12913460',
'EPI_ISL_12913582',
'EPI_ISL_12913462',
'EPI_ISL_12913463',
'EPI_ISL_12913464',
'EPI_ISL_12913465',
'EPI_ISL_12913467',
'EPI_ISL_12913588',
'EPI_ISL_12913589',
'EPI_ISL_12913468',
'EPI_ISL_12913574',
'EPI_ISL_12913575',
'EPI_ISL_12913576',
'EPI_ISL_12913577',
'EPI_ISL_12913457',
'EPI_ISL_12913579',
'EPI_ISL_12913458',
'EPI_ISL_12913459',
'EPI_ISL_12913337',
'EPI_ISL_12932068',
'EPI_ISL_12913482'
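
The mapping from the low-coverage genome window to Spike residues mentioned above can be checked with a few lines. This sketch assumes the positions are 1-based coordinates on the Wuhan-Hu-1 reference (NC_045512.2), where the Spike coding sequence spans 21563..25384; the comment does not state which reference was used, and the function name is hypothetical.

```python
# Spike (S) coding sequence on Wuhan-Hu-1 (NC_045512.2), 1-based inclusive.
SPIKE_CDS_START = 21563
SPIKE_CDS_END = 25384


def spike_codon(nt_pos):
    """Return the 1-based Spike amino-acid position covering a genome coordinate."""
    if not SPIKE_CDS_START <= nt_pos <= SPIKE_CDS_END:
        raise ValueError(f"position {nt_pos} is outside the Spike CDS")
    return (nt_pos - SPIKE_CDS_START) // 3 + 1


low_coverage_window = (22475, 22775)
aa_range = tuple(spike_codon(p) for p in low_coverage_window)
print(aa_range)  # (305, 405)
```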

@karenbobier

We are also seeing sequences with "Unassigned" for the lineage. About 1/3 of our recent sequences that passed QC have an unassigned lineage. We're using Midnight 1200 primers, sequencing on an Illumina MiSeq, and running pangolin with the StaPH-B Docker image.
pangolearn version: PLEARN-v1.8
pangolin_version: 4.0.6
scorpio_version: 0.3.17
constellation_version: v0.1.10

GISAID IDs for some of our sequences with this problem:
EPI_ISL_12882763
EPI_ISL_12882764
EPI_ISL_12882765
EPI_ISL_12882766
EPI_ISL_12882768
EPI_ISL_12882769
EPI_ISL_12882770
EPI_ISL_12882771
EPI_ISL_12882772
EPI_ISL_12882773
EPI_ISL_12882775
EPI_ISL_12882776
EPI_ISL_12882780
EPI_ISL_12882782
EPI_ISL_12882784
EPI_ISL_12882785
EPI_ISL_12882786
EPI_ISL_12882787
EPI_ISL_12882788
EPI_ISL_12882789
EPI_ISL_12882790
EPI_ISL_12882792
EPI_ISL_12882793
EPI_ISL_12882794
EPI_ISL_12882795
EPI_ISL_12882796
EPI_ISL_12882797
EPI_ISL_12882801

@KatSteinke

We might be seeing a trend towards fewer unassigned with ONT's Midnight V3 primers, but it's a little early to tell.
Has anything changed about the weighting of ambiguous nucleotides or similar? The affected sequences do meet our national quality criteria of <3000 missing with 4.1 as well, so it would surprise me somewhat if the threshold has become that harsh.

@AngieHinrichs
Member

It's really difficult to find a combination of mutation lists, thresholds and specific allele rules to get both sensitivity and specificity when distinguishing between BA.2, BA.4 and BA.5 in the presence of Ns and false reversions in different regions from different sequencing methods. And @aineniamh has many demands on her time, but is actively working on this now, and I will try to help in what ways I can, being less familiar with scorpio/constellations.

Thanks @molly-hetheringtonrauth and @karenbobier for the specific examples! Those will be really helpful for testing.

In the meantime, one thing we've been considering is possibly making scorpio override pangoLEARN but not UShER. If you are running pangolin with the default UShER mode, consider adding the --skip-scorpio option to prevent the override.
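
For pipelines that call pangolin from a script, the workaround above might look like the following sketch. Only `--skip-scorpio` comes from this thread; `--outfile` is pangolin's usual option for naming the report, and the file names and the helper `pangolin_command` are hypothetical placeholders.

```python
import os
import shutil
import subprocess


def pangolin_command(fasta, outfile, skip_scorpio=True):
    """Build the pangolin argument list, optionally disabling the scorpio override."""
    cmd = ["pangolin", fasta, "--outfile", outfile]
    if skip_scorpio:
        cmd.append("--skip-scorpio")
    return cmd


# Placeholder file names; replace with your own input and report paths.
cmd = pangolin_command("sequences.fasta", "lineage_report.csv")
print(" ".join(cmd))

# Run only when pangolin is installed and the input file actually exists.
if shutil.which("pangolin") and os.path.exists("sequences.fasta"):
    subprocess.run(cmd, check=True)
```

Keeping the flag behind a parameter makes it easy to drop the workaround again after upgrading.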

@aineniamh
Member

In the latest release (4.1) we no longer overwrite UShER calls if they don't meet the scorpio checks, as we're confident in the UShER calls. In pangoLEARN mode scorpio is still in place as before and will eliminate false positives. Closing this issue as I believe the latest version of pangolin will resolve your issue.
