Commit
Change accidental Ml:Z tag to Ml:B:C,.
Also explicitly forbid probabilities summing to more than 1.0.
1 parent 7fafbdf, commit 039c151
Showing 1 changed file with 2 additions and 2 deletions.
039c151
Does this imply that there will never actually be multiple modifications at a position? I am not aware of the intricacies of the biology there.
039c151
Or, more generally, I suppose it would just obtain a new chemical code if there were truly multiple modifications at a single site?
039c151
I hope so! Explicitly, it means that we can record multiple modification options, of which one will (hopefully!) be correct, although there can be another independent modification on the opposite strand.
I would guess that if there is a modified base type which has the combined characteristics of two different modifications, then it should be given its own code. This isn't something I know enough about to explicitly state as fact though! However, having seen the hundreds of base modifications at ChEBI, I'm pretty confident that this is how they operate already.
Edit: I should point out that the notion of multiple mods at the same locus was developed after discussing this with ONT. Their base caller is trained on a set of known mods and basically emits probabilities for everything at each call. They can (and I think do) trim the list down somewhat for the cases where some probabilities are close to zero, but basically the model is "we have these choices and this is how the probabilities are distributed between them".
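To make the "probabilities distributed between choices" model concrete, here is a minimal sketch of the constraint the commit message describes (probabilities at one site must not sum to more than 1.0). The function name `unmodified_probability` is hypothetical, and treating the leftover probability as "no modification" is an assumption made for illustration, not something stated in the commit.

```python
def unmodified_probability(mod_probs, tol=1e-9):
    """Return the probability left over after all candidate modifications.

    mod_probs maps a modification code to its probability at one site.
    ASSUMPTION: the remainder (1 - sum) is read as "no modification".
    """
    if any(p < 0.0 or p > 1.0 for p in mod_probs.values()):
        raise ValueError("each probability must lie in [0, 1]")
    total = sum(mod_probs.values())
    # The candidates are mutually exclusive options for a single base,
    # so together they cannot account for more than certainty.
    if total > 1.0 + tol:
        raise ValueError(f"probabilities sum to {total:.3f}, which exceeds 1.0")
    return max(0.0, 1.0 - total)


# Illustrative site with two candidate modifications on the same base.
site = {"m": 0.7, "h": 0.2}
print(unmodified_probability(site))
```

A site such as `{"m": 0.7, "h": 0.5}` would be rejected, since the two options would claim more than the whole probability mass.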
It's something we discussed in the early days of short read sequencing for A, C, G, T: rather than emit one base with a Phred score, emit all 4 with likelihoods. It can definitely improve consensus generation (I even wrote the code for it in Gap5) and variant calling. Sure, you may claim it's A with Phred score 10 (p=0.9), but if everything else in the column is a T and your remaining probability is also on T (p=0.1), then it's much more likely to be a sequencing error than if the remaining probability were on G (p=0.1) and T were extremely low.
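The A-versus-T argument above can be sketched in a few lines. This is not the Gap5 algorithm; it is a simplified illustration that assumes independent reads and a uniform prior, and the names `column_posterior` and `consensus` as well as all the numbers are made up for the example.

```python
from math import prod

BASES = "ACGT"


def column_posterior(reads):
    """Combine per-read base probabilities for one alignment column.

    ASSUMPTIONS: reads are independent and the prior over bases is uniform,
    so the posterior is the normalised product of per-read probabilities.
    """
    joint = {b: prod(read[b] for read in reads) for b in BASES}
    total = sum(joint.values())
    return {b: joint[b] / total for b in BASES}


def consensus(reads):
    """Return the base with the highest posterior probability."""
    post = column_posterior(reads)
    return max(post, key=post.get)


# Two reads that support T with moderate confidence.
t_read = {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85}

# Both odd reads call A with p=0.9 (Phred 10), but the remainder differs.
remainder_on_t = {"A": 0.9, "C": 0.0005, "G": 0.0005, "T": 0.099}
remainder_on_g = {"A": 0.9, "C": 0.0005, "G": 0.099, "T": 0.0005}

print(consensus([t_read, t_read, remainder_on_t]))
print(consensus([t_read, t_read, remainder_on_g]))
```

With a single base plus Phred score the two odd reads are indistinguishable; with the full distribution, the first is easily explained as a miscalled T, while the second gives almost no support to T and pulls the column towards A.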
I view base mods as basically the same idea, and having multiple choices available can really help consensus calling.
039c151
@jkbonfield thanks for the reply, that makes sense. Interesting historical note too... maybe this would lead to encoding alternative bases as basemods, heh.