Valid introns being reported as deletions #33
Comments
Do you mean 22bp?
You may reduce intron penalty with option |
Since these are protein-to-genome alignments, the CIGAR string values for the operator
Do you have a recommendation for how much I should reduce it to? For the example XP_052862054.1, I had to change the value to 24 to get the correct call. But for XP_052871965.1, the value had to be further reduced to 23. Is it okay to reduce it to, say, 18, or even lower?
Oh, I forgot the factor of 3. The minimum intron size on the perfect condition should be
At least larger than |
I ran:
$ efetch -db protein -id ARD71195.1 -format fasta > prot.fa
$ efetch -db nucleotide -id NC_069144.1 -format fasta > genome.fa
## removed cs:Z: tag for brevity
$ miniprot genome.fa prot.fa 2>/dev/null
ARD71195.1 624 127 614 - NC_069144.1 62075725 3236135 3250058 633 1470 0 AS:i:888 ms:i:1126 np:i:313 da:i:105 do:i:15 cg:Z:23M5201U38M78N41M106N78M1D11M133V36M6617U65M89V1M2D80M146V85M101V23M
$ miniprot -J 22 genome.fa prot.fa 2>/dev/null
ARD71195.1 624 59 614 - NC_069144.1 62075725 3236135 3329666 675 1698 0 AS:i:951 ms:i:1155 np:i:346 da:i:78 do:i:15 cg:Z:31M79383U19M6D17M2D23M5201U38M78N41M106N78M1D11M133V36M6617U65M89V1M2D80M146V85M101V23M
Here, using a lower value for |
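To compare the two runs above without reading the CIGAR strings by eye, the operators can be tallied with a short script. This is only a sketch: it assumes the operator semantics described in the miniprot manual (M and D lengths counted in amino acids, so 3 nt each on the genome; N, U and V introns and F, G frameshifts counted in nucleotides; I consuming protein only). The function name `summarize` is made up for illustration.

```python
import re

def summarize(cigar):
    """Return (genome_span_nt, intron_lengths_nt, deleted_aa) for a cg:Z: string."""
    span, introns, deleted_aa = 0, [], 0
    for length, op in re.findall(r"(\d+)([MIDNUVFG])", cigar):
        n = int(length)
        if op in "MD":            # assumed: lengths in amino acids, 3 nt each
            span += 3 * n
            if op == "D":
                deleted_aa += n
        elif op in "NUV":         # assumed: introns, lengths in nucleotides
            span += n
            introns.append(n)
        elif op in "FG":          # assumed: frameshifts, lengths in nucleotides
            span += n
        # "I" consumes protein only, so it adds nothing to the genome span
    return span, introns, deleted_aa

# CIGAR strings copied from the two PAF records above; the -J 22 alignment
# is the default one with an extra prefix.
default = "23M5201U38M78N41M106N78M1D11M133V36M6617U65M89V1M2D80M146V85M101V23M"
j22 = "31M79383U19M6D17M2D" + default

for name, cg in (("default", default), ("-J 22", j22)):
    span, introns, deleted_aa = summarize(cg)
    print(name, span, len(introns), max(introns), deleted_aa)
# default 13923 8 6617 3
# -J 22 93531 9 79383 11
```

The spans agree with the target coordinates in the PAF records (3250058 - 3236135 = 13923 and 3329666 - 3236135 = 93531), which is a useful sanity check on the assumed operator lengths. The `-J 22` run differs by one additional 79383 nt intron and 8 more deleted residues at the 5' end of the alignment.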
With the latest version, it is recommended to apply |
I was actually looking into the
Yup, there's no way to get them all correct, all the time.
I am noticing cases where a protein aligns to the genome such that a valid intron is reported as a deletion. Is there a setting that can be used to control this behavior? Perhaps the ability to set a minimum intron size? While it is rare to find introns <~70 bp (and they're almost non-existent <~60 bp) in humans, that's not always the case in other organisms.
An example case is shown below:
The issue here is the 22D104M in the CIGAR string. That should have been a 66 nt intron instead of a 66 nt deletion, as shown in the following screenshot:
Another example is XP_052871965.1 aligned to NC_069144.1. Here, the first intron is represented as a deletion:
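For the 22D104M case, the 66 nt figure comes from D lengths being counted in amino acids, so 22D covers 22 x 3 = 66 nt of genome. A minimal sketch for listing deletions by their nucleotide length is shown here, which may help when scanning many alignments for deletions long enough to be short introns. It assumes the operator semantics from the miniprot manual (M and D in amino acids; N, U, V, F, G in nucleotides); `deletions_in_nt` and its `min_nt` threshold are hypothetical names, not part of miniprot.

```python
import re

# Assumed genome nucleotides consumed per unit of each operator:
# M and D lengths are in amino acids; introns (N/U/V) and frameshifts (F/G)
# are in nucleotides; I consumes protein only.
NT_PER_UNIT = {"M": 3, "D": 3, "N": 1, "U": 1, "V": 1, "F": 1, "G": 1, "I": 0}

def deletions_in_nt(cigar, min_nt=0):
    """Return (offset_nt, length_nt) for each D operator of at least min_nt.

    offset_nt is measured from the start of the alignment on the aligned strand.
    """
    hits = []
    offset = 0
    for length, op in re.findall(r"(\d+)([MIDNUVFG])", cigar):
        nt = int(length) * NT_PER_UNIT[op]
        if op == "D" and nt >= min_nt:
            hits.append((offset, nt))
        offset += nt
    return hits

print(deletions_in_nt("22D104M"))  # [(0, 66)]
```

Passing a threshold such as `min_nt=30` would skip the one- or two-residue deletions that are common in ordinary alignments; the right cutoff depends on the organism and is a judgment call.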