Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inserted internal stop codons are not penalized #58

Closed
Percud opened this issue Mar 5, 2024 · 2 comments
Closed

Inserted internal stop codons are not penalized #58

Percud opened this issue Mar 5, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@Percud
Copy link

Percud commented Mar 5, 2024

Hi,

I am using miniprot to identify genes and pseudogenes in primate genomes. It appears that miniprot (0.12-r237) does not penalize (nor acknowledge) an inserted stop codon.

For instance with these example sequences:

$ cat genomic.fa 
>example
CAAGTCTATGTGGAAGAAATCCCTTGAAAGTGGCTTGAAAAGgttaacacttatttgtgctctcagtttaattaatgaaattggctttatggcaagtactgaatttttcctagaaataatttgtttccctaattttcagAATGGAGTTAAGCATGTCCATGCATTTATTCACACT
$ cat pep.fa
>example_pep
VYVEEIPWKRLEKNGVKHVHAFIHT

The genomic sequence is a pseudogene with a W->* mutation. I can retrieve the correct alignment by lowering the penalty for frameshift/stop (-F):

$ miniprot -F11 --aln --gff genomic.fa pep.fa 
##gff-version 3
##PAF   example_pep     25      0       25      +       example 175     3       175     69      75      0       AS:i:72 ms:i:109        np:i:23 fs:i:0  st:i:1  da:i:-1 do:i:-1 cg:Z:13M97N12M cs:Z::7*tgaW:1*tggR:3~gt97ag:12
##ATN   GTCTATGTGGAAGAAATCCCTTGAAAGTGGCTTGAAAAGgttaacacttatttgtgctctcagtttaattaatgaaattggctttatggcaagtactgaatttttcctagaaataatttgtttccctaattttcagAATGGAGTTAAGCATGTCCATGCATTTATTCACACT
##ATA   V..Y..V..E..E..I..P..*..K..W..L..E..K..                                                                                                 N..G..V..K..H..V..H..A..F..I..H..T..
##AAS   |  |  |  |  |  |  |     |     |  |  |                                                                                                   |  |  |  |  |  |  |  |  |  |  |  |  
##AQA   V  Y  V  E  E  I  P  W  K  R  L  E  K                                                                                                   N  G  V  K  H  V  H  A  F  I  H  T  
example miniprot        mRNA    4       175     109     +       .       ID=MP000001;Rank=1;Identity=0.9200;Positive=0.9200;StopCodon=1;Target=example_pep 1 25
example miniprot        CDS     4       42      41      +       0       Parent=MP000001;Rank=1;Identity=0.8462;StopCodon=1;Target=example_pep 1 13
example miniprot        CDS     140     175     68      +       0       Parent=MP000001;Rank=1;Identity=1.0000;Target=example_pep 14 25

However, with the default parameter, miniprot prefers to insert a stop codon and open two gaps. This inserted stop codon is not acknowledged in the gff.

$ miniprot --aln --gff genomic.fa pep.fa 
##gff-version 3
##PAF   example_pep     25      0       25      +       example 175     3       175     69      81      0       AS:i:66 ms:i:103        np:i:23 fs:i:0  st:i:0  da:i:-1 do:i:-1 cg:Z:7M2D1M2I3M97N12M   cs:Z::7-tgaaag:1+KR:3~gt97ag:12
##ATN   GTCTATGTGGAAGAAATCCCTTGAAAGTGG------CTTGAAAAGgttaacacttatttgtgctctcagtttaattaatgaaattggctttatggcaagtactgaatttttcctagaaataatttgtttccctaattttcagAATGGAGTTAAGCATGTCCATGCATTTATTCACACT
##ATA   V..Y..V..E..E..I..P..*..K..W..-..-..L..E..K..                                                                                                 N..G..V..K..H..V..H..A..F..I..H..T..
##AAS   |  |  |  |  |  |  |        |        |  |  |                                                                                                   |  |  |  |  |  |  |  |  |  |  |  |  
##AQA   V  Y  V  E  E  I  P  -  -  W  K  R  L  E  K                                                                                                   N  G  V  K  H  V  H  A  F  I  H  T  
example miniprot        mRNA    4       175     103     +       .       ID=MP000001;Rank=1;Identity=0.8519;Positive=0.8519;Target=example_pep 1 25
example miniprot        CDS     4       42      35      +       0       Parent=MP000001;Rank=1;Identity=0.7333;Target=example_pep 1 13
example miniprot        CDS     140     175     68      +       0       Parent=MP000001;Rank=1;Identity=1.0000;Target=example_pep 14 25

Best regards,
Riccardo

@lh3 lh3 added the bug Something isn't working label Mar 5, 2024
@lh3
Copy link
Owner

lh3 commented Mar 5, 2024

Yes, you are correct. However, a fix is non-trivial. I will keep this in mind and come back to this problem later. Thanks for the test case.

@lh3 lh3 closed this as completed in f5cc291 Mar 6, 2024
@lh3
Copy link
Owner

lh3 commented Mar 6, 2024

Actually simpler than I thought. Let me know if you still see this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants