You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am using miniprot to identify genes and pseudogenes in primate genomes. It appears that miniprot (0.12-r237) does not penalize (nor acknowledge) an inserted stop codon.
The genomic sequence is a pseudogene with a W->* mutation. I can retrieve the correct alignment by lowering the penalty for frameshift/stop (-F):
$ miniprot -F11 --aln --gff genomic.fa pep.fa
##gff-version 3
##PAF example_pep 25 0 25 + example 175 3 175 69 75 0 AS:i:72 ms:i:109 np:i:23 fs:i:0 st:i:1 da:i:-1 do:i:-1 cg:Z:13M97N12M cs:Z::7*tgaW:1*tggR:3~gt97ag:12
##ATN GTCTATGTGGAAGAAATCCCTTGAAAGTGGCTTGAAAAGgttaacacttatttgtgctctcagtttaattaatgaaattggctttatggcaagtactgaatttttcctagaaataatttgtttccctaattttcagAATGGAGTTAAGCATGTCCATGCATTTATTCACACT
##ATA V..Y..V..E..E..I..P..*..K..W..L..E..K.. N..G..V..K..H..V..H..A..F..I..H..T..
##AAS | | | | | | | | | | | | | | | | | | | | | | |
##AQA V Y V E E I P W K R L E K N G V K H V H A F I H T
example miniprot mRNA 4 175 109 + . ID=MP000001;Rank=1;Identity=0.9200;Positive=0.9200;StopCodon=1;Target=example_pep 1 25
example miniprot CDS 4 42 41 + 0 Parent=MP000001;Rank=1;Identity=0.8462;StopCodon=1;Target=example_pep 1 13
example miniprot CDS 140 175 68 + 0 Parent=MP000001;Rank=1;Identity=1.0000;Target=example_pep 14 25
However, with the default parameter, miniprot prefers to insert a stop codon and open two gaps. This inserted stop codon is not acknowledged in the gff.
$ miniprot --aln --gff genomic.fa pep.fa
##gff-version 3
##PAF example_pep 25 0 25 + example 175 3 175 69 81 0 AS:i:66 ms:i:103 np:i:23 fs:i:0 st:i:0 da:i:-1 do:i:-1 cg:Z:7M2D1M2I3M97N12M cs:Z::7-tgaaag:1+KR:3~gt97ag:12
##ATN GTCTATGTGGAAGAAATCCCTTGAAAGTGG------CTTGAAAAGgttaacacttatttgtgctctcagtttaattaatgaaattggctttatggcaagtactgaatttttcctagaaataatttgtttccctaattttcagAATGGAGTTAAGCATGTCCATGCATTTATTCACACT
##ATA V..Y..V..E..E..I..P..*..K..W..-..-..L..E..K.. N..G..V..K..H..V..H..A..F..I..H..T..
##AAS | | | | | | | | | | | | | | | | | | | | | | |
##AQA V Y V E E I P - - W K R L E K N G V K H V H A F I H T
example miniprot mRNA 4 175 103 + . ID=MP000001;Rank=1;Identity=0.8519;Positive=0.8519;Target=example_pep 1 25
example miniprot CDS 4 42 35 + 0 Parent=MP000001;Rank=1;Identity=0.7333;Target=example_pep 1 13
example miniprot CDS 140 175 68 + 0 Parent=MP000001;Rank=1;Identity=1.0000;Target=example_pep 14 25
Best regards,
Riccardo
The text was updated successfully, but these errors were encountered:
Hi,
I am using miniprot to identify genes and pseudogenes in primate genomes. It appears that miniprot (0.12-r237) does not penalize (nor acknowledge) an inserted stop codon.
For instance with these example sequences:
The genomic sequence is a pseudogene with a W->* mutation. I can retrieve the correct alignment by lowering the penalty for frameshift/stop (-F):
However, with the default parameter, miniprot prefers to insert a stop codon and open two gaps. This inserted stop codon is not acknowledged in the gff.
Best regards,
Riccardo
The text was updated successfully, but these errors were encountered: