Is this correct output? #48

Open
y-yoshioka1109 opened this issue Mar 25, 2024 · 3 comments

@y-yoshioka1109

Dear developers,

Thank you for the wonderful tool. I used GALBA to predict genes in my species, and I found a potential error in the GTF produced by GALBA. Please see the example below.
Is the transcript "g198.t1" an error? Its features appear on c0002 (plus strand) and again on c0001 (minus strand). I saw this kind of output several times in the GTF file. In addition, when I extract transcripts with gffread, the number of transcripts differs from the number in galba.codingseq. Could you give me any advice?

Best regards,


c0001 AUGUSTUS start_codon 1249701 1249703 . + 0 transcript_id "g187.t1"; gene_id "g187";
c0001 AUGUSTUS CDS 1249701 1249809 0.69 + 0 transcript_id "g187.t1"; gene_id "g187";
c0001 AUGUSTUS exon 1249701 1249809 . + . transcript_id "g187.t1"; gene_id "g187";
c0001 AUGUSTUS intron 1249810 1250204 0.7 + . transcript_id "g187.t1"; gene_id "g187";
c0001 AUGUSTUS CDS 1250205 1252678 0.73 + 2 transcript_id "g187.t1"; gene_id "g187";
c0001 AUGUSTUS exon 1250205 1252678 . + . transcript_id "g187.t1"; gene_id "g187";
c0001 AUGUSTUS stop_codon 1252679 1252681 . + 0 transcript_id "g187.t1"; gene_id "g187";
c0002 AUGUSTUS gene 135767 137434 0 - . g192
c0002 AUGUSTUS transcript 135767 137434 . - . g192.t1
c0002 AUGUSTUS stop_codon 135767 135769 . - 0 transcript_id "g192.t1"; gene_id "g192";
c0002 AUGUSTUS CDS 135770 137434 0.06 - 0 transcript_id "g192.t1"; gene_id "g192";
c0002 AUGUSTUS exon 135770 137434 . - . transcript_id "g192.t1"; gene_id "g192";
c0002 AUGUSTUS start_codon 137432 137434 . - 0 transcript_id "g192.t1"; gene_id "g192";
c0002 AUGUSTUS gene 165221 170696 0.14 + . g198
c0002 AUGUSTUS transcript 165221 170696 0.07 + . g198.t1
c0002 AUGUSTUS start_codon 165221 165223 . + 0 transcript_id "g198.t1"; gene_id "g198";
c0002 AUGUSTUS CDS 165221 165331 0.54 + 0 transcript_id "g198.t1"; gene_id "g198";
c0002 AUGUSTUS exon 165221 165331 . + . transcript_id "g198.t1"; gene_id "g198";
c0002 AUGUSTUS intron 165332 165848 0.46 + . transcript_id "g198.t1"; gene_id "g198";
c0002 AUGUSTUS CDS 165849 165942 0.4 + 0 transcript_id "g198.t1"; gene_id "g198";
c0002 AUGUSTUS exon 165849 165942 . + . transcript_id "g198.t1"; gene_id "g198";
c0002 AUGUSTUS intron 165943 167842 0.48 + . transcript_id "g198.t1"; gene_id "g198";
c0002 AUGUSTUS CDS 167843 167901 0.4 + 2 transcript_id "g198.t1"; gene_id "g198";
c0002 AUGUSTUS exon 167843 167901 . + . transcript_id "g198.t1"; gene_id "g198";
c0002 AUGUSTUS intron 167902 169085 0.45 + . transcript_id "g198.t1"; gene_id "g198";
c0002 AUGUSTUS CDS 169086 169335 0.34 + 0 transcript_id "g198.t1"; gene_id "g198";
c0002 AUGUSTUS exon 169086 169335 . + . transcript_id "g198.t1"; gene_id "g198";
c0002 AUGUSTUS intron 169336 170616 0.64 + . transcript_id "g198.t1"; gene_id "g198";
c0002 AUGUSTUS CDS 170617 170693 0.65 + 2 transcript_id "g198.t1"; gene_id "g198";
c0002 AUGUSTUS exon 170617 170693 . + . transcript_id "g198.t1"; gene_id "g198";
c0002 AUGUSTUS stop_codon 170694 170696 . + 0 transcript_id "g198.t1"; gene_id "g198";
c0001 AUGUSTUS stop_codon 1283351 1283353 . - 0 transcript_id "g198.t1"; gene_id "g198";
c0001 AUGUSTUS CDS 1283354 1285330 0.79 - 0 transcript_id "g198.t1"; gene_id "g198";
c0001 AUGUSTUS exon 1283354 1285330 . - . transcript_id "g198.t1"; gene_id "g198";
c0001 AUGUSTUS start_codon 1285328 1285330 . - 0 transcript_id "g198.t1"; gene_id "g198";


y-yoshioka1109 changed the title from "It this collect output?" to "It this correct output?" on Mar 25, 2024
@KatharinaHoff
Member

It is possible that Galba produces such errors when Pygustus joins its predictions. We previously implemented a filter in Galba to discard genes that have features on two strands, because the developer of Pygustus has left our team and I have no resources to fix this in Pygustus itself. Obviously, the filter is not working properly either. Are you using the latest container with Galba?
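For reference, a check of the kind described above could be sketched as follows. This is not Galba's actual filter: it assumes a tab-separated GTF in the AUGUSTUS style shown above (bare transcript ID in column 9 of `transcript` lines, a `transcript_id "g198.t1";` attribute on all other features), and the function names are hypothetical. It flags every transcript ID whose features sit on more than one sequence or strand, which would catch `g198.t1` (c0002 `+` and c0001 `-`).

```python
import re
from collections import defaultdict

# Captures the value of a transcript_id attribute, e.g. transcript_id "g198.t1";
TX_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_id(fields):
    """Return the transcript ID of a GTF record, or None for gene records.

    AUGUSTUS-style "transcript" lines carry the bare ID (e.g. g198.t1) in
    column 9 instead of a transcript_id attribute, so both forms are handled.
    """
    feature, attrs = fields[2], fields[8].strip()
    if feature == "gene":
        return None
    if feature == "transcript":
        return attrs
    match = TX_ID.search(attrs)
    return match.group(1) if match else None


def split_transcripts(gtf_lines):
    """Return sorted transcript IDs whose features span >1 sequence or strand."""
    locations = defaultdict(set)  # transcript ID -> {(seqid, strand), ...}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a 9-column GTF record
        tx = transcript_id(fields)
        if tx is not None:
            locations[tx].add((fields[0], fields[6]))
    return sorted(tx for tx, locs in locations.items() if len(locs) > 1)
```

On a real file this could be called as `split_transcripts(open("galba.gtf"))`; the file name `galba.gtf` is an assumption about Galba's output naming.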

KatharinaHoff self-assigned this on Mar 25, 2024
@y-yoshioka1109
Author

Thank you for your response, Katharina. Yes, I am using the latest container with Galba (v1.0.11). The command was:

singularity exec -B ${PWD}:${PWD} $GALBA_SIF galba.pl \
    --genome=${genome} --prot_seq=metazoa_obd10_plus_sp.fasta \
    --threads=48 --workingdir=out_galba

The genes that appear to be erroneous have no "transcript" feature in the GTF. Fortunately, I found only nine of them, so I will try to handle them by deleting them from the GTF manually.
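Instead of editing the GTF by hand, the observation above (the erroneous pieces have no `transcript` record) could be turned into a small filter. The sketch below is hypothetical and assumes the same tab-separated, AUGUSTUS-style layout: it groups features by sequence, strand and transcript ID, and drops every group that has no `transcript` line of its own. For `g198.t1` this removes the c0001 part and keeps the c0002 part.

```python
import re

# Captures the value of a transcript_id attribute, e.g. transcript_id "g198.t1";
TX_ID = re.compile(r'transcript_id "([^"]+)"')


def feature_key(fields):
    """Return (seqid, strand, transcript ID) for a GTF record, or None.

    Gene records and records without a transcript ID return None, so the
    filter below always keeps them.
    """
    feature, attrs = fields[2], fields[8].strip()
    if feature == "gene":
        return None
    if feature == "transcript":
        tx = attrs  # AUGUSTUS writes the bare transcript ID here
    else:
        match = TX_ID.search(attrs)
        if match is None:
            return None
        tx = match.group(1)
    return (fields[0], fields[6], tx)


def drop_orphan_features(gtf_lines):
    """Drop features whose (seqid, strand, transcript) has no transcript line."""
    records = []       # (original line, key or None), in input order
    anchored = set()   # keys that have their own "transcript" record
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        key = feature_key(fields) if len(fields) >= 9 else None
        records.append((line, key))
        if key is not None and fields[2] == "transcript":
            anchored.add(key)
    return [line for line, key in records if key is None or key in anchored]
```

Lines are returned unchanged (including their newlines), so `open("filtered.gtf", "w").writelines(drop_orphan_features(open("galba.gtf")))` would write a filtered copy; both file names are placeholders. Whether to keep the remaining c0002 part or remove every line of an affected transcript ID is a judgment call.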

y-yoshioka1109 changed the title from "It this correct output?" to "Is this correct output?" on Mar 25, 2024
@KatharinaHoff
Member

@MarioStanke this is also a Pygustus problem... I have several open issues in Galba because Pygustus sometimes does not report a transcript feature. I will probably implement a fix in Galba (i.e., adding the missing transcript feature or deleting the affected features altogether), but at some point one of us should look into fixing the problem at its source.
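The first option mentioned here, adding the missing transcript feature, could look roughly like the sketch below. It is only an illustration, not the planned Galba fix. It assumes tab-separated, AUGUSTUS-style records and synthesizes a `transcript` line spanning the smallest start to the largest end of a transcript's features, writing `.` for the score because none is available. It does this only when all features share one sequence and strand; split transcripts such as `g198.t1` are left untouched, since a single transcript line cannot describe features on two contigs. The function name is hypothetical.

```python
import re
from collections import defaultdict

# Captures the value of a transcript_id attribute, e.g. transcript_id "g198.t1";
TX_ID = re.compile(r'transcript_id "([^"]+)"')


def add_missing_transcripts(gtf_lines):
    """Return the GTF lines (without newlines) plus synthesized transcript records.

    A transcript record is added only for IDs that have none and whose
    features all lie on a single sequence and strand.
    """
    lines = [line.rstrip("\n") for line in gtf_lines]
    has_record = set()             # IDs that already have a "transcript" line
    features = defaultdict(list)   # ID -> [(seqid, strand, start, end), ...]
    for line in lines:
        fields = line.split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] == "gene":
            continue
        if fields[2] == "transcript":
            has_record.add(fields[8].strip())
            continue
        match = TX_ID.search(fields[8])
        if match:
            features[match.group(1)].append(
                (fields[0], fields[6], int(fields[3]), int(fields[4])))
    out = list(lines)
    for tx, feats in features.items():
        locations = {(seqid, strand) for seqid, strand, _, _ in feats}
        if tx in has_record or len(locations) != 1:
            continue  # already complete, or split across sequences/strands
        seqid, strand = next(iter(locations))
        start = min(f[2] for f in feats)
        end = max(f[3] for f in feats)
        # Score and frame are unknown for a synthesized record, so use ".".
        out.append("\t".join([seqid, "AUGUSTUS", "transcript", str(start),
                              str(end), ".", strand, ".", tx]))
    return out
```

New records are appended at the end of the list, so the result may need re-sorting if a downstream tool expects each transcript's records to be grouped together.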
