stringtie2utr generates multiple five_prime_UTRs and three_prime_UTRs within a gene #723
These are 2 UTRs per transcript, and both UTRs are spliced. This is not an
error. It results from the stringtie assembly and from the location of
the protein coding gene within that assembled transcript. Or are there
overlapping coordinates that I have overlooked?
On Thu, Dec 14, 2023 at 9:04 AM spoonbender76 wrote:
Hi,
I tried stringtie2utr.py
<https://github.com/Gaius-Augustus/BRAKER/blob/utr_from_stringtie/scripts/stringtie2utr.py>
with the GeneMark-ETP/rnaseq/stringtie/transcripts_merged.gff file to add
UTRs to braker.gtf.
However, I encountered the problem that multiple five_prime_UTR and
three_prime_UTR features are generated within a gene, the same issue as
#716 (comment).
Here are some examples.
chr01 AUGUSTUS gene 1265570 1337711 . + . g25
chr01 AUGUSTUS transcript 1265570 1337711 1 + . g25.t1
chr01 stringtie2utr five_prime_UTR 1265570 1265641 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1267245 1267346 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1268300 1268427 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1268857 1269048 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1270085 1270362 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1271057 1271273 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1273003 1273117 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1274180 1274306 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1275368 1275508 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1276316 1276514 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1277498 1277613 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1279421 1279738 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1281067 1281465 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1283176 1283443 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1284457 1284568 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1287752 1287821 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1288516 1288661 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1289245 1289401 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1289880 1290078 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1290804 1291036 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr five_prime_UTR 1291414 1292379 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 AUGUSTUS start_codon 1292380 1292382 . + 0 transcript_id "g25.t1"; gene_id "g25";
chr01 AUGUSTUS CDS 1292380 1292518 1 + 0 transcript_id "g25.t1"; gene_id "g25";
chr01 AUGUSTUS exon 1292380 1292518 . + . transcript_id "g25.t1"; gene_id "g25";
chr01 AUGUSTUS intron 1292519 1293897 1 + . transcript_id "g25.t1"; gene_id "g25";
chr01 AUGUSTUS CDS 1293898 1294256 1 + 2 transcript_id "g25.t1"; gene_id "g25";
chr01 AUGUSTUS exon 1293898 1294256 . + . transcript_id "g25.t1"; gene_id "g25";
chr01 AUGUSTUS stop_codon 1294254 1294256 . + 0 transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr three_prime_UTR 1294257 1294738 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 stringtie2utr three_prime_UTR 1337331 1337711 1000 + . transcript_id "g25.t1"; gene_id "g25";
chr01 gmst gene 1600956 1659382 . - . g39
chr01 gmst transcript 1600956 1659382 . - . g39.t1
chr01 stringtie2utr three_prime_UTR 1600956 1601209 1000 - . transcript_id "g39.t1"; gene_id "g39";
chr01 stringtie2utr three_prime_UTR 1601856 1601983 1000 - . transcript_id "g39.t1"; gene_id "g39";
chr01 stringtie2utr three_prime_UTR 1602513 1602581 1000 - . transcript_id "g39.t1"; gene_id "g39";
chr01 stringtie2utr three_prime_UTR 1603205 1603301 1000 - . transcript_id "g39.t1"; gene_id "g39";
chr01 stringtie2utr three_prime_UTR 1612960 1613142 1000 - . transcript_id "g39.t1"; gene_id "g39";
chr01 stringtie2utr three_prime_UTR 1613778 1613862 1000 - . transcript_id "g39.t1"; gene_id "g39";
chr01 stringtie2utr three_prime_UTR 1630424 1630588 1000 - . transcript_id "g39.t1"; gene_id "g39";
chr01 stringtie2utr three_prime_UTR 1641347 1641473 1000 - . transcript_id "g39.t1"; gene_id "g39";
chr01 gmst stop_codon 1641474 1641476 24.335131 - 0 transcript_id "g39.t1"; gene_id "g39";
chr01 gmst CDS 1641474 1641483 24.335131 - 1 transcript_id "g39.t1"; gene_id "g39";
chr01 gmst exon 1641474 1641483 24.335131 - 1 transcript_id "g39.t1"; gene_id "g39";
chr01 gmst intron 1641484 1643629 24.335131 - 0 transcript_id "g39.t1"; gene_id "g39";
chr01 gmst CDS 1643630 1643765 24.335131 - 2 transcript_id "g39.t1"; gene_id "g39";
chr01 gmst exon 1643630 1643765 24.335131 - 2 transcript_id "g39.t1"; gene_id "g39";
chr01 gmst intron 1643766 1646726 24.335131 - 0 transcript_id "g39.t1"; gene_id "g39";
chr01 gmst CDS 1646727 1646898 24.335131 - 0 transcript_id "g39.t1"; gene_id "g39";
chr01 gmst exon 1646727 1646898 24.335131 - 0 transcript_id "g39.t1"; gene_id "g39";
chr01 gmst start_codon 1646896 1646898 24.335131 - 0 transcript_id "g39.t1"; gene_id "g39";
chr01 stringtie2utr five_prime_UTR 1646899 1646901 1000 - . transcript_id "g39.t1"; gene_id "g39";
chr01 stringtie2utr five_prime_UTR 1656810 1656979 1000 - . transcript_id "g39.t1"; gene_id "g39";
chr01 stringtie2utr five_prime_UTR 1659327 1659382 1000 - . transcript_id "g39.t1"; gene_id "g39"
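Whether any of the UTR segments in records like these overlap can be checked mechanically. The following is a minimal sketch, not part of stringtie2utr.py: the function names `utr_segments` and `overlapping_pairs` are hypothetical, and the lines are split on arbitrary whitespace because the records are shown space-separated here (a real GTF file is tab-separated).

```python
import re
from collections import defaultdict

UTR_TYPES = {"five_prime_UTR", "three_prime_UTR"}

def utr_segments(gtf_lines):
    """Group UTR intervals by (transcript_id, feature type). Hypothetical helper."""
    groups = defaultdict(list)
    for line in gtf_lines:
        # Split into the 9 GTF columns; whitespace split is an assumption
        # made for the space-separated records shown in this thread.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] not in UTR_TYPES:
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            groups[(match.group(1), fields[2])].append(
                (int(fields[3]), int(fields[4]))
            )
    return groups

def overlapping_pairs(intervals):
    """Return adjacent pairs (after sorting) whose closed intervals overlap."""
    ordered = sorted(intervals)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[0] <= a[1]]

# A few of the g39.t1 records from the example above.
sample = [
    'chr01 stringtie2utr three_prime_UTR 1630424 1630588 1000 - . transcript_id "g39.t1"; gene_id "g39";',
    'chr01 stringtie2utr three_prime_UTR 1641347 1641473 1000 - . transcript_id "g39.t1"; gene_id "g39";',
    'chr01 gmst CDS 1641474 1641483 24.335131 - 1 transcript_id "g39.t1"; gene_id "g39";',
    'chr01 stringtie2utr five_prime_UTR 1646899 1646901 1000 - . transcript_id "g39.t1"; gene_id "g39";',
    'chr01 stringtie2utr five_prime_UTR 1656810 1656979 1000 - . transcript_id "g39.t1"; gene_id "g39";',
    'chr01 stringtie2utr five_prime_UTR 1659327 1659382 1000 - . transcript_id "g39.t1"; gene_id "g39";',
]

groups = utr_segments(sample)
for (transcript, feature), intervals in sorted(groups.items()):
    print(transcript, feature, len(intervals), overlapping_pairs(intervals))
```

An empty list for a group means its segments are disjoint, which is what one would expect if they are the exons of a single spliced UTR rather than competing UTR predictions.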
Thank you for your response. I'm still a bit puzzled and would appreciate further clarification. As I understand it (and I could be wrong), a transcript should have a single continuous 5' UTR, starting at the beginning of the transcript and ending just before the start codon, and likewise a single continuous 3' UTR, beginning right after the stop codon and extending to the end of the transcript. Does this situation mean that transcript variants have different UTRs (I'm not sure whether they really exist or are assembly artifacts) and that all of these UTRs are added to the annotation? Or are these multiple 5' UTRs just parts of one large 5' UTR? Should I keep only one 5' UTR and one 3' UTR, or is it fine to leave the annotation as it is?
In eukaryotes, UTRs can be spliced. This happens less frequently in the 3' UTR, but it does happen there as well. This is not to say that all the stringtie assemblies and all the genes are correct. Everything in structural genome annotation may contain errors.
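To illustrate the point about spliced UTRs, here is a small sketch that treats the two three_prime_UTR segments of g25.t1 from the example as exons of one UTR and derives the total UTR length and the intron between them. The function name `spliced_utr_summary` is hypothetical and is not taken from stringtie2utr.py; coordinates are assumed to be 1-based and inclusive, as in GTF.

```python
def spliced_utr_summary(segments):
    """Return (total length, introns) for the segments of one spliced UTR.

    Segments are (start, end) pairs, 1-based and inclusive.
    """
    ordered = sorted(segments)
    total_length = sum(end - start + 1 for start, end in ordered)
    # An intron spans the gap between two consecutive UTR segments.
    introns = [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return total_length, introns

# The two three_prime_UTR records of g25.t1 from the example.
g25_three_prime = [(1294257, 1294738), (1337331, 1337711)]
length, introns = spliced_utr_summary(g25_three_prime)
print(length, introns)  # 863 [(1294739, 1337330)]
```

Read this way, the two records describe a single 863 bp 3' UTR interrupted by one intron, not two alternative 3' UTRs.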
I guess the issue arose because I used transcriptome data from different varieties of the same species (since I didn't perform transcriptome sequencing on my sequenced material). After read mapping, the transcript boundaries of the same gene may have differed between varieties. Of course, this is just speculation, and I haven't checked it in IGV.
UTRs inferred from evidence often look different from reference
annotation UTRs and from evidence in an independent experiment.
I will close this issue because I believe there is nothing wrong with the software.