Micro exons with single bases or two #138

Open
ke-shi opened this issue Jul 31, 2024 · 3 comments
@ke-shi

ke-shi commented Jul 31, 2024

Hi,

We used Helixer to predict genes in plant genome assemblies we constructed. Typically about 30k genes are predicted, and complete BUSCO scores are consistently above 95%, so prediction accuracy seems very high. Recently, however, we found that ~30% of the genes predicted by Helixer contain single-base or two-base exons, which may be microexons. Are such microexons frequently found in plant genomes, or do we need to customize parameters to eliminate them?

Here is an example with a single-base exon at position 1519:

ch01      Helixer gene    1166    1953    .       +       .       ID=ch01g00002
ch01      Helixer mRNA    1166    1953    .       +       .       ID=ch01g00002.1;Parent=ch01g00002
ch01      Helixer exon    1166    1189    .       +       .       ID=ch01g00002.1.exon.1;Parent=ch01g00002.1
ch01      Helixer exon    1519    1519    .       +       .       ID=ch01g00002.1.exon.2;Parent=ch01g00002.1
ch01      Helixer exon    1707    1953    .       +       .       ID=ch01g00002.1.exon.3;Parent=ch01g00002.1
ch01      Helixer CDS     1187    1189    .       +       0       ID=ch01g00002.1.CDS.1;Parent=ch01g00002.1
ch01      Helixer CDS     1519    1519    .       +       0       ID=ch01g00002.1.CDS.2;Parent=ch01g00002.1
ch01      Helixer CDS     1707    1951    .       +       2       ID=ch01g00002.1.CDS.3;Parent=ch01g00002.1
ch01      Helixer five_prime_UTR  1166    1186    .       +       .       ID=ch01g00002.1.five_prime_UTR.1;Parent=ch01g00002.1
ch01      Helixer three_prime_UTR 1952    1953    .       +       .       ID=ch01g00002.1.three_prime_UTR.1;Parent=ch01g00002.1

Thanks,
Kenta
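A minimal sketch for reproducing the kind of count reported above. It is not part of Helixer. It scans a GFF3 file and reports how many mRNAs contain at least one exon of 1 or 2 bp. The function name `short_exon_transcripts`, the 2 bp cutoff, and the script name used below are illustrative assumptions. The sketch also assumes one mRNA per gene, as in the example above, so the mRNA fraction approximates the gene fraction.

```python
"""Hypothetical helper: count mRNAs that contain very short exons in a GFF3 file."""
import sys

MAX_LEN = 2  # assumption: exons of 1-2 bp are treated as suspicious microexons


def short_exon_transcripts(lines, max_len=MAX_LEN):
    """Return (IDs of mRNAs with an exon <= max_len bp, total number of mRNAs)."""
    mrnas = set()
    flagged = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            cols = line.split()  # tolerate space-separated copies of GFF3 lines
        if len(cols) < 9:
            continue
        ftype, start, end = cols[2], cols[3], cols[4]
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if ftype == "mRNA" and "ID" in attrs:
            mrnas.add(attrs["ID"])
        elif ftype == "exon":
            length = int(end) - int(start) + 1  # GFF3 coordinates are 1-based, inclusive
            if length <= max_len:
                for parent in attrs.get("Parent", "").split(","):
                    if parent:
                        flagged.add(parent)
    return flagged, len(mrnas)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        flagged, total = short_exon_transcripts(fh)
    pct = 100 * len(flagged) / total if total else 0.0
    print(f"{len(flagged)} of {total} mRNAs ({pct:.1f}%) have an exon <= {MAX_LEN} bp")
```

Usage (hypothetical file names): `python count_microexons.py predictions.gff3`. Applied to the example gene above, it would flag `ch01g00002.1` because of the 1 bp exon at 1519.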

@alisandra
Collaborator

Hi Kenta,

Thanks for the feedback. Microexons are not expected in plant genomes at the frequency Helixer predicts them, so these are very likely prediction errors. This is a known problem area for Helixer, and it occurs especially where there is a high level of uncertainty in the raw predictions. Unfortunately, I don't have a quick parameter fix. Increasing --peak-threshold to improve overall precision might help somewhat. Hopefully we'll have a better solution in the future.

@ke-shi
Author

ke-shi commented Oct 1, 2024

Hi Alisandra,

Thank you for the reply. I look forward to the solution!

Best,
Kenta

@Wan9299

Wan9299 commented Dec 4, 2024

I had the same problem in plant genome annotation.


4 participants