Combine BRAKER3 and Galba annotation for new, non-model genome #51

Open
YanisChrys opened this issue May 21, 2024 · 0 comments


YanisChrys commented May 21, 2024

Hello,

To better annotate my reptile genome, I ran BRAKER3-ETP and GALBA (with the proteomes of ~6 of the closest relatives), and I am now trying to combine the two annotations with TSEBRA. The BUSCO scores of the two inputs are:

BRAKER3 | C:67.6%[S:56.0%,D:11.6%],F:3.6%,M:28.8%,n:7480

GALBA | C:93.3%[S:75.1%,D:18.2%],F:1.6%,M:5.1%,n:7480
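For comparing many TSEBRA runs against these two baselines, it can help to turn the BUSCO one-line summaries into numbers. The following is a minimal sketch in Python, assuming the summaries always follow the `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` layout shown above; the helper name `parse_busco` and the derived `dup_share` value (D divided by C) are illustrative and not part of BUSCO or TSEBRA.

```python
import re

# Matches a BUSCO one-line summary such as
# "C:67.6%[S:56.0%,D:11.6%],F:3.6%,M:28.8%,n:7480"
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary):
    """Return the BUSCO percentages as floats and n as an int (hypothetical helper)."""
    match = BUSCO_RE.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {summary!r}")
    scores = {key: float(val) for key, val in match.groupdict().items() if key != "n"}
    scores["n"] = int(match.group("n"))
    # Fraction of the complete BUSCOs that are duplicated (illustrative metric).
    scores["dup_share"] = scores["D"] / scores["C"] if scores["C"] else 0.0
    return scores


# The two input annotations quoted in this post.
runs = {
    "BRAKER3": "C:67.6%[S:56.0%,D:11.6%],F:3.6%,M:28.8%,n:7480",
    "GALBA": "C:93.3%[S:75.1%,D:18.2%],F:1.6%,M:5.1%,n:7480",
}
table = {name: parse_busco(summary) for name, summary in runs.items()}

for name, s in table.items():
    print(f"{name}\tC={s['C']}\tS={s['S']}\tD={s['D']}\tM={s['M']}\tD/C={s['dup_share']:.2f}")
```

Adding the summary line of each TSEBRA output to `runs` gives one row per configuration, which makes the trade-off between completeness and duplication easier to see.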

This is the command I am using:

singularity exec --bind $WORKDIR /path/to/containers/braker3.sif /opt/TSEBRA/bin/tsebra.py \
    --gtf ${galba_gtf},${braker_gtf} \
    --hintfiles ${hints_gff2},${hints_gff1} \
    --filter_single_exon_genes \
    --ignore_tx_phase \
    --cfg ${braker_cfg} \
    --out ${out_gtf} \
    --verbose 1 2> ${log}

I have experimented a lot, but I can't find a way to run it that maximises the BUSCO score while keeping most genes single-copy.
I have tried the following:

  • Adding ${augustus_gtf},${genemark_gtf} from the BRAKER3 run
  • Using different config files and thresholds:
    • braker3.cfg, default.cfg, pref_braker1.cfg
    • custom cfg with:
      • e_1 = 0.1, e_2-4 = 0.05
      • e_1-4 = 0.05 or 0.005
      • other combinations where e_1 is either higher or lower than the other three
      • increasing or decreasing intron_support
      • increasing or decreasing the "P" and "E" weights
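The custom configurations listed above can be generated in bulk rather than edited by hand. Below is a minimal sketch in Python, assuming a TSEBRA cfg is a plain text file with one `key value` pair per line; the helper names `write_cfg` and `write_sweep` are hypothetical, and the numbers in `BASE` are placeholders that should be replaced with the values from whichever cfg file the runs start from.

```python
import tempfile
from pathlib import Path

# Placeholder starting values (assumption): copy the real ones from the cfg
# shipped with your TSEBRA installation before using this.
BASE = {
    "P": 0.1, "E": 10, "C": 5, "M": 1,
    "intron_support": 0.75, "stasto_support": 1,
    "e_1": 0, "e_2": 0.5, "e_3": 25, "e_4": 10,
}

# The e_n combinations mentioned in this post.
SWEEP = {
    "e1_high": {"e_1": 0.1, "e_2": 0.05, "e_3": 0.05, "e_4": 0.05},
    "all_0.05": {"e_1": 0.05, "e_2": 0.05, "e_3": 0.05, "e_4": 0.05},
    "all_0.005": {"e_1": 0.005, "e_2": 0.005, "e_3": 0.005, "e_4": 0.005},
}


def write_cfg(path, base, overrides):
    """Write one cfg file as 'key value' lines (assumed format)."""
    params = {**base, **overrides}
    text = "\n".join(f"{key} {value}" for key, value in params.items()) + "\n"
    Path(path).write_text(text)
    return Path(path)


def write_sweep(out_dir, base=BASE, sweep=SWEEP):
    """Write one cfg per named parameter combination and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return {
        name: write_cfg(out_dir / f"{name}.cfg", base, overrides)
        for name, overrides in sweep.items()
    }


# Demo: write the three files into a temporary directory.
written = write_sweep(tempfile.mkdtemp(prefix="tsebra_cfgs_"))
for name, path in written.items():
    print(name, path)
```

Each resulting path can then be passed to `--cfg` in the command above, one TSEBRA run per file, so that every parameter combination is recorded alongside its output.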

However, these all result in a lot of duplicates (65-87%) and a BUSCO score only slightly better than GALBA's (92-94%).

Decreasing the e_n values reduces duplicates but also lowers the BUSCO score.

Perhaps the issue is that the transcripts have equally low support, so more than one overlapping transcript is kept?
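One way to check this hypothesis is to count how many genes in the combined GTF overlap another gene on the same strand. The sketch below is written in Python and assumes the exon/CDS lines carry a `gene_id "..."` attribute, as in the usual BRAKER and GALBA GTF output; lines without that attribute are skipped. The function names are illustrative and not part of TSEBRA.

```python
import re
from collections import defaultdict

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gene_spans(gtf_lines):
    """Collect the min start / max end of every gene from its GTF feature lines."""
    spans = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        match = GENE_ID_RE.search(cols[8])
        if match is None:
            continue  # e.g. gene/transcript lines whose last column is a bare id
        key = (cols[0], cols[6], match.group(1))  # (sequence, strand, gene id)
        start, end = int(cols[3]), int(cols[4])
        if key in spans:
            old_start, old_end = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    return spans


def overlapping_gene_pairs(spans):
    """Return pairs of gene ids whose spans overlap on the same sequence and strand."""
    by_locus = defaultdict(list)
    for (seqid, strand, gene), (start, end) in spans.items():
        by_locus[(seqid, strand)].append((start, end, gene))
    pairs = []
    for genes in by_locus.values():
        genes.sort()
        active = []  # earlier genes that can still overlap later ones
        for start, end, gene in genes:
            active = [g for g in active if g[1] >= start]
            pairs.extend((g[2], gene) for g in active)
            active.append((start, end, gene))
    return pairs


# Tiny made-up example: g1 and g2 overlap on the + strand, g4 is on the - strand.
demo = [
    "\t".join(["chr1", "x", "CDS", "10", "100", ".", "+", "0", 'transcript_id "g1.t1"; gene_id "g1";']),
    "\t".join(["chr1", "x", "CDS", "50", "150", ".", "+", "0", 'transcript_id "g2.t1"; gene_id "g2";']),
    "\t".join(["chr1", "x", "CDS", "200", "300", ".", "+", "0", 'transcript_id "g3.t1"; gene_id "g3";']),
    "\t".join(["chr1", "x", "CDS", "60", "90", ".", "-", "0", 'transcript_id "g4.t1"; gene_id "g4";']),
]
pairs = overlapping_gene_pairs(gene_spans(demo))
print(pairs)
```

Running `gene_spans` on the lines of the TSEBRA output file and comparing the number of overlapping pairs between configurations would show whether the duplicated BUSCOs come from overlapping gene models at the same locus or from genuinely separate loci.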

Any advice is welcome. Please feel free to ask for any extra information or log files.
