petap.pl fails on one sample but not others, too few introns #50

ToriEggers · 2022-10-31T23:19:19Z

Hi,
I have four nematode genome samples and I'm running BRAKER with GeneMark in epmode to annotate each with protein + genome and with RNA + genome, then combining the two with TSEBRA. Three of the genomes process without problems, but on the fourth I keep hitting a failure at the gmes_petap.pl step, no matter the size or evolutionary distance of the protein file I use (I've tried many, from sister species to all Metazoa). The protein + genome run fails on this sample, but the RNA + genome run completes. When I run in esmode, the protein annotation completes as well. I processed these samples close together in time, so there were no software updates or changes to my environment between them.

Any idea as to why this one particular sample won't run like the others?

Error in braker.log:

                      RUNNING GENEMARK-EX

Preparing genemark_evidence file hints from manual hints...
Checking whether file /home/data/jfierst/veggers/DF5033_BRAKER_odb10/genemark_hintsfile.gff contains enough hints and sufficient multiplicity information...

WARNING:
The hints file(s) for GeneMark-EX contain less than 1000 introns. (In total, 6 unique introns are contained.)
Genemark-EX might fail due to the low number of hints.

WARNING:
The hints file(s) for GeneMark-EX contain less than 150 introns with multiplicity >= 4! (In total, 6 unique introns are contained. 0 have a multiplicity >= 4.)
Possibly, you are trying to run braker.pl on data that does not provide sufficient multiplicity information. This will e.g. happen if you try to use introns generated from assembled RNA-Seq transcripts; or if you try to run braker.pl in epmode with mappings from proteins without sufficient hits per locus. Or if you use the example data set.
A low number of intron hints with sufficient multiplicity may result in a crash of GeneMark-EX (it should not crash with the example data set).

Running GeneMark-EP
changing into GeneMark-EP directory /home/data/jfierst/veggers/DF5033_BRAKER_odb10/GeneMark-EP
cd /home/data/jfierst/veggers/DF5033_BRAKER_odb10/GeneMark-EP
Running gmes_petap.pl
perl /home/data/jfierst/veggers/gmes_linux_64/gmes_petap.pl --verbose --seq /home/data/jfierst/veggers/DF5033_BRAKER_odb10/genome.fa --EP /home/data/jfierst/veggers/DF5033_BRAKER_odb10/genemark_hintsfile.gff --cores=8 --gc_donor 0.001 --evidence /home/data/jfierst/veggers/DF5033_BRAKER_odb10/genemark_evidence.gff --soft_mask auto 1>/home/data/jfierst/veggers/DF5033_BRAKER_odb10/GeneMark-EP.stdout 2>/home/data/jfierst/veggers/DF5033_BRAKER_odb10/errors/GeneMark-EP.stderr

The GeneMark-EP.stderr file is empty
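
For anyone who wants to reproduce the two counts from the warnings above (unique introns, and introns with multiplicity >= 4) outside of BRAKER, here is a minimal Python sketch. It is not BRAKER's own check. It assumes a 9-column, tab-separated GFF in which intron hints have `intron` in column 3, and it assumes multiplicity is given either as a `mult=N` attribute in column 9 or as a number in the score column (column 6), defaulting to 1 otherwise. The function name and the choice to sum multiplicities of duplicate entries are my own assumptions.

```python
import re
from collections import defaultdict

# Assumption: multiplicity may appear as "mult=N" in the attribute column.
MULT_RE = re.compile(r"(?:^|;)\s*mult=(\d+)")


def summarize_intron_hints(lines, min_mult=4):
    """Return (unique_introns, introns_with_multiplicity_at_least_min_mult)."""
    mult_by_intron = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2].lower() != "intron":
            continue  # only well-formed intron hints are counted
        seqid, start, end, score, strand, attrs = (
            cols[0], cols[3], cols[4], cols[5], cols[6], cols[8]
        )
        match = MULT_RE.search(attrs)
        if match:
            mult = int(match.group(1))
        else:
            # Assumption: fall back to a numeric score column, else 1.
            try:
                mult = max(1, int(float(score)))
            except (ValueError, OverflowError):
                mult = 1
        # Assumption: duplicate entries for one intron add up.
        mult_by_intron[(seqid, start, end, strand)] += mult
    unique = len(mult_by_intron)
    supported = sum(1 for m in mult_by_intron.values() if m >= min_mult)
    return unique, supported
```

Passing an open file handle for genemark_hintsfile.gff as `lines` returns the two numbers, which can then be compared against the 1000 and 150 thresholds quoted in the warnings.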

tomasbruna · 2023-04-14T15:30:54Z

Sorry for the late reply. Is this still an issue or were you able to find a solution? Judging from these error messages, one problem could be that this genome is too small.

vkeggers · 2023-04-15T11:20:52Z

I ran it in esmode for protein + genome, then paired that output with the RNA + genome run and passed both through TSEBRA. I don't know whether this is 'correct' or best practice, but I got an output: ~14,000 genes were reported, compared with ~19,000 for the other species. The genome is ~74 Mb. Is this too small?

I was going to try BRAKER3, which was released recently, to see whether that changes anything, but I haven't had the time.

Ultimately I have data but I still don't know why it isn't working when given a protein file.

tomasbruna · 2023-04-15T22:58:49Z

14,000 could be a bit low considering C. elegans (~100 Mbp) has ~20,000 genes in the annotation.
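
As a rough way to put these numbers side by side, gene counts can be normalized by genome size. The sketch below uses only the approximate figures quoted in this thread (~14,000 genes in ~74 Mb, ~20,000 genes in ~100 Mbp); the function name is hypothetical, and the result says nothing about whether an annotation is complete.

```python
def genes_per_mbp(gene_count, genome_size_mbp):
    """Gene density as annotated genes per megabase pair."""
    return gene_count / genome_size_mbp


# Approximate figures taken from this thread, not measured values.
this_sample = genes_per_mbp(14_000, 74)
c_elegans = genes_per_mbp(20_000, 100)
print(f"this sample: {this_sample:.1f} genes/Mbp")
print(f"C. elegans:  {c_elegans:.1f} genes/Mbp")
```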

> Ultimately I have data but I still don't know why it isn't working when given a protein file.

Apart from BRAKER3, you can also try a new protein-based pipeline, GALBA (a preprint is available). It uses miniprot to align the reference proteins and trains AUGUSTUS directly on those alignments, so it can help in cases where GeneMark-EP fails for whatever reason. I'd recommend extracting nematode proteins from the new OrthoDB v11 release and supplementing that set with additional nematode proteins from RefSeq to get better protein coverage.
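
Combining the two protein sets can be sketched as below. This is only an illustration of merging two FASTA sources and dropping exact duplicate sequences, using the Python standard library; the function names are hypothetical, how the OrthoDB and RefSeq files are obtained is not covered, and exact-match deduplication is an assumption on my part rather than anything GALBA or BRAKER requires.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_protein_sets(*sources):
    """Merge FASTA sources, keeping the first copy of each distinct sequence."""
    seen = set()
    merged = []
    for source in sources:
        for header, seq in read_fasta(source):
            # Compare case-insensitively and ignore a trailing stop symbol.
            key = seq.upper().rstrip("*")
            if not key or key in seen:
                continue
            seen.add(key)
            merged.append((header, seq))
    return merged


def write_fasta(records, handle, width=60):
    """Write (header, sequence) records with fixed-width sequence lines."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

Each source can be an open file handle, so two open FASTA files can be passed to `merge_protein_sets` and the result written out with `write_fasta`.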
