I ran BRAKER to predict gene structure and ran into a problem at the GeneMark-EX step, as shown below.
I used the reference genome and a custom amino acid sequence database with the following command, which worked well for all species in the same genus except for the reference genome of one species. I am NOT using RNA-Seq data.
braker.pl --genome="$genome" --prot_seq="$Apodiea_gene_AA"
braker.log
#**********************************************************************************
# RUNNING GENEMARK-EX
#**********************************************************************************
# Sat Jun 22 22:59:29 2024: Preparing genemark_evidence file hints from manual hints...
# Sat Jun 22 22:59:29 2024: Checking whether file /output/proj/data/bee_proj_data/gene_annotation/braker_BomMus_results_for_protseq/genemark_hintsfile.gff contains enough hints and sufficient multiplicity information...
#*********
# WARNING:
# The hints file(s) for GeneMark-EX contain less than 150 introns with multiplicity >= 4! (In total, 2658 unique introns are contained. 16 have a multiplicity >= 4.)
# Possibly, you are trying to run braker.pl on data that does not provide sufficient multiplicity information. This will e.g. happen if you try to use introns generated from assembled RNA-Seq transcripts; or if you try to run braker.pl in epmode with mappings from proteins without sufficient hits per locus. Or if you use the example data set.
# A low number of intron hints with sufficient multiplicity may result in a crash of GeneMark-EX (it should not crash with the example data set).
#*********
# Sat Jun 22 22:59:29 2024: Running GeneMark-EP
# Sat Jun 22 22:59:29 2024: changing into GeneMark-EP directory /output/proj/data/bee_proj_data/gene_annotation/braker_BomMus_results_for_protseq/GeneMark-EP
cd /output/proj/data/bee_proj_data/gene_annotation/braker_BomMus_results_for_protseq/GeneMark-EP
# Sat Jun 22 22:59:29 2024: Running gmes_petap.pl
/home/user/miniforge3/envs/braker3/bin/perl /home/user/proj/sofwtare/gmetp_linux_64/bin/gmes/gmes_petap.pl --verbose
--seq /output/proj/data/bee_proj_data/gene_annotation/braker_BomMus_results_for_protseq/genome.fa
--EP /output/proj/data/bee_proj_data/gene_annotation/braker_BomMus_results_for_protseq/genemark_hintsfile.gff
--cores=8 --gc_donor 0.001 --evidence /output/proj/data/bee_proj_data/gene_annotation/braker_BomMus_results_for_protseq/genemark_evidence.gff
--soft_mask auto 1>/output/proj/data/bee_proj_data/gene_annotation/braker_BomMus_results_for_protseq/GeneMark-EP.stdout
2>/output/proj/data/bee_proj_data/gene_annotation/braker_BomMus_results_for_protseq/errors/GeneMark-EP.stderr
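The warning in the log above reports how many intron hints reach a multiplicity of 4. As a rough way to inspect genemark_hintsfile.gff independently, here is a minimal Python sketch. It assumes the hints are in tab-separated GFF with the multiplicity stored as `mult=N` in the ninth column, that a hint without a `mult` attribute counts as 1, and that repeated entries for the same intron should be summed; these are assumptions about the file layout, not a copy of BRAKER's own check. The function name `intron_multiplicity_summary` is hypothetical.

```python
import re
from collections import Counter


def intron_multiplicity_summary(gff_lines, min_mult=4):
    """Return (unique introns, introns with summed multiplicity >= min_mult).

    Assumes tab-separated GFF with 'mult=N' in column 9 (assumption);
    a missing 'mult' attribute is treated as multiplicity 1 (assumption).
    """
    mult_by_intron = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2].lower() != "intron":
            continue  # only intron hints are counted
        match = re.search(r"(?:^|;)\s*mult=(\d+)", cols[8])
        mult = int(match.group(1)) if match else 1
        # An intron is identified by sequence name, start, end and strand.
        key = (cols[0], cols[3], cols[4], cols[6])
        mult_by_intron[key] += mult
    total = len(mult_by_intron)
    supported = sum(1 for m in mult_by_intron.values() if m >= min_mult)
    return total, supported
```

Calling it as `intron_multiplicity_summary(open("genemark_hintsfile.gff"))` on a run that works and on the failing run would show whether the failing genome simply has far fewer well-supported introns than the 150 mentioned in the warning.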
output error file
tail -30 gene_annotation_AndBic.40649391.e
The number of pairs aligned (8804/8804 (100%) pairs aligned) is much smaller than for the other reference genomes (448747/448747 (100%) pairs aligned). It seems that this reference genome is very distant from the protein database. Can you give any hints on how to address this issue?
[Sat Jun 22 22:58:48 2024] Enqueueing pair 8796/8804 (99.9%). Est. time left: 00:00:01 (hh:mm:ss)
[Sat Jun 22 22:59:27 2024] 8804/8804 (100%) pairs aligned
[Sat Jun 22 22:59:27 2024] Alignment of pairs finished
[Sat Jun 22 22:59:27 2024] Translating coordinates from local pair level to contig level
[Sat Jun 22 22:59:27 2024] Finished spliced alignment
[Sat Jun 22 22:59:27 2024] Flagging top chains
[Sat Jun 22 22:59:28 2024] Processing the output
[Sat Jun 22 22:59:29 2024] Output processed
[Sat Jun 22 22:59:29 2024] ProtHint finished.
ERROR in file /home/user/miniforge3/envs/braker3/bin/braker.pl at line 5414
Failed to execute: /home/user/miniforge3/envs/braker3/bin/perl /home/user/proj/sofwtare/gmetp_linux_64/bin/gmes/gmes_petap.pl --verbos ...
It's unlikely that the reference proteins would be close enough for some members of the genus but not for others.
This looks like some technical issue, possibly with the assembly of that one genome. You can send me the assembly of one of the genomes where the algorithm works well, the assembly of the problematic one, and the protein database. I'll take a look (please share by email, [email protected], if you don't want your data to appear here).
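For a quick first comparison of the problematic assembly against one that works, a small Python sketch like the following may help. It only reports basic FASTA statistics (sequence count, total length, N50, fraction of lowercase soft-masked bases, and headers containing whitespace); which of these, if any, is related to the failure is an open question, and the function name `fasta_summary` is hypothetical.

```python
def fasta_summary(lines):
    """Summarize a FASTA file given as an iterable of text lines."""
    lengths = []          # length of each sequence record
    current = None        # length of the record being read
    lowercase = 0         # soft-masked (lowercase) bases
    headers_with_whitespace = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
            if len(line[1:].split()) > 1:
                headers_with_whitespace += 1
        elif current is not None:
            seq = line.strip()
            current += len(seq)
            lowercase += sum(1 for c in seq if c.islower())
    if current is not None:
        lengths.append(current)

    total = sum(lengths)
    # N50: length of the sequence at which the running sum of lengths,
    # taken from longest to shortest, first reaches half the total.
    n50, running = 0, 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "n_seqs": len(lengths),
        "total_bp": total,
        "n50": n50,
        "softmasked_fraction": lowercase / total if total else 0.0,
        "headers_with_whitespace": headers_with_whitespace,
    }
```

Running `fasta_summary(open(path))` on both genome files and comparing the two dictionaries side by side would show whether the failing assembly is much more fragmented, differently masked, or uses unusual headers.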