Problems in RUNNING GENEMARK-EX #716

Closed
ChuanzhengWei opened this issue Dec 7, 2023 · 8 comments

@ChuanzhengWei

Dear Braker team,
I got an error when using BRAKER3, run from a Singularity container, for gene prediction. I am not sure whether the problem lies with gmetp.pl. My inputs are a protein file and a BAM file aligned with HISAT2.
This is my input command:

singularity exec /public/home/weichuanzheng/software/singularity/braker3/braker3.sif braker.pl --JAVA_PATH=/public/home/weichuanzheng/software/jdk/bin --threads=8 --species=s349 \
    --genome=/public/home/weichuanzheng/project/08.pangenome/03.mask/s349/s349.nextpolish.fasta.masked \
    --prot_seq=/public/home/weichuanzheng/project/11.Sorghum_genome/06.prot/structure_annotation1.fasta \
    --bam==/public/home/weichuanzheng/project/08.pangenome/03.mask/s349/bamfile/SRR23260553.sorted.bam,............

The following is the specific content of the error report:
In 'braker.log':

# Wed Dec  6 10:37:36 2023: sorting RNA-Seq BAM files
# Wed Dec  6 12:42:05 2023: Running gmetp.pl
/usr/bin/perl /opt/ETP/bin/gmetp.pl --cfg /public/home/weichuanzheng/project/08.pangenome/03.mask/s349/annotation/braker/GeneMark-ETP/etp_config.yaml --workdir /public/home/weichuanzheng/project/08.pangenome/03.mask/s349/annotation/braker/GeneMark-ETP --bam /public/home/weichuanzheng/project/08.pangenome/03.mask/s349/annotation/braker/GeneMark-ETP/etp_data/ --cores 8 --softmask  1>/public/home/weichuanzheng/project/08.pangenome/03.mask/s349/annotation/braker/errors/GeneMark-ETP.stdout 2>/public/home/weichuanzheng/project/08.pangenome/03.mask/s349/annotation/braker/errors/GeneMark-ETP.stderr

At the end of 'GeneMark-ETP.stderr':

WARNING: 'ptg000031l_np1212' does not match any sequence in the fasta file. Maybe the two files do not belong together.
error
error, file/folder not found: transcripts_merged.fasta.gff

In 'GeneMark-ETP.stdout':

GeneMarkS: error on last system call, error code 256
Abort program!!!

I would appreciate any suggestions.
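The warning above says that a sequence name does not match any sequence in the FASTA file. A minimal sketch of how such a mismatch could be checked by hand is shown here. It assumes that the sequence ID is the first whitespace-delimited token of each FASTA header and that the names to compare come from column 1 of a tab-separated GFF/GTF file. The function names are hypothetical and are not part of BRAKER or GeneMark-ETP.

```python
def fasta_ids(lines):
    """Return the set of sequence IDs (first token of each '>' header)."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.add(header.split()[0])
    return ids


def gff_seqnames(lines):
    """Return the set of sequence names found in column 1 of a GFF/GTF file."""
    names = set()
    for line in lines:
        # Skip blank lines and comment lines.
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t")[0].strip())
    return names


def unmatched(fasta_lines, gff_lines):
    """Return annotation sequence names that have no matching FASTA ID."""
    return sorted(gff_seqnames(gff_lines) - fasta_ids(fasta_lines))
```

Both functions accept any iterable of lines, so open file handles can be passed directly, for example `unmatched(open("genome.fasta"), open("annotation.gff"))` with placeholder file names.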

@KatharinaHoff
Member

KatharinaHoff commented Dec 7, 2023 via email

@ChuanzhengWei
Author

My protein file includes sequences from 60 varieties of sorghum, one variety of rice, and one variety of maize. Initially, I faced a problem that did not seem to stem from the protein file itself. After renaming and shortening the headers of the sequences in the genome file, I successfully generated the braker.gff file.
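For readers who want to try the same workaround, here is a minimal sketch of renaming and shortening FASTA headers. The exact renaming scheme used above is not stated, so the `contig` prefix, the sequential numbering, and the function name `shorten_headers` are all assumptions made for illustration.

```python
def shorten_headers(lines, prefix="contig"):
    """Replace each FASTA header with a short sequential ID.

    Returns the rewritten lines and a mapping from new ID to old header,
    so that the original names can be restored later.
    """
    new_lines = []
    mapping = {}
    for line in lines:
        if line.startswith(">"):
            old = line[1:].rstrip("\n")
            new = f"{prefix}{len(mapping) + 1}"
            mapping[new] = old
            new_lines.append(f">{new}\n")
        else:
            # Sequence lines are passed through unchanged.
            new_lines.append(line)
    return new_lines, mapping
```

Keeping the mapping matters because any file that refers to the old sequence names, such as BAM alignments made against the original genome, will no longer match the renamed FASTA unless it is regenerated or renamed consistently.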

However, I've encountered a new challenge: the generated GFF file does not contain UTRs (Untranslated Regions). I think this issue might be related to the limitations of the container environment, as I am running BRAKER through Singularity due to the lack of root privileges on my system.

Given these constraints, could you please advise on how I might obtain a GFF file that includes UTRs? Any guidance or suggestions you can offer would be greatly appreciated, as this is a critical component of my project.

Thank you in advance for your time and assistance. I look forward to your valuable input.

Best regards

@ChuanzhengWei
Author

This is the version I'm using
singularity exec braker3.sif braker.pl --version
braker.pl version 3.0.3

@KatharinaHoff
Member

See #587

@ChuanzhengWei
Author

I did not find GeneMark-ETP/rnaseq/stringtie/transcripts_merged.gff. So do I need to rerun StringTie to obtain a new GFF file, and then merge the StringTie GFF and braker.gtf files with stringtie2utr.py?

See #587

@KatharinaHoff
Member

Yes, you need to run stringtie. The script is not connected to BRAKER, yet.

@ChuanzhengWei
Author

Thank you. I successfully obtained a GTF file containing UTRs using stringtie2utr.py, but I've encountered a new issue: multiple 5' UTR or 3' UTR entries are generated for the same gene, like this:

    Chr01   stringtie2utr   five_prime_UTR  36899   36899   1000    -       .       transcript_id "g4.t2"; gene_id "g4";
    Chr01   stringtie2utr   five_prime_UTR  37358   37440   1000    -       .       transcript_id "g4.t2"; gene_id "g4";
    Chr01   stringtie2utr   five_prime_UTR  41705   41825   1000    -       .       transcript_id "g4.t2"; gene_id "g4";
    Chr01   stringtie2utr   five_prime_UTR  42029   42456   1000    -       .       transcript_id "g4.t2"; gene_id "g4"

I want to know if this situation is normal.
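As general background, a GTF file lists one line per contiguous segment, so a UTR that spans several exons appears as several lines with the same transcript_id. Whether the specific segments above are correct is not established in this thread. A minimal sketch for inspecting them groups UTR segments by transcript. It assumes tab-separated GTF lines with a `transcript_id "..."` attribute, as in the excerpt, and the function name `utr_segments` is hypothetical.

```python
import re
from collections import defaultdict

TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')
UTR_FEATURES = ("five_prime_UTR", "three_prime_UTR")


def utr_segments(gtf_lines):
    """Group UTR segments by (transcript_id, feature type).

    Returns a dict mapping (transcript_id, feature) to a sorted list of
    (start, end) tuples, one tuple per GTF line.
    """
    segments = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] not in UTR_FEATURES:
            continue
        match = TRANSCRIPT_RE.search(fields[8])
        if match:
            key = (match.group(1), fields[2])
            segments[key].append((int(fields[3]), int(fields[4])))
    return {key: sorted(value) for key, value in segments.items()}
```

Several segments under one key, separated by gaps, would be consistent with a UTR interrupted by introns. Overlapping or duplicated segments under one key would be worth a closer look.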

@KatharinaHoff
Member

KatharinaHoff commented Dec 19, 2023 via email
