train.f.gb empty #126

Closed
diriano opened this issue Nov 27, 2019 · 2 comments
Comments

diriano commented Nov 27, 2019

My train.f.gb file was empty, even though I had plenty of good genes from RNA-Seq data. The cause turned out to be Augustus-3.3.3/scripts/filterGenesIn_mRNAname.pl, which was not parsing the transcript names correctly.

I changed the line

if ( $_ =~ m/transcript_id "(.*)";/ ) {

to this one

if ( $_ =~ m/transcript_id "([0-9_t]*)"/ ) {

And now I have genes in train.f.gb

phyden commented Nov 29, 2019

Hi,

I observed the same issue. This also matches the observations in #125.
The apparent reason is that GeneMark-ES version 4.48 adds further attributes after the transcript_id, such as cds_type and count, which version 4.47 did not. The wildcard .* in the regex is greedy and very unspecific: it expands to everything between the double quote after transcript_id and the last "; on the line, so the extracted transcript_id string no longer matches the gene_id in the GenBank file.
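
A minimal sketch of that greedy behaviour, written in Python rather than Perl (Python's `re` module treats `.*` the same way for this pattern). The attribute line is a made-up example in the style of the newer GeneMark-ES output, not copied from a real file.

```python
import re

# Hypothetical GTF attribute column; the IDs and the extra attributes are
# illustrative assumptions, not real GeneMark-ES output.
line = 'gene_id "1_g"; transcript_id "1_t"; cds_type "Initial"; count "1_3";'

# Same pattern as the original line in filterGenesIn_mRNAname.pl.
greedy = re.search(r'transcript_id "(.*)";', line)

# The greedy .* runs to the LAST '";' on the line, so the capture
# swallows the trailing attributes as well as the transcript ID.
captured = greedy.group(1)
print(captured)
```

With a line like this, the captured string is no longer a bare transcript ID, so a later lookup against the names in the GenBank file would not be expected to find it.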

Using an older version of GeneMark-ES would also be a solution, but I could not find any source from which older versions can be obtained, and it is never a good idea to rely on older software without a very good reason.

br,
Patrick

edit: I am not very experienced with Perl and Augustus, but the regex @diriano proposed might not work for all applications of Augustus. I would therefore recommend using the same regex in a non-greedy variant, which also seems to work for the needs here:

# instead of
if ( $_ =~ m/transcript_id "(.*)";/ ) {

# simply add a question mark (?)
if ( $_ =~ m/transcript_id "(.*?)";/ ) {
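
To compare the three patterns from this thread side by side, here is a small Python sketch (Python's `re` handles these patterns the same way as Perl). Both attribute lines are hypothetical: the first imitates the newer GeneMark-ES style with extra attributes, and the second assumes an Augustus-style ID such as `g12.t1`. The `extract` helper is introduced here only for illustration.

```python
import re

# The three patterns discussed in this thread.
PATTERNS = {
    "greedy": r'transcript_id "(.*)";',            # original line
    "char_class": r'transcript_id "([0-9_t]*)"',   # @diriano's change
    "non_greedy": r'transcript_id "(.*?)";',       # non-greedy variant
}

# Hypothetical attribute columns (assumed for illustration only).
genemark_line = 'gene_id "1_g"; transcript_id "1_t"; cds_type "Initial"; count "1_3";'
augustus_style_line = 'transcript_id "g12.t1"; gene_id "g12";'


def extract(pattern_name, line):
    """Return the captured transcript ID, or None if the pattern fails."""
    match = re.search(PATTERNS[pattern_name], line)
    return match.group(1) if match else None


for name in PATTERNS:
    print(name, repr(extract(name, genemark_line)),
          repr(extract(name, augustus_style_line)))
```

Under these assumptions, the character-class pattern handles the GeneMark-style ID but does not match an ID containing other characters, whereas the non-greedy variant stops at the first `";` in both cases.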

edit2: I saw that this was already fixed in a different way in the master branch of Augustus after the last release. With the next releases this issue will probably not recur.

@KatharinaHoff
Member

I hope so.
