Validate annotations produced from ancestral + translate #951

Open
corneliusroemer opened this issue May 25, 2022 · 4 comments
Labels: enhancement (New feature or request)

Comments

@corneliusroemer
Member

I ran into a bug that took me a long time to figure out. augur export reported the following error:

Validating schema of 'auspice/monkeypox_global.json'...
        ERROR: 'nuc' is a required property. Trace: properties - meta - properties - genome_annotations - required
Validation of 'auspice/monkeypox_global.json' failed.

------------------------
Validation of auspice/monkeypox_global.json failed. Please check this in a local instance of `auspice`, as it is not expected to display correctly. 
------------------------

It turns out that augur export requires nuc annotations, which usually come in through the aa_mut.json produced by augur translate.

I was reading annotations from a .gff into translate, something that's supported in theory. However, in the current implementation it's not actually possible to read in a nuc annotation.

Debugging would have been much faster if augur translate had warned me (or even errored) when it realised that it was lacking nuc annotations.

I'd propose raising an error if nuc is not written to aa_mut.json:

[Error] Could not read in `nuc` annotations. Please check the annotation in your input file. For `.gff` the line needs to look like this:
MT903344.1	Genbank	source	1	197233	.	+	.	locus_tag=nuc
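
A minimal sketch of what such a check could look like, assuming the annotations are held in a plain dict keyed by feature name (with 'start' and 'end' coordinates) before they are written to aa_mut.json. The names require_nuc_annotation and MissingNucAnnotationError are hypothetical and not part of augur.

```python
class MissingNucAnnotationError(Exception):
    """Hypothetical error type; augur's real error handling may differ."""


def require_nuc_annotation(annotations):
    """Return the 'nuc' entry, or raise if it is missing or incomplete.

    `annotations` is assumed to map feature names to dicts that carry
    'start' and 'end' keys (an assumption about the node-data layout).
    """
    nuc = annotations.get("nuc")
    if nuc is None or "start" not in nuc or "end" not in nuc:
        raise MissingNucAnnotationError(
            "Could not read in `nuc` annotations. "
            "Please check the annotation in your input file."
        )
    return nuc
```

Calling this just before writing the output would surface the problem in translate rather than later in export.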

Related to #881

@huddlej
Contributor

huddlej commented Jun 8, 2022

I think this issue arose as part of this Slack conversation. @corneliusroemer, am I correct in this?

@jameshadfield
Member

(1 year later...)

The annotations schema now requires 'nuc' to be present (d6246ca); however, neither augur ancestral nor augur translate validates its outputs. Reading any node-data file (via NodeDataReader) with an "annotations" block will also validate it against the schema, although in this case the failure will still first be encountered in augur export v2.

Conceptually, we could have the annotations from ancestral define 'nuc' and translate define the CDSs, and they'd be merged in augur export; however, I think it's sensible to require translate to add a 'nuc' block, which is why I made it a required property. If augur export sees multiple annotations.nuc entries, it should really ensure they are the same length! (The JSON merging happens within NodeDataReader.)
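
A rough sketch of the consistency check suggested above, assuming each node-data file contributes an "annotations" dict whose entries have 'start' and 'end' keys. merge_annotations is a hypothetical helper; it does not reproduce NodeDataReader's actual merging logic.

```python
def merge_annotations(blocks):
    """Merge several "annotations" dicts, insisting that 'nuc' entries agree.

    Hypothetical helper: later blocks overwrite earlier features of the
    same name, except that conflicting 'nuc' coordinates raise an error.
    """
    merged = {}
    for block in blocks:
        for name, feature in block.items():
            if name == "nuc" and "nuc" in merged:
                seen = (merged["nuc"]["start"], merged["nuc"]["end"])
                new = (feature["start"], feature["end"])
                if seen != new:
                    raise ValueError(
                        f"Conflicting 'nuc' annotations: {seen} vs {new}"
                    )
            merged[name] = feature
    return merged
```

With this shape, ancestral and translate could both emit 'nuc' and a disagreement would be reported at merge time instead of producing a silently inconsistent export.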

jameshadfield changed the title from "Enhancement: warn if augur translate doesn't know about nuc annotation" to "Validate annotations produced from ancestral + translate" on Aug 30, 2023

@mazeller

Just a note, I ran into this issue working on my PRRSV dataset (https://github.com/mazeller/NextClade_Datasets/tree/main/prrsv_yimim_v3). I needed to append the following line to my GFF manually.

DQ478308.1	Genbank	source	1	603	.	+	.	locus_tag=nuc
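
For anyone applying the same workaround, here is a small sketch that builds such a line, assuming the nine GFF columns must be tab-separated and the feature spans the whole reference. nuc_source_line is a hypothetical helper, not an augur function.

```python
def nuc_source_line(seqid, length, source="Genbank"):
    """Build a tab-separated GFF 'source' line tagged locus_tag=nuc.

    Hypothetical helper; `length` is the full length of the reference.
    """
    columns = [
        seqid,            # sequence ID, matching the reference
        source,           # annotation source
        "source",         # feature type
        "1",              # start (1-based)
        str(length),      # end
        ".",              # score
        "+",              # strand
        ".",              # phase
        "locus_tag=nuc",  # attributes
    ]
    return "\t".join(columns)
```

Writing the line programmatically avoids accidentally pasting spaces where the GFF format expects tabs.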

@jameshadfield
Member

> however I think it's sensible to require translate to add a 'nuc' block, which is why I made it a required property

As of 1d17699 (in master, but not yet released), augur translate will always export this. (I missed this issue when scanning; it's very similar to #953.)

> Just a note, I ran into this issue working on my PRRSV dataset (https://github.com/mazeller/NextClade_Datasets/tree/main/prrsv_yimim_v3). I needed to append the following line to my GFF manually.

P.S. Recent augur PRs (merged but not yet released) will fix this: we'll now read the nuc coordinates from the sequence-region pragma in your GFF ("##sequence-region DQ478308.1 1 603").
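
To illustrate the idea, here is a minimal sketch of pulling coordinates out of that pragma, assuming the usual "##sequence-region seqid start end" layout. nuc_from_sequence_region and the shape of the returned dict are illustrative assumptions, not augur's implementation.

```python
def nuc_from_sequence_region(gff_lines):
    """Return nuc coordinates from the first ##sequence-region pragma.

    Returns None if no well-formed pragma is found. Illustrative only;
    the returned dict layout is an assumption.
    """
    for line in gff_lines:
        if not line.startswith("##sequence-region"):
            continue
        fields = line.split()
        if len(fields) != 4:
            continue  # malformed pragma, keep looking
        _, seqid, start, end = fields
        return {"seqid": seqid, "start": int(start), "end": int(end), "strand": "+"}
    return None
```

For example, passing the lines of a GFF that contains "##sequence-region DQ478308.1 1 603" would give a start of 1 and an end of 603.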

Status: Backlog