augur translate produces genome annotations that fail validation in augur export #1205

Open
joverlee521 opened this issue Apr 28, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@joverlee521
Contributor

Current Behavior

If the reference sequence provided to the augur translate command has invalid characters in a gene name (e.g. spaces), this will eventually lead to an error during augur export v2 validation.

The error message from augur export v2 is not super informative:

Validating schema of 'auspice/zika.json'...
  .meta.genome_annotations {"nuc": {"end": 10769, "start": 1, "strand": "+"…} failed additionalProperties validation for false
  .tree {"name": "NODE_0000000", "node_attrs": {"div": 0…} failed oneOf validation for [{"$ref": "#/$defs/tree"}, {"type": "array", "minItems": 1, "items": {"$ref": "#/$defs/tree"}}]
    validation for arm 0: {"$ref": "#/$defs/tree"}
      .tree.children[…].branch_attrs.mutations {"nuc": ["T329C", "C1209G"], "Capsid Protein": […} failed additionalProperties validation for false
      .tree.children[…].branch_attrs.mutations {"nuc": ["G318T", "G438T", "C1233T", "C1416T", "…} failed additionalProperties validation for false
      .tree.children[…].branch_attrs.mutations {"nuc": ["G406A"], "Capsid Protein": ["A106T"]} failed additionalProperties validation for false
      .tree.children[…].branch_attrs.mutations {"nuc": ["T329C", "T762C", "G1170T", "G1458A", "…} failed additionalProperties validation for false
      .tree.children[…].branch_attrs.mutations {"nuc": ["A3C", "T411A", "T738C", "C858T", "G864…} failed additionalProperties validation for false
      .tree.children[…].branch_attrs.mutations {"nuc": ["T249C", "G416A", "C789T", "T2032C", "T…} failed additionalProperties validation for false
    validation for arm 1: {"type": "array", "minItems": 1, "items": {"$ref": "#/$defs/tree"}}
      .tree {"name": "NODE_0000000", "node_attrs": {"div": 0…} failed type validation for "array"
Validation of 'auspice/zika.json' failed.

------------------------
Validation of auspice/zika.json failed. Please check this in a local instance of `auspice`, as it is not expected to display correctly. 
------------------------
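
The "failed additionalProperties validation for false" lines mean that an object contains a key which no allowed key pattern in the schema accepts. A minimal sketch of that check follows, using only the standard library rather than a real JSON Schema validator. The regular expression is an assumption about what the export v2 schema permits for gene names, and `rejected_keys` is a hypothetical helper, not part of augur.

```python
import re

# Assumed pattern for allowed keys (letters, digits, "*", "_", "-").
# The actual pattern in the export v2 schema may differ.
ALLOWED_KEY = re.compile(r"[a-zA-Z0-9*_-]+")


def rejected_keys(obj):
    """Return the keys that do not match the allowed pattern.

    This mimics a schema that lists allowed key patterns and sets
    `additionalProperties` to false: any other key is an error.
    """
    return [key for key in obj if not ALLOWED_KEY.fullmatch(key)]


# Mutations object copied from the validation log above.
mutations = {"nuc": ["G406A"], "Capsid Protein": ["A106T"]}
print(rejected_keys(mutations))  # ['Capsid Protein']
```

Under this assumption, the space in "Capsid Protein" is what makes the key fall outside the pattern, which matches both failures in the log (`.meta.genome_annotations` and `branch_attrs.mutations`).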

Expected behavior

The aa-muts.json file produced from augur translate should be valid for augur export v2.

How to reproduce

Steps to reproduce the current behavior:

  1. Add a space in a gene name for the zika tutorial reference.
  2. Run the zika tutorial build.
  3. See error in final export step.

Possible solution

  • Relax schema pattern matching. Spaces in gene names do not cause any (obvious) issues in Auspice.
  • Validate the output of augur translate to ensure it will not cause errors downstream.
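
Both options can be sketched as a pre-flight check on the node data that augur translate writes. This is an illustration only: it assumes the JSON has a top-level `annotations` object keyed by gene name and per-node `aa_muts` objects under `nodes`, and both regular expressions (`STRICT`, `RELAXED`) as well as the helper `invalid_gene_names` are hypothetical, not taken from augur.

```python
import re

# Assumed current rule for gene names; the real schema may differ.
STRICT = re.compile(r"[a-zA-Z0-9*_-]+")
# Hypothetical relaxed rule that additionally allows spaces.
RELAXED = re.compile(r"[a-zA-Z0-9*_ -]+")


def invalid_gene_names(node_data, pattern=STRICT):
    """Collect gene names from annotations and per-node amino acid
    mutations, and return (sorted) those the pattern does not accept."""
    names = set(node_data.get("annotations", {}))
    for node in node_data.get("nodes", {}).values():
        names.update(node.get("aa_muts", {}))
    return sorted(name for name in names if not pattern.fullmatch(name))


# Small stand-in for aa-muts.json. The gene coordinates are made up
# for illustration; only the "nuc" range comes from the log above.
example = {
    "annotations": {
        "nuc": {"start": 1, "end": 10769, "strand": "+"},
        "Capsid Protein": {"start": 100, "end": 400, "strand": "+"},
    },
    "nodes": {"NODE_0000000": {"aa_muts": {"Capsid Protein": ["A106T"]}}},
}

print(invalid_gene_names(example))           # ['Capsid Protein']
print(invalid_gene_names(example, RELAXED))  # []
```

To check a real file, load it with `json.load` and pass the resulting dict to the function. Running such a check at the end of augur translate would surface the problem where it is introduced instead of in the final export step.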

Additional context

First saw this issue during Nextstrain office hours on 2023-04-27.

@jameshadfield
Member

Gene names really shouldn't have spaces in them as per general guidelines,

"Symbols contain only uppercase Latin letters and Arabic numerals, and punctuation is avoided, with an exception for hyphens in specific groups,"

but Auspice can display them, so I don't see a problem with relaxing the schema. We should still strongly recommend short names without spaces, as Auspice only displays a gene name when there is enough room to draw it on top of the rendered CDS. We will shortly have the ability to export a display name and/or description for each gene/CDS, which may help with this.

@jameshadfield
Member

See also #955

Status: Prioritized