Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ancestral, translate] node data validation improvements #1440

Merged
merged 5 commits into from
Mar 19, 2024

Conversation

jameshadfield
Copy link
Member

See commit messages for details. Motivated by #1438 (review)

Copy link

codecov bot commented Mar 16, 2024

Codecov Report

Attention: Patch coverage is 81.81818% with 8 lines in your changes are missing coverage. Please review.

Project coverage is 68.82%. Comparing base (75fc490) to head (a43536c).

Files Patch % Lines
augur/util_support/node_data_file.py 77.77% 3 Missing and 3 partials ⚠️
augur/utils.py 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1440      +/-   ##
==========================================
+ Coverage   68.70%   68.82%   +0.11%     
==========================================
  Files          69       69              
  Lines        7574     7595      +21     
  Branches     1858     1860       +2     
==========================================
+ Hits         5204     5227      +23     
+ Misses       2088     2086       -2     
  Partials      282      282              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@joverlee521
Copy link
Contributor

joverlee521 commented Mar 18, 2024

Resolves #951

Related to #1205

Copy link
Contributor

@joverlee521 joverlee521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for following up here @jameshadfield!

I left some comments and came away from this PR wondering if we should take the opportunity to redesign how node data is handled within Augur. Maybe it'd be better to discuss through a synchronized meeting?

augur/util_support/node_data_file.py Show resolved Hide resolved
augur/util_support/node_data_file.py Outdated Show resolved Hide resolved
augur/util_support/node_data_file.py Outdated Show resolved Hide resolved
augur/ancestral.py Outdated Show resolved Hide resolved
augur/util_support/node_data_file.py Show resolved Hide resolved
'q' is not a valid formatting type for python (it's a snakemake feature)
NodeDataFile (called by NodeDataReader, which is called by
read_node_data) no longer attempts JSON validation when the requested
validation mode is SKIP.
By leveraging the existing validation designed for reading node-data
files we not only validate against the annotation schema but also
perform other sanity checks on the node-data structure for which a
schema does not (yet) exist.

As part of this we add a `--validation-mode` argument following `augur
export v2` which allows users to skip validation of the output node-data
and (translate only) skip validation of the input node-data.
The help message for `--validation-mode` is very specific:

> 'error' mode causes a non-zero exit status if any validation checks
> failed, while 'warn' does not.

In the actual implementation however we would print an error and exit
with code 2 for any validation check _except_ the augur version check.
Now, for `ValidationMode.WARN` we catch these errors and turn them into
warnings such that augur can exit code 0 as specified.

Note that during validation, any non-ValidationError exception will be
be fatal even in `WARN`. An example of this would be errors loading the
schema itself. See the following review comment for more:
<#1440 (comment)>
Copy link
Contributor

@joverlee521 joverlee521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the validation improvements! Looks good to me 👍

This seems like a good direction and in the future we should consider adding validations to any augur commands that produce node-data JSONs. Personally, I'd like to see these validation failures early in the pipeline before the final augur export step.

@jameshadfield jameshadfield merged commit 5413f7a into master Mar 19, 2024
4 of 5 checks passed
@jameshadfield jameshadfield deleted the validation-improvements branch March 19, 2024 23:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants