-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create WDL to validate VAT and add first test #7352
Conversation
Hey @rsasch -- Just thinking out loud here… but does WDL seem like the right "language" to implement these sort of checks? It just feels like there is a ton of boilerplate, the WDL/bash constructs are a bit hard to follow, and I'm sure the development cycles were pretty slow waiting for cromwell to spin up all those VMs, etc. An alternative would be to write all these validations as a python script, which could make better use of structure (ie authenticating once, etc) and I bet would be quite a bit more readable. You could also run it locally if needed for development, debugging and then ultimately wrap it into a single-task WDL so it could be run easily from Terra. Happy to chat more |
@kcibul My reasoning for doing it in WDL is to better integrate with the process that creates the VAT table, and therefore make sure that the person running it has access (and knows the location of) not only to the VAT table but also the intermediary steps (e.g. the annotation JSON files that are output from NIRVANA). Not all of the validation steps need to be all bash; the first one was because it's literally just a call to make sure a table exists and has rows with |
Discussed with @kcibul (and others) offline. Conclusion: the majority of the validation code might end up being non-WDL with just one or two WDL tasks that call pthyon....etc., but it will be divided up into separate WDL tasks for now for development, and once it's done (or mostly) done, it can always be restructured to have the code in whatever place/structure makes the most sense going forward. |
The idea is that this WDL will run all the checks for each release of the VAT table, one call for each validation. The first validation rule ("Validation Check confirms that data is put into the VAT table after completing without an error.") is included as a model for subsequent calls.
validation_results
will contain details of each test result in an array of{"testName": "result details"}
:Example 1 — fail
{ "EnsureVatTableHasVariants": "FAIL: The VAT table spec-ops-aou.rsa_gvs_quickstart.rsa_scratch has no variants in it." }
Example 2 — pass
{ "EnsureVatTableHasVariants": "PASS: The VAT table spec-ops-aou.anvil_100_for_testing.aou_shard_223_vat has 294821 variants in it." },
Example 3 — the test wasn't able to run
{ "EnsureVatTableHasVariants": "Something went wrong. The attempt to count the variants returned: Error in query string: Error processing job 'spec-ops- aou:bqjob_r357c4b6fe6b0c6fb_0000017aac301de7_1': Unrecognized name: vid at [1:24]" }
Closes https://github.com/broadinstitute/dsp-spec-ops/issues/364