-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rc vs 63 vat sop documentation #7879
Conversation
fe992aa
to
b99e8be
Compare
b99e8be
to
d01bf5c
Compare
Codecov Report
@@ Coverage Diff @@
## ah_var_store #7879 +/- ##
================================================
Coverage ? 86.290%
Complexity ? 35190
================================================
Files ? 2170
Lines ? 164888
Branches ? 17786
================================================
Hits ? 142282
Misses ? 16281
Partials ? 6325 |
1899d7b
to
06e9b73
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately have to leave these notes here, since the text in question isn't part of the PR diff; apologies if my meaning isn't clear.
They are cleaned up after 24 hours, but this code needs to be tweaked so that you can’t get into a state where duplicates are created here. The real question here is going to be, is there a use case that we might want to run where adding to a VAT that was created say weeks ago is beneficial, but given that calculations occur on a sample summing level, this seems unlikely.
Do we still need this? Seems like it belongs in a ticket (or two) and not documentation.
To check that all of the shards have successfully made it past the first of the sub workflow steps (the most complicated and likely to fail) they will need to be annotated / transformed into json files and put here: gsutil ls [output_path]/annotations/ | wc -l
This (mostly "they will need to be annotated / transformed into json files and put here") makes it sound like the person running the WDL needs to do something else, instead of this being a way to check on the success of the WDL.
Lastly, is the TSV file the only deliverable? Is there anything else that needs to be communicated...etc.? Either way, can this be explicitly added to this doc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to harp on this. I think you should change
The pipeline takes in a jointVCF and outputs a table in BigQuery.
to something like
The pipeline takes in jointVCFs (and index files), creates a table in BigQuery, and outputs a bgzipped TSV file containing the contents of that table, which is the only deliverable.
to make it clear what the final "thing" is that is created and delivered to the client/user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good 👍🏻
No description provided.