-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VS-765. Scatter the RemoveDuplicates task. #8144
Conversation
Passing VAT from VDS run on Quickstart here: https://app.terra.bio/#workspaces/gvs-dev/GVS%20Quickstart%20v3%20ggrant/job_history/533017b9-2dfb-42ec-83ef-42dfda67f5c1 |
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## ah_var_store #8144 +/- ##
================================================
Coverage ? 86.238%
Complexity ? 35194
================================================
Files ? 2173
Lines ? 165045
Branches ? 17794
================================================
Hits ? 142332
Misses ? 16387
Partials ? 6326 |
call StripCustomAnnotationsFromSitesOnlyVCF { | ||
input: | ||
input_vcf = SelectVariants.output_vcf, | ||
input_vcf = RemoveDuplicatesFromSitesOnlyVCF.output_vcf, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scatter looks good--I think the only thing I would like us to add is some additional validate vat queries because I think this opens us up to a few potential issues that we'll just want to track
( based on sites or site-variants that were split between different shards or ended up that way because of left alignment )
- Ensure that all the rows for a single site have the same AN value: Is there a site that has more than one distinct AN value?
- Ensure that all the rows for a specific site-variant / VID have the same AC value. Is there a site-variant / VID that has more than one distinct AC value?
input: | ||
sites_only_vcf = SelectVariants.output_vcf, | ||
ref = reference | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is scattered now but it looks like it's still using the same resources (memory, disk, cpu etc) that it was when it was unscattered? Perhaps the monitoring script could confirm whether all those resources are still required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good apart from some debug code which presumably will be removed before merge 👍
No description provided.