VS-765. Scatter the RemoveDuplicates task. #8144

Merged: 6 commits into ah_var_store on Jan 9, 2023

@gbggrant gbggrant commented Jan 4, 2023

No description provided.

codecov bot commented Jan 4, 2023

Codecov Report

No coverage uploaded for pull request base (ah_var_store@20409d9).
The diff coverage is n/a.

Additional details and impacted files
@@               Coverage Diff                @@
##             ah_var_store     #8144   +/-   ##
================================================
  Coverage                ?   86.238%           
  Complexity              ?     35194           
================================================
  Files                   ?      2173           
  Lines                   ?    165045           
  Branches                ?     17794           
================================================
  Hits                    ?    142332           
  Misses                  ?     16387           
  Partials                ?      6326           

  call StripCustomAnnotationsFromSitesOnlyVCF {
    input:
-     input_vcf = SelectVariants.output_vcf,
+     input_vcf = RemoveDuplicatesFromSitesOnlyVCF.output_vcf,
Contributor comment:

The scatter looks good. The only thing I would like us to add is a few additional validate-VAT queries, because I think this opens us up to some potential issues that we'll want to track (caused by sites or site-variants that were split between different shards, or that ended up that way because of left alignment):

  1. Ensure that all the rows for a single site have the same AN value: is there a site that has more than one distinct AN value?
  2. Ensure that all the rows for a specific site-variant / VID have the same AC value: is there a site-variant / VID that has more than one distinct AC value?
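
To make the two checks above concrete, here is a minimal Python sketch of the logic they describe. It is not the actual validation query: the row layout and the field names (`contig`, `position`, `vid`, `an`, `ac`) are assumptions for illustration, as are the helper name `find_inconsistent` and the sample rows.

```python
from collections import defaultdict


def find_inconsistent(rows, key_fields, value_field):
    """Return {key: sorted distinct values} for every key that maps to
    more than one distinct value of `value_field`."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        seen[key].add(row[value_field])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


# Hypothetical rows; the second site deliberately violates both checks,
# as could happen if its rows were computed in different shards.
rows = [
    {"contig": "chr1", "position": 100, "vid": "1-100-A-T", "an": 200, "ac": 5},
    {"contig": "chr1", "position": 100, "vid": "1-100-A-G", "an": 200, "ac": 2},
    {"contig": "chr1", "position": 250, "vid": "1-250-C-T", "an": 180, "ac": 7},
    {"contig": "chr1", "position": 250, "vid": "1-250-C-T", "an": 176, "ac": 3},
]

# Check 1: a site (contig + position) with more than one distinct AN.
sites_with_multiple_an = find_inconsistent(rows, ("contig", "position"), "an")

# Check 2: a site-variant / VID with more than one distinct AC.
vids_with_multiple_ac = find_inconsistent(rows, ("vid",), "ac")
```

An empty result from both calls would mean the scatter did not introduce inconsistent AN or AC values; any returned key is a site or VID worth tracking.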

input:
sites_only_vcf = SelectVariants.output_vcf,
ref = reference
}
Collaborator comment:

So this is scattered now, but it looks like it's still using the same resources (memory, disk, CPU, etc.) that it used when it was unscattered? Perhaps the monitoring script could confirm whether all those resources are still required.

@mcovarr mcovarr left a comment

Looks good apart from some debug code, which presumably will be removed before merge 👍

@gbggrant gbggrant merged commit 6f2e75a into ah_var_store Jan 9, 2023
@gbggrant gbggrant deleted the gg_VS-765_ScatterRemoveDupsTask branch January 9, 2023 15:57
This was referenced Mar 17, 2023
3 participants