-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding the subpopulation calculations to the VAT creation WDL #7399
Conversation
aa3be2b
to
03b3d53
Compare
bad75e2
to
2bc0c0f
Compare
not all samples
59289ba
to
8ff7985
Compare
} | ||
|
||
################################################################################ | ||
|
||
task MakeSubpopulationFiles { ## TODO why am I seemingly able to get the ancestry file in Terra? is it cached? I should need the SA for it---so actually, but me worth trying the SA use sooner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not an issue--but I would feel more comfortable with the data in an AoU bucket just for completeness I suppose? I'm going to do a follow on pr where I move the ancestry file into an AoU protected bucket and add a SA json to this step
if [[ $NUMRESULTS != "0" ]]; then | ||
echo "PASS: The VAT table ~{fq_vat_table} has a correct calculation for subpopulation" > validation_results.txt | ||
else | ||
echo "FAIL: The VAT table ~{fq_vat_table} has an incorrect calculation for subpopulation" > validation_results.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add the vids that are returned from the query to make it easier to debug the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes! I think they are already returned in the validation_results.txt
FROM | ||
~{fq_vat_table} | ||
WHERE | ||
gvs_all_ac != 20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this before we substitute 20 for values less than 20? If so then we don't really need this check. And if not then there could still be a subpop that is lower then 20 even if the all_ac isn't so I'm not sure this check is really doing the correct thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will all get ripped out during de-obfuscation pr -- it was really just so that there wouldn't be validation run at all on the rows where the AC had been obfuscated to 20 (because the math wouldn't check out there)
if [[ $NUMRESULTS != "0" ]]; then | ||
echo "PASS: The VAT table ~{fq_vat_table} has a correct calculation for AC and the AC of subpopulations" > validation_results.txt | ||
else | ||
echo "FAIL: The VAT table ~{fq_vat_table} has an incorrect calculation for AC and the AC of subpopulations" > validation_results.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again maybe log the vids for easier debugging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes---the vids are logged in the output file
@@ -17,9 +17,37 @@ workflow GvsSitesOnlyVCF { | |||
String? service_account_json_path | |||
File? gatk_override | |||
File AnAcAf_annotations_template | |||
File ancestry_file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of this wdl seems relatively generic, but sense an ancestry_file is required it seem sub pop specific. Maybe we should rename this and move to the vat dir?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plan was to rename the wdl when it gets split into the sites only and the vat creation wdl
it might make sense to do that as a ticket we tackle sooner rather than later
@@ -403,6 +585,31 @@ task BigQueryLoadJson { | |||
v.gvs_all_ac, | |||
v.gvs_all_an, | |||
v.gvs_all_af, | |||
v.gvs_max_af, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need this line at the start of this task:
echo "project_id = ~{project_id}" > ~/.bigqueryrc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do I not need to add the project_id to all of my queries then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no you shouldn't have to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that line just prevents an interactive error message that we sometimes see
@@ -0,0 +1,3 @@ | |||
{ | |||
"ToyLoopOptional.string": "HALP" | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean to check this in and the ToyLoop wdl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
omg nope
print(seen_subpopulations) | ||
else: | ||
print("WARNING: This list had an unexpected subpopulation", row_subpopulation) | ||
## TODO do I need to close these? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you do need to close them.
If you use a with
clause then you don't, but I don't think that will work within this looping structure
307c4a8
to
c13e16a
Compare
* add subpop list * add hard coded steps for one set of subpop calculations * add sample list * update the WDL to have the subworkflows * parse the subpop ac/an/af in python * update schemas with the new subpop calculations * fix initial state bug for max calc * add gvs subpops * remove saved annotations used for debugging * short term testing in Terra * add ancestry file parsing * add validation step for subpops * adding service account to grab ancestry file * turn off localization optional for auth issues * add service account to subpop calcs * cleaner linux parsing tabs not all samples * add ancestry file * save annotation file for debugging * update subpop python script * add validation for the subpop AC * debugging req SA json insanity * cleaner to use expected file type * this version works--need to unhardcode a bit * #&^%@($&^ WDL!!! * pr review edits * close python file handles * remove from dockstore
This pr adds the subpopulation AC/AN/AF calculations.
It does this by taking in the ancestry table and making sublists of each---then passing that list of samples into the SelectVariants GATK tool
Updated Lucid chart here: https://lucid.app/lucidchart/fee376a4-4b72-481e-a239-a027f7f6ab1f/edit?page=CsG3hy3S1zEH#
Design Doc for this work:
https://docs.google.com/document/d/1FnPu_Jkz2O9rElApAQld0v6iBEFGe22dKarVWcwNxGI/edit
misc:
how should I add the VAT validation to the VAT pipeline? Should it run automatically?
Anvil data version of this table: spec-ops-aou:anvil_100_for_testing.vat_aug19