VS-775 vat validation shards #8175
Codecov Report
Additional details and impacted files
@@               Coverage Diff                @@
##             ah_var_store     #8175   +/-   ##
================================================
  Coverage                ?   85.880%
  Complexity              ?     35515
================================================
  Files                   ?      2194
  Lines                   ?    167029
  Branches                ?     18006
================================================
  Hits                    ?    143444
  Misses                  ?     17204
  Partials                ?      6381
The task 'DuplicateAnnotations' is failing for me (when run on my QuickStart) with:
/cromwell_root/script: line 76: syntax error near unexpected token `)'
/cromwell_root/script: line 76: `) > "$out94c3fb6c" 2> "$err94c3fb6c"'
Maybe you're not escaping the table string?
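The thread does not confirm that escaping was the actual cause of the failure. As a minimal sketch of what the reviewer may be pointing at: in bash, backticks start a command substitution unless they sit inside single quotes or are escaped inside double quotes. The table name below is a hypothetical placeholder, not a value from this PR.

```shell
# Hypothetical table name, for illustration only.
fq_vat_table="my-project.my_dataset.vat"

# Single quotes keep the backticks literal; the variable is spliced in
# by closing the single quotes, adding a double-quoted expansion, and reopening.
query_single='SELECT COUNT(*) FROM `'"${fq_vat_table}"'`'

# Inside double quotes the backticks must be escaped, otherwise bash
# would try to execute the table name as a command.
query_double="SELECT COUNT(*) FROM \`${fq_vat_table}\`"

echo "$query_single"
echo "$query_double"
```

Both variables hold the same query text. In a WDL command block the `~{...}` placeholders are filled in before bash sees the script, so wrapping the whole query in single quotes, as the queries in this PR do, is enough to protect the backticks.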
scripts/variantstore/variant_annotations_table/GvsValidateVAT.wdl
@gbggrant do you have a link to your failed run of
f88464f to 3786d6c
a9228af to 2d69f15
scripts/variantstore/variant_annotations_table/GvsValidateVAT.wdl
bq query --nouse_legacy_sql --project_id=~{query_project_id} --format=csv 'SELECT * from
(SELECT contig, position, gvs_all_an, COUNT(DISTINCT gvs_all_an) AS an_count FROM `~{fq_vat_table}`
group by contig, position, gvs_all_an)
where an_count >1' > bq_an_output.csv
It should be possible to simplify this and the query below with a HAVING (i.e. without a nested query) as was done previously.
Suggested change:
bq query --nouse_legacy_sql --project_id=~{query_project_id} --format=csv '
SELECT contig, position, gvs_all_an, COUNT(DISTINCT gvs_all_an) AS an_count
FROM `~{fq_vat_table}`
GROUP BY contig, position, gvs_all_an
HAVING an_count > 1
' > bq_an_output.csv
I'm trying to make sure that for multiple rows with the same VID, there aren't multiple values of AN (and below the AC!)
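One observation on the stated goal: when gvs_all_an is both a GROUP BY column and the argument of COUNT(DISTINCT ...), every group holds a single value of it, so the count cannot exceed 1 and the check never returns rows. A sketch of a query that matches the intent described here groups by vid alone and counts the distinct AN values inside each group. This is an assumption about the intended check, not the query that was merged, and the table name is a hypothetical placeholder.

```shell
# Hypothetical placeholder; in the WDL this would be ~{fq_vat_table}.
fq_vat_table="my-project.my_dataset.vat"

# Group by vid only, so a group can contain several AN values to count.
# The escaped backticks stay literal in the unquoted here-document.
an_query=$(cat <<EOF
SELECT vid, COUNT(DISTINCT gvs_all_an) AS an_count
FROM \`${fq_vat_table}\`
GROUP BY vid
HAVING COUNT(DISTINCT gvs_all_an) > 1
EOF
)

echo "$an_query"

# To run it (requires the bq CLI and credentials):
# bq query --nouse_legacy_sql --format=csv "$an_query" > bq_an_output.csv
```

The same shape would apply to the AC check by swapping gvs_all_an for gvs_all_ac. Any row in the output is a vid with conflicting values.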
bq query --nouse_legacy_sql --project_id=~{query_project_id} --format=csv 'SELECT * from
(SELECT contig, position, vid, gvs_all_ac, COUNT(DISTINCT gvs_all_ac) AS ac_count FROM `~{fq_vat_table}`
group by contig, position, vid, gvs_all_ac)
where ac_count >1' > bq_ac_output.csv
Suggested change:
bq query --nouse_legacy_sql --project_id=~{query_project_id} --format=csv '
SELECT contig, position, gvs_all_ac, COUNT(DISTINCT gvs_all_ac) AS ac_count
FROM `~{fq_vat_table}`
GROUP BY contig, position, gvs_all_ac
HAVING ac_count > 1
' > bq_ac_output.csv
@@ -129,7 +136,7 @@ workflow GvsValidateVat {
   SubpopulationMax.pass,
   SubpopulationAlleleCount.pass,
   SubpopulationAlleleNumber.pass,
-  ClinvarSignificance.pass,
+  DuplicateAnnotations.pass,
Did you mean to replace ClinvarSignificance with DuplicateAnnotations? If so, you should get rid of the task itself.
Not yet. I want to check with Lee about how to handle this:
Please leave a comment explaining that ClinvarSignificance has been removed temporarily.
Add additional validation around duplicated rows in the VAT
This has a successful run (except for one failure, which occurs because it is being run on far less data):
https://job-manager.dsde-prod.broadinstitute.org/jobs/07ddde58-ac0d-4229-9f96-d093f5c11682
The failed test is: SpotCheckForAAChangeAndExonNumberConsistency
Perhaps we want to update this to not run this test if there are fewer than 10k samples?
Yes we do:
Here's the ticket for that:
https://broadworkbench.atlassian.net/browse/VS-878
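As a rough sketch of the guard discussed above, the spot check could be gated on the sample count. The variable names and the way the count is obtained are assumptions made for illustration. Nothing here comes from the PR or the ticket.

```shell
# Hypothetical inputs; neither name appears in the PR.
num_samples=2500
min_samples=10000

# Succeeds only when the sample count reaches the threshold.
should_run_spot_check() {
  [ "$1" -ge "$2" ]
}

if should_run_spot_check "$num_samples" "$min_samples"; then
  echo "running SpotCheckForAAChangeAndExonNumberConsistency"
else
  echo "skipping SpotCheckForAAChangeAndExonNumberConsistency: only ${num_samples} samples"
fi
```

In the workflow itself the same decision would more likely be a WDL conditional around the task call than a check inside the command block.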