Job Add labels to BQ operations from GATK (Issues-199) #7115
Looking good -- comments left within. I think there's still some work to be done in the calling classes, let's work with the following labels:
label -> value
gatk_tool -> FQN of the GATK tool, can be obtained in the tool constructor
gatk_execution_id -> UUID, can be generated in the tool constructor (e.g. ExtractCohortEngine)
(multiple) -> passed in to tool as a (list?) of key-value pairs. We just propagate to the tool. We'll set these in our WDLs to be consistent (ie joint-calling-wdl-id)
I don't think each query itself needs its own UUID (ie if I call SampleList 5 times, should there be 5 unique ids for that?). I could be convinced otherwise though.
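As a rough illustration of the label scheme proposed in this comment, here is a minimal sketch of how a tool might assemble the map once in its constructor. `GatkLabelSketch`, `normalize` and `buildLabels` are hypothetical names, not existing GATK code. The normalization step is an assumption added here: a fully qualified class name contains dots and uppercase letters and can exceed 63 characters, none of which BigQuery labels accept.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.UUID;

/** Hypothetical helper (not part of GATK) that assembles the labels proposed in the review. */
final class GatkLabelSketch {

    /**
     * Lowercases the input, replaces every character outside [a-z0-9_-] with '-',
     * and keeps at most the last 63 characters (the tail of a class name is the
     * informative part). It does not guarantee that a key starts with a letter.
     */
    static String normalize(final String raw) {
        final String cleaned = raw.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9_-]", "-");
        return cleaned.length() > 63 ? cleaned.substring(cleaned.length() - 63) : cleaned;
    }

    /** Builds gatk_tool and gatk_execution_id, then adds any caller-supplied pairs. */
    static Map<String, String> buildLabels(final Class<?> toolClass,
                                           final UUID executionId,
                                           final Map<String, String> extraLabels) {
        final Map<String, String> labels = new LinkedHashMap<>();
        labels.put("gatk_tool", normalize(toolClass.getName()));
        labels.put("gatk_execution_id", normalize(executionId.toString()));
        if (extraLabels != null) {
            // Labels passed in from the WDL are propagated as given, after normalization.
            extraLabels.forEach((k, v) -> labels.put(normalize(k), normalize(v)));
        }
        return labels;
    }

    public static void main(final String[] args) {
        // One UUID per tool execution, shared by every query that execution issues.
        final UUID executionId = UUID.randomUUID();
        final Map<String, String> labels = buildLabels(
                GatkLabelSketch.class, executionId, Map.of("joint-calling-wdl-id", "example-run"));
        labels.forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Generating the UUID once and reusing the same map for every query matches the suggestion that individual queries do not need their own ids.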
public static TableResult executeQuery(final String queryString, final boolean runQueryInBatchMode) {
    return executeQuery(getBigQueryEndPoint(), queryString, runQueryInBatchMode);

// TODO: add Collections.EMPTY_MAP
Is this still a todo? or can this be deleted?
It can be deleted
@@ -384,7 +394,9 @@ public static StorageAPIAvroReader executeQueryWithStorageAPI(final String query
") AS\n" +
queryString;

executeQuery(queryStringIntoTempTable, runQueryInBatchMode);
// TODO: add label
to-done?
lol, I'm going to use this from now on. I laughed way too hard at this
Add labels to Sample list, Extract Cohort, and Extract features
Editing code according to the comments
Your tests are failing because the labels generated are invalid. Here are the rules for labels: https://cloud.google.com/bigquery/docs/labels-intro
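For reference, the rules on the linked page amount to roughly this: keys and values are at most 63 characters and may contain only lowercase letters, digits, underscores and dashes; keys must be non-empty and start with a lowercase letter; values may be empty. The following is a minimal sketch of a pre-flight check under those rules. It assumes ASCII-only labels (the documentation also allows international characters, which this check would reject), and `BigQueryLabelCheckSketch` and `findProblems` are hypothetical names, not GATK or BigQuery client code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical ASCII-only check of label keys and values before submitting a job. */
final class BigQueryLabelCheckSketch {

    // Key: starts with a lowercase letter, 1 to 63 characters in total.
    private static final Pattern KEY = Pattern.compile("[a-z][a-z0-9_-]{0,62}");
    // Value: may be empty, at most 63 characters.
    private static final Pattern VALUE = Pattern.compile("[a-z0-9_-]{0,63}");

    /** Returns one human-readable message per invalid key or value; empty if all pass. */
    static List<String> findProblems(final Map<String, String> labels) {
        final List<String> problems = new ArrayList<>();
        for (final Map.Entry<String, String> entry : labels.entrySet()) {
            final String key = entry.getKey();
            final String value = entry.getValue();
            if (key == null || !KEY.matcher(key).matches()) {
                problems.add("invalid key: " + key);
            }
            if (value == null || !VALUE.matcher(value).matches()) {
                problems.add("invalid value for " + key + ": " + value);
            }
        }
        return problems;
    }

    public static void main(final String[] args) {
        final Map<String, String> labels = new LinkedHashMap<>();
        labels.put("gatk_tool", "extract-cohort");   // passes
        labels.put("query_date", "2021:03:01");      // ':' is not allowed in a value
        labels.put("GATK_Tool", "x");                // uppercase is not allowed in a key
        findProblems(labels).forEach(System.out::println);
    }
}
```

Running a check like this in the unit test, before any job is submitted, would surface characters such as ':' or uppercase letters without a round trip to BigQuery.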
It doesn't look like you saw my main comment from the last review (ie the one not attached to a line of code), specifically around "label -> value". If this is feeling too big, maybe let's separate this PR into two pieces:
- Changes to BigQueryUtils to support labels, with callers passing null in for the labels
- Changes to callers to pass in the right sets of labels
wdyt?
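The first piece of that split could look something like the sketch below: the existing signature delegates to a new overload that takes a label map, and a null map is treated as "no labels". This is an illustration only. `LabelOverloadSketch` and `SubmittedQuery` are hypothetical stand-ins so the example runs without the BigQuery client; in the real code the map would presumably be handed to the job configuration (the google-cloud-bigquery client has `QueryJobConfiguration.Builder.setLabels` for this).

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical stand-in for BigQueryUtils showing only the overload pattern. */
final class LabelOverloadSketch {

    /** Stand-in for a submitted job; the real method returns a BigQuery TableResult. */
    static final class SubmittedQuery {
        final String queryString;
        final boolean batchMode;
        final Map<String, String> labels;

        SubmittedQuery(final String queryString, final boolean batchMode,
                       final Map<String, String> labels) {
            this.queryString = queryString;
            this.batchMode = batchMode;
            this.labels = labels;
        }
    }

    /** Existing-style entry point: callers that have no labels keep working unchanged. */
    static SubmittedQuery executeQuery(final String queryString, final boolean runQueryInBatchMode) {
        return executeQuery(queryString, runQueryInBatchMode, null);
    }

    /** New overload: null means "no labels", so step one can land before callers are updated. */
    static SubmittedQuery executeQuery(final String queryString,
                                       final boolean runQueryInBatchMode,
                                       final Map<String, String> labels) {
        final Map<String, String> effective =
                labels == null ? Collections.emptyMap() : Map.copyOf(labels);
        return new SubmittedQuery(queryString, runQueryInBatchMode, effective);
    }

    public static void main(final String[] args) {
        final SubmittedQuery unlabeled = executeQuery("SELECT 1", false);
        final SubmittedQuery labeled = executeQuery("SELECT 1", false, Map.of("gatk_tool", "example"));
        System.out.println(unlabeled.labels.size() + " label(s), then " + labeled.labels.size());
    }
}
```

With this shape, the second piece of the split only touches call sites, replacing null with the real label map.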
Change labels to null to make the PR focus on BQ labels
…te/gatk into ms_add_labels_to_BQ
Clean up, base of PR comments
@@ -328,8 +328,8 @@ private double getQUALapproxFromSampleRecord(GenericRecord sampleRecord) {
// Non-AS QualApprox (used for qualapprox filter) is simply the sum of the AS values (see GnarlyGenotyper)
if (s.contains("|")) {

// take the sum of all non-* alleles
// basically if our alleles are '*,T' or 'G,*' we want to ignore the * part
// take the average of all non-* alleles
why did this change? I don't think it's an average
I'm not sure why that changed
final TableResult result = BigQueryUtils.executeQuery(BigQueryUtils.getBigQueryEndPoint(executionProjectId) , sampleListQueryString, false);

// Execute the query:
final TableResult result = BigQueryUtils.executeQuery(sampleListQueryString, null);
Change this back to the invocation that supplied the projectId, but also includes your null for the labels. e.g.
final TableResult result = BigQueryUtils.executeQuery(BigQueryUtils.getBigQueryEndPoint(executionProjectId) , sampleListQueryString, false, null);
lgtm!
* Added labels to Big Query Job
  Add labels to Sample list, Extract Cohort, and Extract features
* Update BigQueryUtilsUnitTest.java
* Cleaning up code
  Editing code according to the comments
* Added labels to Big Query Job
  Add labels to Sample list, Extract Cohort, and Extract features
* Update BigQueryUtilsUnitTest.java
* Cleaning up code
  Editing code according to the comments
* Change pipelines labels to null
  Change labels to null to make the PR focus on BQ labels
* update labels
* Clean code
  Clean up, base of PR comments
* Update date labels in test class
* Update labels with out ':' and edit comment and code for PR
* Remove underscores from labels
* Shorten label
* test GRADLE
* Format code

Co-authored-by: Marianie-Simeon <[email protected]>
Addresses
https://github.com/broadinstitute/dsp-spec-ops/issues/199
Commit Summary
-Add labels to Sample list, Extract Cohort, and Extract features
Output
For Sample list, Extract Cohort and Extract features
Testing: