Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Job Add labels to BQ operations from GATK (Issues-199) #7115

Merged
merged 16 commits into from
Mar 19, 2021

Conversation

Marianie-Simeon
Copy link

@Marianie-Simeon Marianie-Simeon commented Mar 1, 2021

Addresses

https://github.com/broadinstitute/dsp-spec-ops/issues/199

Commit Summary

-Add labels to Sample list, Extract Cohort, and Extract features

  • Edit Bigquery Utils Testing
  • Add labels as an input to a Bigquery functions
  • Created UID class for a unique identifier

Output

For Sample list, Extract Cohort and Extract features

"query", "Run_SampleTable_<UID>"
"query", "extract_cohort_<UID>"
"query", "extract_features_<UID>"

Testing:

"test_query", "get_all_records_<UID>"
"test_query", "test_where_clause_<UID>"
"test_query", "test_batch_mode_<UID>"
"test_query", "test_specified_execute_query_<UID>"
"test_query", "test_storage_api_<UID>"
"test_query", "test_storage_api_with_empty_dataset_<UID>"

Copy link
Contributor

@kcibul kcibul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good -- comments left within. I think there's still some work to be done in the calling classes, let's work with the following labels:

label -> value
gatk_tool -> FQN of the GATK tool, can be obtained in the tool constructor
gatk_execution_id -> UUID, can be generated in the tool constructor (e.g. ExtractCohortEngine)

(multiple) -> passed in to tool as a (list?) of key-value pairs. We just propagate to the tool. We'll set these in our WDLs to be consistent (ie joint-calling-wdl-id)

I don't think each query itself needs it's own UUID (ie if I call SampleList 5 times, should there be 5 unique ids for that?). I could be convinced otherwise though.

public static TableResult executeQuery(final String queryString, final boolean runQueryInBatchMode) {
return executeQuery(getBigQueryEndPoint(), queryString, runQueryInBatchMode);

// TODO: add Collections.EMPTY_MAP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still a todo? or can this be deleted?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be deleted

@@ -384,7 +394,9 @@ public static StorageAPIAvroReader executeQueryWithStorageAPI(final String query
") AS\n" +
queryString;

executeQuery(queryStringIntoTempTable, runQueryInBatchMode);
// TODO: add label
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to-done?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, I going to use this from now on. I laughed way too hard at this

@gatk-bot
Copy link

gatk-bot commented Mar 9, 2021

Travis reported job failures from build 33100
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud openjdk8 33100.1 logs
cloud openjdk11 33100.14 logs

Marianie-Simeon added 3 commits March 9, 2021 15:39
Add labels to Sample list, Extract Cohort, and Extract features
Editing cdode according to the comments
@gatk-bot
Copy link

gatk-bot commented Mar 9, 2021

Travis reported job failures from build 33109
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud openjdk8 33109.1 logs
cloud openjdk11 33109.14 logs

Marianie-Simeon added 3 commits March 10, 2021 09:14
Add labels to Sample list, Extract Cohort, and Extract features
Editing cdode according to the comments
@kcibul kcibul force-pushed the ms_add_labels_to_BQ branch from dc8057d to 3d6911f Compare March 10, 2021 14:16
@kcibul
Copy link
Contributor

kcibul commented Mar 10, 2021

Your tests are failing because the labels generated are invalid. Here are the rules for labels: https://cloud.google.com/bigquery/docs/labels-intro

Copy link
Contributor

@kcibul kcibul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like you saw my main comment from the last review (ie not attached to a line of code). Specifically around "label -> value". If this is feeling too big, maybe let's separate this PR into to pieces.

  1. Changes the BigQueryUtils to support labels, and callers to pass null in for the labels
  2. Changes to callers to pass in the right sets of labels

wdyt?

@gatk-bot
Copy link

gatk-bot commented Mar 10, 2021

Travis reported job failures from build 33118
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud openjdk8 33118.1 logs
cloud openjdk11 33118.14 logs

Marianie-Simeon added 4 commits March 11, 2021 15:49
Change labels to null to make the PR focus on BQ labels
Clean up, base of PR comments
@gatk-bot
Copy link

gatk-bot commented Mar 11, 2021

Travis reported job failures from build 33170
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud openjdk11 33170.14 logs
cloud openjdk8 33170.1 logs

@gatk-bot
Copy link

gatk-bot commented Mar 11, 2021

Travis reported job failures from build 33172
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud openjdk8 33172.1 logs
cloud openjdk11 33172.14 logs

@gatk-bot
Copy link

gatk-bot commented Mar 11, 2021

Travis reported job failures from build 33175
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud openjdk11 33175.14 logs
cloud openjdk8 33175.1 logs

@@ -328,8 +328,8 @@ private double getQUALapproxFromSampleRecord(GenericRecord sampleRecord) {
// Non-AS QualApprox (used for qualapprox filter) is simply the sum of the AS values (see GnarlyGenotyper)
if (s.contains("|")) {

// take the sum of all non-* alleles
// basically if our alleles are '*,T' or 'G,*' we want to ignore the * part
// take the average of all non-* alleles
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did this change? I don't think it's an average

Copy link
Author

@Marianie-Simeon Marianie-Simeon Mar 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, why that changed

final TableResult result = BigQueryUtils.executeQuery(BigQueryUtils.getBigQueryEndPoint(executionProjectId) , sampleListQueryString, false);

// Execute the query:
final TableResult result = BigQueryUtils.executeQuery(sampleListQueryString, null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this back to the invocation that supplied the projectId, but also includes your null for the labels. e.g.

final TableResult result = BigQueryUtils.executeQuery(BigQueryUtils.getBigQueryEndPoint(executionProjectId) , sampleListQueryString, false, null);

@gatk-bot
Copy link

gatk-bot commented Mar 15, 2021

Travis reported job failures from build 33193
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud openjdk8 33193.1 logs
cloud openjdk11 33193.14 logs

@gatk-bot
Copy link

gatk-bot commented Mar 15, 2021

Travis reported job failures from build 33210
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud openjdk11 33210.14 logs
cloud openjdk8 33210.1 logs

@gatk-bot
Copy link

gatk-bot commented Mar 16, 2021

Travis reported job failures from build 33221
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud openjdk11 33221.14 logs
cloud openjdk8 33221.1 logs

Marianie-Simeon added 2 commits March 16, 2021 12:34
@gatk-bot
Copy link

gatk-bot commented Mar 18, 2021

Travis reported job failures from build 33253
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud openjdk8 33253.1 logs
cloud openjdk8 33253.1 logs

@Marianie-Simeon Marianie-Simeon requested a review from kcibul March 18, 2021 17:19
Copy link
Contributor

@kcibul kcibul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@Marianie-Simeon Marianie-Simeon merged commit c7dcac4 into ah_var_store Mar 19, 2021
@Marianie-Simeon Marianie-Simeon deleted the ms_add_labels_to_BQ branch March 19, 2021 18:23
mmorgantaylor pushed a commit that referenced this pull request Apr 6, 2021
* Added labels to Big Query Job

Add labels to Sample list, Extract Cohort, and Extract features

* Update BigQueryUtilsUnitTest.java

* Cleaning up code

Editing cdode according to the comments

* Added labels to Big Query Job
Add labels to Sample list, Extract Cohort, and Extract features

* Update BigQueryUtilsUnitTest.java

* Cleaning up code

Editing cdode according to the comments

* Change piplines labels to null

Change labels to null to make the PR focus on BQ labels

* update labels

* Clean code

Clean up, base of PR comments

* Update date labels in test class

* Update labels with out ':' and edit comment and code for PR

* Remove underscores from labels

* Shorten label

* test GRADLE

* Format code

Co-authored-by: Marianie-Simeon <[email protected]>
This was referenced Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants