feat: Job and BatchPredictionJob classes #79
Conversation
LGTM! Minor requests.
Hey @tswast, looping you in here for your BigQuery expertise. Context: we're implementing a helper method. Here are the lines in question:
PTAL and share your thoughts when you get a chance 😄
LGTM for BigQuery usage. I have some concerns about constructing both Storage and BigQuery clients within a client method, but depending on the performance expectations, this could be considered a minor issue.
google/cloud/aiplatform/jobs.py (outdated)

```python
# BigQuery Destination, return QueryJob
elif output_info.bigquery_output_dataset:
    bq_client = bigquery.Client()
```
I'm a bit concerned about this. Ideally you'd re-use the credentials from the uCAIP client.
Also, there's a risk of leaking sockets when you create clients on the fly. Not as big a deal for REST clients, but definitely a concern for gRPC clients. See googleapis/google-cloud-python#9790 and googleapis/google-cloud-python#9457.
Suggested change:

```python
bq_client = bigquery.Client(
    credentials=self.api_client._transport._credentials
)
```
^ This change would build a BigQuery Client using the same credentials as uCAIP's JobServiceClient.
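As a minimal sketch of the credentials-sharing idea (the `Fake*` classes below are stand-ins for the real GAPIC and BigQuery clients, not the actual SDK types; the `_transport._credentials` attribute path mirrors the private one referenced above):

```python
class FakeCredentials:
    """Stand-in for a google.auth credentials object."""

class FakeTransport:
    def __init__(self, credentials):
        self._credentials = credentials

class FakeJobServiceClient:
    """Stand-in for the uCAIP GAPIC client, which holds credentials on its transport."""
    def __init__(self, credentials):
        self._transport = FakeTransport(credentials)

class FakeBigQueryClient:
    """Stand-in for google.cloud.bigquery.Client."""
    def __init__(self, credentials=None):
        self.credentials = credentials

# Re-use the credentials already held by the GAPIC client instead of
# triggering a second round of default-credentials resolution.
creds = FakeCredentials()
api_client = FakeJobServiceClient(creds)
bq_client = FakeBigQueryClient(
    credentials=api_client._transport._credentials
)
assert bq_client.credentials is creds  # same credentials object, no re-auth
```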
Regarding the leaking sockets, would the solution referenced in that issue work? See below:

```python
# Close sockets opened by BQ Client
bq_client._http._auth_request.session.close()
bq_client._http.close()
```
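A generic way to make that cleanup unconditional when a client is created on the fly is to scope it with a context manager so `close()` runs even if the body raises. A minimal sketch with a stub client (the real `bigquery.Client` does expose `close()`, per the docs linked below in the thread's own link):

```python
from contextlib import contextmanager

class FakeClient:
    """Stand-in for a client that owns network resources."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

@contextmanager
def scoped_client(factory):
    # Create the client on entry and guarantee close() on exit,
    # even if the body raises.
    client = factory()
    try:
        yield client
    finally:
        client.close()

with scoped_client(FakeClient) as client:
    leaked = client  # do work with the client here
assert leaked.closed  # resources released once the block exits
```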
This change for credentials LGTM. (Storage should get similar treatment.)
It's a little trickier in our case, because we want the client to live for the lifetime of the RowIterator.
Unless you want to convert to a full list of rows / pandas DataFrame before returning? In that case all the API requests would be made here and we could close the client when done. (FWIW, the client does have a `close` function in BQ: https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.client.Client.html#google.cloud.bigquery.client.Client.close)
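The trade-off described here can be sketched generically: if the method drains the iterator into a list before returning, the client can be closed immediately; if it returns the live iterator, the client must outlive it. A stub illustration (not the real RowIterator API):

```python
class FakeClient:
    """Stand-in for a client whose results are streamed lazily."""
    def __init__(self):
        self.closed = False

    def rows(self):
        # Simulates paged results; a real client would fail after close().
        for i in range(3):
            assert not self.closed, "client closed while iterating"
            yield i

    def close(self):
        self.closed = True

def results_materialized():
    # Drain all rows up front, then close: safe to close early,
    # at the cost of buffering every row in memory.
    client = FakeClient()
    try:
        return list(client.rows())
    finally:
        client.close()

print(results_materialized())  # -> [0, 1, 2]
```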
Made the credentials change on both BQ and Storage.
Re: closing connections - this is indeed tricky since the method is meant to return an iterator. However, your comment made me realize a larger issue: we instantiate a GAPIC client for every instance of a high-level SDK object. I'm capturing this in b/174111905.
Will merge this blocking PR for now; thanks for calling this issue out!
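The per-instance-client issue captured in b/174111905 is commonly addressed by caching one client per process and handing it to every SDK object. A hedged sketch (the function and class names are illustrative, not the SDK's):

```python
from functools import lru_cache

class FakeGapicClient:
    """Stand-in for an expensive GAPIC client (e.g. one that opens gRPC channels)."""
    instances = 0

    def __init__(self):
        FakeGapicClient.instances += 1

@lru_cache(maxsize=None)
def shared_client():
    # Every SDK object calls this instead of constructing its own client,
    # so only one channel set is opened per process.
    return FakeGapicClient()

a, b = shared_client(), shared_client()
assert a is b                          # both callers see the same client
assert FakeGapicClient.instances == 1  # constructed exactly once
```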
…python-aiplatform into batch-prediction-job-class
* Create and move all constants to constants.py
* Fix tests after constants.py, drop unused vars
* Init Job and BatchPredictionJob class, unit tests
* Address all reviewer comments
* Update docstring to bigquery.table.RowIterator
* Get GCS/BQ clients to use same creds as uCAIP
PR for batch prediction method in Model class to follow once this PR and #66 get merged.
Includes the following:
* `status()` method for Job
* `iter_outputs()` method which returns either a BQ `QueryJob` or list of GCS `Blob`s
* `jobs.py`, which uses `google-cloud-bigquery`
* `constants.py` to have a single source of truth on re-used constants (i.e. supported parameters, API constants)

Fixes b/169783178, b/171074104 🦕
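The `iter_outputs()` behavior listed above — a BigQuery result when the job wrote to BQ, otherwise GCS blobs — amounts to a dispatch on the job's output info. A simplified sketch with stand-in types and placeholder return values (the real method returns `bigquery.table.RowIterator` or `storage.Blob` objects):

```python
class OutputInfo:
    """Stand-in for the batch prediction job's output_info proto."""
    def __init__(self, bigquery_output_dataset=None, gcs_output_directory=None):
        self.bigquery_output_dataset = bigquery_output_dataset
        self.gcs_output_directory = gcs_output_directory

def iter_outputs(output_info):
    # BigQuery destination: hand back a query-result placeholder.
    if output_info.bigquery_output_dataset:
        return ("bq_rows", output_info.bigquery_output_dataset)
    # GCS destination: hand back a list-of-blobs placeholder.
    if output_info.gcs_output_directory:
        return ("gcs_blobs", output_info.gcs_output_directory)
    raise ValueError("job produced no supported output destination")

assert iter_outputs(OutputInfo(bigquery_output_dataset="bq://proj.ds"))[0] == "bq_rows"
assert iter_outputs(OutputInfo(gcs_output_directory="gs://bucket/dir"))[0] == "gcs_blobs"
```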