Use same auth for Dataproc + GCS #224
Conversation
@jtcohen6 I updated the code a bit and tested locally; everything works well.
```diff
 )
 # Create the job config.
 job = {
     "placement": {"cluster_name": self.credential.dataproc_cluster_name},
     "pyspark_job": {"main_python_file_uri": "gs://{}/{}".format(self.credential.gcs_bucket, filename)},
 }
-operation = job_client.submit_job_as_operation(
+operation = self.job_client.submit_job_as_operation(
```
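The job config dict in the diff above could be factored into a small helper. A minimal sketch (the helper name `build_pyspark_job` and the standalone-function shape are assumptions for illustration, not the PR's actual code):

```python
def build_pyspark_job(cluster_name: str, gcs_bucket: str, filename: str) -> dict:
    """Build a Dataproc job config that runs a PySpark file staged in GCS.

    Mirrors the dict built in the diff: the job is placed on the configured
    cluster and points at the model file uploaded to the GCS bucket.
    """
    return {
        "placement": {"cluster_name": cluster_name},
        "pyspark_job": {
            "main_python_file_uri": "gs://{}/{}".format(gcs_bucket, filename)
        },
    }

# The result would then be handed to the Dataproc client, roughly:
# operation = self.job_client.submit_job_as_operation(
#     request={"project_id": ..., "region": ..., "job": job}
# )
```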
TODO: pass in `timeout` + `retry`, if/as provided in profile configs (docs)
I added a TODO in the main ticket to keep track of it. I think we can close this one and handle it as a separate change.
Alternatively, could we just use Google's own `Retry()` decorator for this, plumbing through user-provided configurations where present while keeping sensible defaults? See: #231
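One way to merge user-provided settings with defaults, as a pure-Python sketch. The function name `resolve_job_options`, the profile keys (`dataproc_timeout_s`, `dataproc_retry_deadline_s`), and the default values are all hypothetical; the actual defaults would be decided in #231:

```python
# Hypothetical defaults; the real sensible values are an open question in #231.
DEFAULT_TIMEOUT_S = 60 * 60 * 24       # give the job a day before timing out
DEFAULT_RETRY_DEADLINE_S = 60 * 10     # retry transient errors for 10 minutes

def resolve_job_options(profile_config: dict) -> dict:
    """Merge user-provided timeout/retry settings from the profile with defaults.

    The 'dataproc_*' keys are assumed profile fields, not existing config.
    """
    return {
        "timeout": profile_config.get("dataproc_timeout_s", DEFAULT_TIMEOUT_S),
        "retry_deadline": profile_config.get(
            "dataproc_retry_deadline_s", DEFAULT_RETRY_DEADLINE_S
        ),
    }

# With google-api-core installed, these values could feed Google's own Retry:
# from google.api_core.retry import Retry
# operation = job_client.submit_job_as_operation(
#     request=...,
#     timeout=opts["timeout"],
#     retry=Retry(deadline=opts["retry_deadline"]),
# )
```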
```diff
@@ -863,33 +863,41 @@ def __init__(self, credential):
             credential (_type_): _description_
         """
         self.credential = credential
+        self.GoogleCredentials = BigQueryConnectionManager.get_credentials(credential)
+        self.storage_client = storage.Client(project=self.credential.database, credentials=self.GoogleCredentials)
```
Let's also raise a clear validation error if the user forgets to install the `[dataproc]` extras.
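A guarded import along those lines might look like this sketch (the `check_optional_import` helper and its error message are assumptions, not code from this PR):

```python
import importlib

def check_optional_import(module_name: str, extra_name: str) -> None:
    """Raise a clear error if an optional dependency is missing.

    Turns a bare ImportError into an actionable message pointing at the
    pip extra the user needs to install.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"Could not import '{module_name}'. Python models on Dataproc "
            f"require optional dependencies; install them with: "
            f"pip install dbt-bigquery[{extra_name}]"
        ) from exc

# In the connection manager this could guard the Dataproc/GCS imports, e.g.:
# check_optional_import("google.cloud.dataproc_v1", "dataproc")
```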
Do you feel like it is actually better to have an `enable_python` flag in `dbt_project.yml` to control all logic/imports related to Python?
Discussed in Slack - let's hold off on doing this for now, but I'd be curious to hear if it comes up in beta feedback!
resolves CT-861 (just opened)
Try using the same authentication methods for Dataproc + GCS that we're using for BigQuery, so we don't need to rely on the `GOOGLE_APPLICATION_CREDENTIALS` env var being set. I still haven't been able to get this running on my local machine, unfortunately. `job_client.submit_job_as_operation` returns: