Use default google cloud project if not supplied? #2828

Closed
max-sixty opened this issue Oct 10, 2020 · 5 comments · Fixed by #2908
Labels: bigquery, enhancement, good_first_issue

Comments

@max-sixty
Contributor

Describe the feature

Currently using BigQuery requires defining your project in profiles.yml: https://docs.getdbt.com/reference/warehouse-profiles/bigquery-profile/

Google Cloud APIs generally fall back to the default project when one isn't specified. This is helpful for code that runs in multiple project environments — it'll reference the datasets in whatever project it's running in.

So the feature would be to align dbt with that standard, and allow for:

my-bigquery-db:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      # project: [GCP project id] <- uses the current project
      dataset: [the name of your dbt dataset] # You can also use "schema" here
      threads: [1 or more]
      timeout_seconds: 300
      location: US # Optional, one of US or EU
      priority: interactive
      retries: 1

Describe alternatives you've considered

Currently we have something like:

nimbus:
  target: main
  outputs:
    main:
      type: bigquery
      method: oauth
      project: "{{ env_var('PROJECT', 'project_foo') }}"

...and set $PROJECT to the result of gcloud config get-value project. This is OK, but some cruft.

(and if I'm missing something and there's any easy solution to this, that would be gratefully received!)

Who will this benefit?

BigQuery users, particularly those running across dev and prod environments

Are you interested in contributing this feature?

Not right now, given my other OSS work, but would be keen to contribute to dbt at some point!

Thank you!

@max-sixty max-sixty added enhancement New feature or request triage labels Oct 10, 2020
@jtcohen6
Contributor

Thanks for the detailed proposal, @max-sixty. I think it makes sense to fall back to the default project configured by the gcloud user / service account if it's not specified in profiles.yml.

This isn't a change we would prioritize, and I'm glad to see you have a workaround in the meantime. I imagine it could be quite straightforward. I'll mark this as a good first issue, for whenever you (or another community member) have the time.

@jtcohen6 jtcohen6 added bigquery good_first_issue Straightforward + self-contained changes, good for new contributors! and removed triage labels Oct 12, 2020
@max-sixty
Contributor Author

I'm happy to have a look into this — any ideas on where to start? I'm not familiar with the code base.

  • Should the default be set when profiles.yml is parsed, or when it's used? Probably when it's used, with the result cached, which would reduce the performance cost of shelling out. But maybe it's simplest to do at the parsing stage?
  • Where is profiles.yml parsed? What's the python object that holds the results?
  • dbt already pulls the project here — could we use that?
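The "resolve when used, then cache" option from the first bullet could look roughly like the sketch below. `LazyDefaultProject` and `gcloud_default_project` are hypothetical names, not part of dbt, and the lookup assumes the `gcloud` CLI is installed and on the PATH.

```python
import subprocess
from typing import Callable, Optional


def gcloud_default_project() -> Optional[str]:
    """Ask the gcloud CLI for its configured project (assumes gcloud is installed)."""
    result = subprocess.run(
        ["gcloud", "config", "get-value", "project"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Treat empty output as "no default project configured".
    return result.stdout.strip() or None


class LazyDefaultProject:
    """Resolve the project on first use and remember the answer."""

    def __init__(
        self,
        configured: Optional[str],
        lookup: Callable[[], Optional[str]] = gcloud_default_project,
    ) -> None:
        self._configured = configured
        self._lookup = lookup
        self._cached: Optional[str] = None
        self._resolved = False

    def get(self) -> Optional[str]:
        # An explicit value from profiles.yml always wins.
        if self._configured:
            return self._configured
        # Otherwise shell out at most once and reuse the result.
        if not self._resolved:
            self._cached = self._lookup()
            self._resolved = True
        return self._cached
```

Passing the lookup in as a callable keeps the shell-out replaceable, for example by a call into the Google auth library instead of the CLI.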

@jtcohen6
Contributor

jtcohen6 commented Nov 23, 2020

You're looking in the right place: /plugins/bigquery/dbt/adapters/bigquery/connections.py.

Here is where dbt parses the database (with project as alias) out of the profile:
https://github.com/fishtown-analytics/dbt/blob/e945bca1d9ea4a7e32144bba462bf61655537264/plugins/bigquery/dbt/adapters/bigquery/connections.py#L90-L92

And here is where dbt uses that database value to generate a connection:
https://github.com/fishtown-analytics/dbt/blob/e945bca1d9ea4a7e32144bba462bf61655537264/plugins/bigquery/dbt/adapters/bigquery/connections.py#L214

In between those two is the code you linked to. I think it'd be a fairly straightforward change to add some logic that checks if database is none, and sets it to the user's default project_id accordingly.
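A minimal sketch of that check, assuming the profile is available as a plain dict: `fill_default_project` and `adc_default` are hypothetical helpers for illustration, not functions from dbt's connections.py. The only library call used is `google.auth.default()`, which returns a `(credentials, project_id)` tuple.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def adc_default() -> Tuple[Any, Optional[str]]:
    """Return (credentials, project_id) from Application Default Credentials."""
    import google.auth  # imported lazily so the sketch loads without google-auth

    return google.auth.default()


def fill_default_project(
    profile: Dict[str, Any],
    get_default: Callable[[], Tuple[Any, Optional[str]]] = adc_default,
) -> Dict[str, Any]:
    """Return a copy of the profile with `database` always populated."""
    resolved = dict(profile)
    # `project` is accepted as an alias for `database` in the profile.
    database = resolved.get("database") or resolved.get("project")
    if not database:
        # Nothing configured: fall back to the environment's default project.
        _credentials, database = get_default()
    if not database:
        raise ValueError("No project in the profile and no default project found")
    resolved.pop("project", None)
    resolved["database"] = database
    return resolved
```

With a profile that omits `project`, the returned dict carries whatever project the credentials lookup reports; an explicit `project` or `database` is left untouched.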

@yu-iskw
Contributor

yu-iskw commented Jan 29, 2021

@max-sixty @jtcohen6 I have a question about the new way of creating credentials with the oauth method. Previously, we set the default scopes on the credentials, but the new way doesn't pass the scopes option to google.auth.default. As a result, it is impossible to query tables whose data sources are spreadsheets on Google Drive, because the Google Drive scope is missing.

If I am correct, there is no way to set scopes to a google cloud "account" directly. In addition, there is no way to
I know we can use other methods, such as service-account, but there is currently no way to pass scopes with the oauth method. It would be great to provide a way to pass scopes with the oauth method too. What do you think?

@yu-iskw
Contributor

yu-iskw commented Jan 29, 2021

If there is no way to grant scopes to a Google Cloud account, I see two approaches to setting scopes for the oauth method.

  1. Pass the SCOPE constant to google.auth.default for the oauth method too.
  2. Add a submap to profiles.yml for passing scopes, like below.
my-bigquery-db:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: [GCP project id]
      dataset: [the name of your dbt dataset] # You can also use "schema" here
      threads: [1 or more]
      timeout_seconds: 300
      location: US # Optional, one of US or EU
      priority: interactive
      retries: 1
      scopes:
        - https://www.googleapis.com/auth/bigquery
        - https://www.googleapis.com/auth/cloud-platform
        - https://www.googleapis.com/auth/drive
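Both approaches can be combined in one small sketch: use the `scopes` list from the profile when present, otherwise fall back to a default tuple. `DEFAULT_SCOPES`, `scopes_for`, and `oauth_credentials` are hypothetical names; the default values simply mirror the three scopes in the example profile above, and the real SCOPE constant in dbt may differ. `google.auth.default` does accept a `scopes` keyword argument.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Assumption: mirrors the scopes in the example profile, not dbt's actual constant.
DEFAULT_SCOPES: Tuple[str, ...] = (
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/drive",
)


def scopes_for(profile: Dict[str, Any]) -> Tuple[str, ...]:
    """Use the profile's `scopes` list if given, else the defaults."""
    configured = profile.get("scopes")
    return tuple(configured) if configured else DEFAULT_SCOPES


def oauth_credentials(
    profile: Dict[str, Any],
    auth_default: Optional[Callable[..., Tuple[Any, Optional[str]]]] = None,
) -> Any:
    """Build credentials for `method: oauth`, passing scopes explicitly."""
    if auth_default is None:
        import google.auth  # lazy import so the sketch loads without google-auth

        auth_default = google.auth.default
    credentials, _project_id = auth_default(scopes=scopes_for(profile))
    return credentials
```

Falling back to the defaults keeps existing profiles working unchanged, while a `scopes` entry lets a user narrow or extend the list.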
