[BigQuery] Raw token authentication method #2802
Comments
@davehughes I'm broadly supportive of the need you're describing. To clarify what you mean by "service token": it sounds like this is a temporary access token generated via OAuth, as in generateAccessToken, and that you handle generating and refreshing tokens through a separate automated process. Is that right? I ask because we're planning to work later this year on supporting BigQuery OAuth as a connection mechanism (#2344), with the goal of hooking into GSuite SSO in dbt Cloud. I could be wrong, but it sounds like there's potential overlap between the core work we'd need there and an even more robust version of the auth mechanism you're interested in.
Yes, we use generateAccessToken to generate the tokens we'd want to pass in here, and your general understanding is correct. In #2344, dbt still has credentials to connect to the remote service and generate/fetch tokens, each of which is used for service auth until it expires (at which point it can be refreshed with another call to the token_uri). Here, the token is completely opaque - not refreshable, may not have the right scopes, may already be expired, etc. - and it's the job of the external automated process to make sure scopes and expiry are properly set up. dbt would just use the token directly for service auth or die trying.
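For concreteness, here's a minimal sketch of how an external process might mint such a token via the IAM Credentials API's generateAccessToken; the service account email, scope, and lifetime are illustrative, not taken from this thread:

```python
# Sketch: an external automated process minting a short-lived, scoped
# access token for a customer service account via generateAccessToken.
# Assumes the caller has roles/iam.serviceAccountTokenCreator on the
# target account; the email, scope, and lifetime below are placeholders.
from google.cloud import iam_credentials_v1
from google.protobuf import duration_pb2

client = iam_credentials_v1.IAMCredentialsClient()
response = client.generate_access_token(
    name="projects/-/serviceAccounts/customer-sa@example-project.iam.gserviceaccount.com",
    scope=["https://www.googleapis.com/auth/bigquery"],
    lifetime=duration_pb2.Duration(seconds=3600),  # 1 hour, the default maximum
)
# response.access_token is the opaque bearer token dbt would consume;
# response.expire_time is when it stops working. No refresh is possible.
```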
😇 @davehughes I checked out the docs that you linked to, and it sure seems like this one is doable.
The thing to watch out for here would be the failure mode where a long-running dbt invocation outlives the token's fixed lifetime: since the token is opaque and not refreshable, any run that exceeds it would fail partway through.
Can you just share a tiny bit about how you're creating a Credentials object in your proof-of-concept code? Are you creating a google.oauth2.credentials.Credentials object directly, and if so, are you just specifying a token? If so, then I think this change will indeed pair really well with #2344 (though there are two distinct code changes!). If that sounds about right to you, then we'd be really happy to accept a PR for this one :)
Hmm... I hadn't seen the doc on the max access token lifetime, which might throw a bit of a wrench in here, though setting up the lifetime extension and choosing an appropriately long lifetime won't be a problem. We don't usually see runs anywhere near that long anyway. As for creating a Credentials object, I resorted to just creating a small class that quacks like it should (this is the hacky bit!):
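(The snippet itself was lost in extraction. A rough reconstruction of what such a class could look like, modeled on the google.auth.credentials.Credentials interface - this is a guess at the shape, not the author's actual code:)

```python
import datetime

class StaticTokenCredentials:
    """Quacks like google.auth.credentials.Credentials, but wraps a fixed,
    externally provisioned access token that cannot be refreshed."""

    def __init__(self, token, expiry=None):
        self.token = token    # the raw bearer token string
        self.expiry = expiry  # optional naive-UTC datetime.datetime

    @property
    def expired(self):
        return self.expiry is not None and datetime.datetime.utcnow() >= self.expiry

    @property
    def valid(self):
        return not self.expired

    def refresh(self, request):
        # The token is opaque to us; there is nothing to refresh it with.
        raise RuntimeError("externally provisioned token cannot be refreshed")

    def apply(self, headers, token=None):
        headers["authorization"] = "Bearer {}".format(token or self.token)

    def before_request(self, request, method, url, headers):
        # google-auth transports call this before each request.
        self.apply(headers)
```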
Yeah - I think that's fair. I do think that dbt runs which take > 1 hour are not uncommon, but I think you get to add constraints around things like this for security-minded features! I bet we can swing this implementation using the google.oauth2.credentials.Credentials class.
I'd be interested to see that PR! And I'll try to find some time to experiment with it on our end.
Check out the PR here: #2805
@davehughes PR #2805 is going to add support for providing a raw access token in a BigQuery profile.
Yep, just pulled that down and confirmed that it works for me with the following settings:
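(The settings block didn't survive extraction. Based on the discussion above, the profile presumably looked something like the sketch below - the method name and fields are my assumption of what the PR added, and the project/dataset values are placeholders:)

```yaml
# Assumed shape of the profiles.yml target under test; the method and
# field names are a guess at what PR #2805 added, values are placeholders.
my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth-secrets
      token: "{{ env_var('BIGQUERY_ACCESS_TOKEN') }}"
      project: my-gcp-project
      dataset: dbt_dev
      threads: 4
```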
Passing a bad or expired token errors out as expected. Thanks @drewbanin @jtcohen6 for the nice solution. I'll look forward to this landing. 😎
Awesome, glad to hear it!
Describe the feature
The existing auth methods for BigQuery provide a great experience for interactive users, allowing them to transparently provision access tokens in a variety of ways. However, as an automation implementer on a team trying to programmatically wield dbt on behalf of customers, I'd like to be able to bypass these niceties and just inject an externally provisioned token for access.
In our typical non-dbt scenarios, we create service accounts for customers, have them grant specific limited permissions to those accounts for our service's operations, then use our master account to issue scoped tokens for those accounts/operations when they need to execute. When running dbt operations, I'd like to avoid writing our master account credentials file to disk (as required by the `service-account`/`service-account-json` methods) to protect against reflected file attacks and/or potential vulnerabilities in dbt itself, which could exfiltrate these creds. While I don't see these as particularly likely scenarios, I'd love to have the ability to directly control the blast radius via smaller scoped credentials.

I propose a new BigQuery auth method named 'service-token' and an additional field named `service_token` to provide its value, e.g.:
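A sketch of what this could look like in profiles.yml - only `method` and `service_token` are the proposed additions; the other fields are standard BigQuery profile settings, and all values are placeholders:

```yaml
# Proposed: pass an externally provisioned token directly.
my-profile:
  target: prod
  outputs:
    prod:
      type: bigquery
      method: service-token
      service_token: "<token minted by our external process>"
      project: customer-project
      dataset: analytics
      threads: 4
```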
Describe alternatives you've considered
With the addition of the `impersonate_service_account` field in 0.18.0, I can connect as delegated services, but I still need to provide the full master service keyfile rather than a more limited credential.
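For reference, that impersonation setup looks roughly like this (a sketch with placeholder values):

```yaml
# Impersonation works, but the master keyfile still has to be on disk.
my-profile:
  target: prod
  outputs:
    prod:
      type: bigquery
      method: service-account
      keyfile: /secrets/master-account.json  # the credential I'd rather not materialize
      impersonate_service_account: customer-sa@customer-project.iam.gserviceaccount.com
      project: customer-project
      dataset: analytics
```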
This is specific to BigQuery database connections and automation use-cases. I think it's fair to say that standard interactive users would almost never want to use this authentication mode directly.
Who will this benefit?
Are you interested in contributing this feature?
Yes, happy to contribute to making this a reality. I have a simple (if slightly hacky) proof-of-concept that I can polish into a real PR if this seems interesting.