[DO NOT MERGE] [RFC] attempt to allow impersonation from service account #28

Closed
wants to merge 1 commit into from

mistercrunch
Contributor

@mistercrunch mistercrunch commented Oct 24, 2018

This is a request for comment. I'm working on impersonation in Apache Superset for Lyft.

Trying to find a way to impersonate a user from a service account. For context, Superset exposes a hook that allows mutating the connection object at runtime, so we can alter connection parameters while having a handle on the user information.

Found this with_subject method, which seems promising. I feel like I'm almost there. When I call it prior to running a query, though, I get:

('unauthorized_client: Client is unauthorized to retrieve access tokens using this method.', '{\n "error" : "unauthorized_client",\n "error_description" : "Client is unauthorized to retrieve access tokens using this method."\n}')

Requesting comments from people at Google who can point us in the right direction on impersonating users from a service account.

@tswast @sumedhsakdeo @betodealmeida

@betodealmeida

@mistercrunch, take a look at betodealmeida/gsheets-db-api#3

Do you know if the service account you're using has delegation enabled? (See screenshot in the link above)

@tswast
Collaborator

tswast commented Oct 24, 2018

@theacodes Do you have any experience with service account delegation and the google-auth library?

@mistercrunch
Contributor Author

mistercrunch commented Oct 24, 2018

@betodealmeida idk whether the service account I use has delegation enabled, how can I figure it out? Also would be good to have a more accurate error message if that's the issue.
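
Until the library reports something more specific, a caller could add its own hint. A minimal sketch, assuming the exception arguments keep the shape shown in the error above (a summary string followed by the JSON response body). The helper name `explain_refresh_error` and the hint wording are hypothetical and not part of google-auth.

```python
import json

# Hypothetical hints keyed by the OAuth "error" field; the wording is illustrative.
HINTS = {
    "unauthorized_client": (
        "the service account may not have domain-wide delegation enabled, "
        "or its client ID may not be authorized in the Google Admin console."
    ),
}


def explain_refresh_error(args):
    """Return a friendlier message for a (summary, json_body) error tuple."""
    summary = args[0] if args else ""
    body = args[1] if len(args) > 1 else "{}"
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        payload = {}
    # Only dict payloads carry an "error" field we can look up.
    error = payload.get("error", "") if isinstance(payload, dict) else ""
    hint = HINTS.get(error)
    return f"{summary} Hint: {hint}" if hint else summary


# The tuple below is copied from the error reported in this thread.
example = (
    "unauthorized_client: Client is unauthorized to retrieve access tokens using this method.",
    '{\n "error" : "unauthorized_client",\n "error_description" : '
    '"Client is unauthorized to retrieve access tokens using this method."\n}',
)
print(explain_refresh_error(example))
```

In practice this would be called from an `except google.auth.exceptions.RefreshError as exc:` block with `exc.args`.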

@theacodes

@tswast is this what you're after? https://google-auth.readthedocs.io/en/latest/reference/google.oauth2.service_account.html#domain-wide-delegation
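
For context on what with_subject changes: in the OAuth 2.0 service account flow, delegation adds a `sub` claim naming the user to impersonate to the JWT that is exchanged for an access token. A rough, unsigned sketch of that claim set using only the standard library. The function name `delegation_claims` and the example values are hypothetical; google-auth builds and signs the real assertion internally.

```python
import time


def delegation_claims(service_account_email, token_uri, scopes,
                      subject=None, lifetime=3600):
    """Build the (unsigned) JWT claim set for a service account token request."""
    now = int(time.time())
    claims = {
        "iss": service_account_email,  # the service account itself
        "aud": token_uri,              # the token endpoint being called
        "scope": " ".join(scopes),     # space-delimited scopes
        "iat": now,
        "exp": now + lifetime,
    }
    if subject:
        # Present only when impersonating a user via domain-wide delegation.
        claims["sub"] = subject
    return claims


# Illustrative values only; the emails are placeholders.
claims = delegation_claims(
    "my-sa@my-project.iam.gserviceaccount.com",
    "https://oauth2.googleapis.com/token",
    ["https://www.googleapis.com/auth/bigquery"],
    subject="user@example.com",
)
print(sorted(claims))
```

If the token endpoint rejects a request carrying `sub`, the same request without it can still succeed, which matches a connector that works only when with_subject is not called.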

@tswast
Collaborator

tswast commented Oct 25, 2018

From what I can tell, you need to be an administrator for a gSuite domain in order for domain-wide delegation to show up as an option on the service account info page.

From a project associated with my personal domain, where I have a gSuite paid account:

[Screenshot: delegation-redacted]

@mistercrunch
Contributor Author

This is helpful, I will find someone with the master key and get them to check the magic box.

@nishantrayan

Hey @tswast. @mistercrunch and I tested the proposed solution, and the with_subject approach did not work. We created a service account with domain-level delegation, passed <user_email> to with_subject, and got the same error: ('unauthorized_client: Client is unauthorized to retrieve access tokens using this method.', '{\n "error" : "unauthorized_client",\n "error_description" : "Client is unauthorized to retrieve access tokens using this method."\n}')
Do you have any other recommendations?

@tswast
Collaborator

tswast commented Nov 2, 2018

@nishantrayan That error sounds like the client ID in the service account key file hasn't been added to the Google Admin console for the gSuite domain, yet. See: https://developers.google.com/admin-sdk/directory/v1/guides/delegation#delegate_domain-wide_authority_to_your_service_account
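
To find the value to enter in the Admin console, the client ID can be read from the key file. A minimal sketch, assuming the standard JSON key layout with `type`, `client_email`, and `client_id` fields; the helper names are hypothetical.

```python
import json


def client_id_from_info(info):
    """Return the OAuth client ID from parsed service account key data."""
    if info.get("type") != "service_account":
        raise ValueError("not a service account key")
    return info["client_id"]


def client_id_from_file(key_path):
    """Load a JSON key file and return its client ID."""
    with open(key_path) as fh:
        return client_id_from_info(json.load(fh))


# Illustrative key data with placeholder values (a real key has more fields).
sample = {
    "type": "service_account",
    "client_email": "my-sa@my-project.iam.gserviceaccount.com",
    "client_id": "123456789012345678901",
}
print(client_id_from_info(sample))
```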

Another option would be to have Superset guide users through this flow: https://cloud.google.com/bigquery/docs/authentication/end-user-installed. We'd have to modify pybigquery to accept a refresh token, client ID, and client secret for google-auth user-based credentials. Ideally Superset would be able to parameterize the connection string for the data source to provide a different refresh credential per user.

@mistercrunch
Contributor Author

Just wanted to add that:

  • when we disabled the with_subject call, we were able to use the connector just fine
  • when using my personal account that we were impersonating in the BigQuery UI, I can use it just fine

@tswast
Collaborator

tswast commented Nov 9, 2018

I was just able to make impersonation work with BigQuery resources on my personal gsuite domain. The key is that the client ID associated with the service account must be allowed to access all the needed scopes.

Code:

from google.oauth2 import service_account
from google.cloud import bigquery

credentials = service_account.Credentials.from_service_account_file(
    "/Users/swast/keys/my-domain-wide-service-account-key.json")
credentials = credentials.with_subject("[email protected]")

client = bigquery.Client(project='my-project', credentials=credentials)
print(client.query(
    "SELECT country_name "
    "FROM `bigquery-public-data.utility_us.country_code_iso` "
    "WHERE country_name LIKE 'U%'").to_dataframe())

Results before configuring Google Admin:

$ python impersonation.py
Traceback (most recent call last):
  File "impersonation.py", line 9, in <module>
    print(client.query("SELECT country_name FROM `bigquery-public-data.utility_us.country_code_iso` WHERE country_name LIKE 'U%'").to_dataframe())
...
google.auth.exceptions.RefreshError: ('unauthorized_client: Client is unauthorized to retrieve access tokens using this method.', '{\n  "error": "unauthorized_client",\n  "error_description": "Client is unauthorized to retrieve access tokens using this method."\n}')

After configuring Google Admin:
[Screenshot: impersonation-client-redacted]

# swast @ swast-macbookpro2 in ~/src/scratch/bigquery [10:23:58] C:1
$ python impersonation.py
                           country_name
0                                Uganda
1                            Uzbekistan
2                  United Arab Emirates
3                               Ukraine
4                        United Kingdom
5                         United States
6  United States Minor Outlying Islands
7                               Uruguay

@tswast
Collaborator

tswast commented Nov 9, 2018

To get to the page to enter secrets, a domain admin must go to the security section of admin.google.com and click on the advanced section.

[Screenshot: admin-advanced]

If not enabled, you'll also need to check "Enable API Access" in the Google Admin security panel.

[Screenshot: admin-enable-api]

Regarding which scopes to use, a list of all the Google scopes is at https://developers.google.com/identity/protocols/googlescopes

Relevant ones you may want include:

  • https://www.googleapis.com/auth/cloud-platform
  • https://www.googleapis.com/auth/bigquery
  • https://www.googleapis.com/auth/spreadsheets
  • https://www.googleapis.com/auth/drive
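
The Admin console takes the scopes as a single comma-delimited string, so it can help to generate that string from the same list the code requests. A small sketch; the helper names are hypothetical, and treating authorization as an exact string match per scope is an assumption.

```python
# Scopes from the list above.
SCOPES = [
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/spreadsheets",
    "https://www.googleapis.com/auth/drive",
]


def admin_console_scopes(scopes):
    """Join scopes into a comma-delimited string, dropping duplicates in order."""
    return ",".join(dict.fromkeys(scopes))


def missing_scopes(requested, authorized):
    """Return requested scopes that are not in the authorized list."""
    allowed = set(authorized)
    return [scope for scope in requested if scope not in allowed]


print(admin_console_scopes(SCOPES))
print(missing_scopes(SCOPES, SCOPES[:2]))
```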

@tswast
Collaborator

tswast commented Mar 7, 2019

What's the current status of this PR? Was impersonation decided to be too risky?

As I said in #28 (comment), I think an end-user auth flow is probably the right way to do this. Originally I thought that meant Superset would need to store refresh tokens, but potentially those could be stored as session data or in localStorage in the user's browser instead.

@sumedhsakdeo
Contributor

sumedhsakdeo commented Mar 7, 2019

FWIW, #35 adds scopes to credentials.
@tswast do you think the issue @mistercrunch was seeing would go away once my PR, which relates to scopes, is merged? "('unauthorized_client: Client is unauthorized to retrieve access tokens using this method.', '{\n "error" : "unauthorized_client",\n "error_description" : "Client is unauthorized to retrieve access tokens using this method."\n}')"

@tswast
Collaborator

tswast commented Mar 8, 2019

@sumedhsakdeo I actually had a video conference meeting about this PR a while back. I think the issue at that time was the setting in the gSuite Admin I linked to in #28 (comment). You're right that the scopes issue can produce a similar error when running queries against Drive/Sheets.
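
For the Drive/Sheets case, a sketch of requesting explicit scopes together with the impersonated user when loading the key. It uses the `scopes` and `subject` keyword arguments of `from_service_account_file`; the function name `delegated_credentials` is hypothetical, and this is not necessarily how #35 sets scopes.

```python
# Scopes taken from the list earlier in this thread.
SCOPES = [
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/drive",
]


def delegated_credentials(key_path, user_email, scopes=SCOPES):
    """Load a service account key and impersonate user_email with scopes."""
    # Imported lazily so this module loads even without google-auth installed.
    from google.oauth2 import service_account

    return service_account.Credentials.from_service_account_file(
        key_path, scopes=list(scopes), subject=user_email
    )
```

Every scope requested here would also need to be authorized for the service account's client ID in the Google Admin console.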

@tswast
Collaborator

tswast commented Sep 24, 2020

Per #42 (comment), it sounds like Superset is pursuing OAuth, which will avoid the need for a superuser service account key that can impersonate anyone.

@tswast tswast closed this Sep 24, 2020
@Asturias-sam

@tswast we have BigQuery instances where access is maintained using Google Groups. Is there any way to impersonate users in Superset so that each user can query only the datasets they have access to?

@tswast
Collaborator

tswast commented Feb 2, 2021

@Asturias-sam I proposed a new parameter to the connection string for access_token / refresh_token, but haven't implemented it yet. #42 (comment)

Currently it involves several steps to make this work:

  1. Implement "Use OAuth to authenticate against BigQuery" (#42). As a workaround, you may try setting connect_args in the call to create_engine. Ideally this library would pass through the client you set, though since it creates its own client there may be a conflict here.

  2. The underlying DB-API connect function takes a BigQuery Client as a parameter. https://googleapis.dev/python/bigquery/latest/dbapi.html#google.cloud.bigquery.dbapi.Connection

  3. A BigQuery client can be constructed with a user's credentials by following OAuth 2.0. There are some basic instructions here: https://cloud.google.com/bigquery/docs/authentication/end-user-installed Though it may differ slightly if you are building a web application.
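
Steps 2 and 3 can be sketched roughly as follows, assuming the application has already obtained a refresh token, client ID, and client secret through an OAuth 2.0 flow. The function names are hypothetical, and whether pybigquery will accept such a client through connect_args is untested here.

```python
GOOGLE_TOKEN_URI = "https://oauth2.googleapis.com/token"


def user_credential_kwargs(refresh_token, client_id, client_secret):
    """Keyword arguments for google.oauth2.credentials.Credentials."""
    return {
        "token": None,  # no access token yet; one is fetched on first refresh
        "refresh_token": refresh_token,
        "token_uri": GOOGLE_TOKEN_URI,
        "client_id": client_id,
        "client_secret": client_secret,
    }


def connect_as_user(project, refresh_token, client_id, client_secret):
    """Open a DB-API connection backed by a client using the user's credentials."""
    # Imported lazily so this module loads without the Google libraries installed.
    from google.cloud import bigquery
    from google.cloud.bigquery import dbapi
    from google.oauth2.credentials import Credentials

    credentials = Credentials(
        **user_credential_kwargs(refresh_token, client_id, client_secret)
    )
    client = bigquery.Client(project=project, credentials=credentials)
    return dbapi.connect(client)
```

Because queries run with the user's own credentials, BigQuery enforces that user's dataset permissions without a service account that can impersonate anyone.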

@tswast tswast mentioned this pull request Dec 19, 2023