Auth0 for Authentication on all hubs? #611

Open
scottyhq opened this issue May 18, 2020 · 25 comments

@scottyhq
Member

Thanks @rabernat for prototyping authentication with Auth0 - which seems to be working as of #606 with this config (https://github.com/pangeo-data/pangeo-cloud-federation/compare/3d0aee8..ddc49c4?diff=unified).

I like the idea of using Auth0, since it seems like it will simplify user management and also if I understand correctly, could be used to map user names to cloud provider temporary credentials (https://auth0.com/docs/integrations/aws/tokens). That last piece will simplify issues such as creating per-user temporary buckets (#610).

A few points to clarify first though.

  1. Should each hub set up its own auth0 account?
  2. I'd like to keep Github organization whitelists. As far as I can tell the existing config drops that, but it should be possible by moving this configuration to auth0 https://github.com/jupyterhub/oauthenticator/blob/master/oauthenticator/github.py#L171
@rabernat
Member

  1. Should each hub set up its own auth0 account?

No, I think not. We should have one master user list. And then we should use roles to define who can access what.

  2. I'd like to keep Github organization whitelists. As far as I can tell the existing config drops that, but it should be possible by moving this configuration to auth0

I think we need to discuss more generally what is going to be our access policy going forward.

A further question is whether we can continue to link globus credentials to user accounts if we move to auth0.

@TomAugspurger
Member

@rabernat what's the latest on auth for staging.hub.pangeo.io? Seeing 400 errors when I try to log in

In the hub logs:

400 GET /hub/oauth_callback?error=unauthorized&error_description=Access%20denied.&state=[secret]
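
For reading log lines like the one above, a minimal sketch of pulling the OAuth error fields out of the callback path. The helper name `parseOAuthCallbackError` is hypothetical, and the base URL is only there so the relative path can be parsed; this is not part of JupyterHub or Auth0.

```javascript
// Hypothetical helper (not part of JupyterHub or Auth0): extract the OAuth
// error fields from a callback path as it appears in the hub logs.
function parseOAuthCallbackError(pathWithQuery) {
  // A base is required so that URL can parse a relative path.
  const url = new URL(pathWithQuery, 'https://staging.hub.pangeo.io');
  const error = url.searchParams.get('error');
  if (error === null) {
    // No `error` parameter: the provider did not report a failure.
    return null;
  }
  return {
    error: error,
    // searchParams decodes percent-escapes such as %20.
    description: url.searchParams.get('error_description'),
  };
}

const logged =
  '/hub/oauth_callback?error=unauthorized&error_description=Access%20denied.&state=[secret]';
console.log(parseOAuthCallbackError(logged));
```

The `error` and `error_description` values are set by the identity provider, so this only tells us that the denial happened upstream of the hub, not why.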

@rabernat
Member

I have not changed anything recently. It was working when I set it up. One thing that may have changed is that our pro trial expired and we are now on a free account. But I didn't think we were using any non-free features.

I've added you as an admin on the auth0 account in case you want to poke around.

@rabernat
Member

I am seeing this in the auth0 logs:

{
  "date": "2020-06-10T13:05:08.791Z",
  "type": "f",
  "description": "Access denied.",
  "connection": "github",
  "connection_id": "con_y7f3b7QKRj9bchj5",
  "client_id": "JSZSCh5HjiYcBqOHA1b5Q4LH2AlLxuw2",
  "client_name": "hub.pangeo.io",
  "ip": "173.17.254.127",
  "user_agent": "Firefox 77.0.0 / Mac OS X 10.14.0",
  "details": {
    "body": {},
    "qs": {
      "code": "323abe26a61c5a6196ea",
      "state": "gxyVZl9Kl_bj8tBaUDMCFOWOQd0deWjr"
    },
    "connection": "github",
    "error": {
      "message": "Access denied.",
      "oauthError": "unauthorized",
      "type": "oauth-authorization"
    },
    "session_id": "S-poVs2OQlCU-zLAupHTK2OPPdkXRgsz"
  },
  "hostname": "pangeo.auth0.com",
  "user_id": "github|1312546",
  "user_name": "[email protected]",
  "strategy": "github",
  "strategy_type": "social",
  "audience": "https://pangeo.auth0.com/userinfo",
  "scope": [
    "openid",
    "profile",
    "email"
  ],
  "log_id": "90020200610130509863000208403678249960113713533213999186",
  "_id": "90020200610130509863000208403678249960113713533213999186",
  "isMobile": false
}

This suggests the error is on the github side.

@rabernat
Member

Poking around, I saw that we have two github OAuth apps connected to Auth0:

[image: the two github OAuth apps connected to Auth0]

Is this really necessary?

@TomAugspurger
Member

Not sure.

Just noting that

OAUTH_CALLBACK_URL: "https://staging.hub.pangeo.io/hub/oauth_callback"
is probably incorrect for dev-prod, since its URL is hub.pangeo.io.

@scottyhq
Member Author

While moving the aws-uswest2 auth to Auth0, I set up access restriction by GitHub Org. This is done with an environment variable under 'advanced settings' for each Auth0 App:

[screenshot: environment variable under 'advanced settings' of the Auth0 App]

I'm still able to log into staging.hub.pangeo.io.... @TomAugspurger - make sure 1) you're logged into your standard github account in the browser you're using. 2) you might need to set your pangeo-data github membership to 'public'. 3) potentially clear your browser history.

@rabernat
Member

  1. you might need to set your pangeo-data github membership to 'public'

This should only be necessary for the org whitelist, no?

I see the org whitelist as unnecessary now that we have an additional layer of user management capabilities. I'm curious what you see as the value of using org whitelists.

@TomAugspurger
Member

TomAugspurger commented Jun 10, 2020 via email

@rabernat
Member

I just tried to log in to staging.hub.pangeo.io and got

400 : Bad Request
OAuth error: Access denied.

@scottyhq
Member Author

Apologies, turns out I left an Auth0 'Rule' enabled while experimenting with AWS permissions that was blocking login attempts on https://staging.hub.pangeo.io. Seems to be fixed now. For future Rules we can add code to the top to make them only apply to specific applications (https://auth0.com/rules/simple-user-whitelist-for-app).

If you want any github user to be able to sign in, just remove the REQUIRED_GITHUB_ORGS variable under 'Application Metadata' for the hub.pangeo.io application.

I prefer keeping that setting because we want to limit the number of users on our AWS-infrastructure for the time being. This isn't ideal because it is invite-only and requires admins to add members to github orgs, but has worked okay over the last year to manage our credit budget and time. That said, I'd prefer new github login attempts to be accompanied by a form that requests additional information (e.g. 'research goals', 'dataset of interest', 'computing needs'), and these new users could be approved with the click of a button by administrators...

@rabernat
Member

If you want any github user to be able to sign in, just remove the REQUIRED_GITHUB_ORGS variable under 'Application Metadata' for the hub.pangeo.io application.

@scottyhq - This doesn't make sense to me. Application metadata (i.e. app_metadata) is assigned on a per-user basis, not per application.

I'm working on auth for the new GCP cluster in #622

@rabernat
Member

Now that we have had a chance to play around with auth0, we should establish some standard practices across our hubs.

I think we want basically the same thing for all the hubs--the ability to gather more information about the users when they log in, and the ability to control who has access.

Some questions:

  • Do we want one auth0 "application" per hub? Or can we get by with one application for all the hubs (plus fancy rules)
  • How should we manage access? Should we use auth0 "Roles"? I would prefer not to rely on github groups
  • How do we incorporate a manual approval process into the creation of new accounts?

@rabernat
Member

Summary from some discussion with @jhamman yesterday.

We want to find the simplest way to exercise more control over user privileges across our clusters. I thought about it and think the following could work:

  • All users have to fill out a google form which populates a spreadsheet
  • We configure the form to send an email to a list of admins
  • There is an extra column for each cluster (e.g. us-central1-b) which us admins manually check to approve the user
  • We create a very simple API to query this spreadsheet (could even run in the same heroku account as pangeo-gallery-bot)
  • We call this API in an Auth0 rule, as described in this example and assign users an appropriate auth0 role based on the response
  • We have another rule that matches roles to cluster login rights

I think we can make this work. I can create the API if @scottyhq can create the rules.

@scottyhq
Member Author

This doesn't make sense to me. Application metadata (i.e. app_metadata) is assigned on a per-user basis, not per application.

That is per-user 'app_metadata'. There is also 'Application Metadata' under the application advanced settings:

[image: 'Application Metadata' under the application advanced settings]

@scottyhq
Member Author

@rabernat and @jhamman . Thanks for discussing an approach to unified Auth. Some thoughts:

I think we can make this work. I can create the API if @scottyhq can create the rules.

I like the approach outlined, but unfortunately I don't want to spend time on this. This task is better suited to somebody with Javascript programming experience (rules are javascript), and it seems like this could be a big enough task to merit having someone dedicated to hub administration and user management. That said, I do think it would be great to have a straightforward user database/spreadsheet with queryable access and a form that users can request access with (maybe with a 6 month expiration?)! I'd suggest trying to implement something like this on a test hub so as to not interfere with currently active hubs.

@rabernat
Member

rabernat commented Jun 30, 2020

We cannot afford to leave these hubs open to the general public any more. So we need a creative solution here that doesn't require any work.

edit: I guess that is the github org option.

@rabernat
Member

rabernat commented Jul 1, 2020

I'm going to try to get auth0 to recognize specific github teams

@rabernat
Member

rabernat commented Jul 2, 2020

I believe I have successfully configured the gcp hub to use github teams via auth0. To log on to the clusters, you need to be a member of @pangeo-data/us-central1-b-gcp. Can someone besides me try to check that this is the case?

Edit: I have also verified that membership does not have to be public. This was a stumbling block in the previous setup.

I made this work by creating a rule similar to @scottyhq's rule for github orgs. It uses a piece of app_metadata called roles. The value of roles is a list of github teams.

I called this rule "Github Team Membership Whitelist". It looks like this

// Auth0 custom rule
// Check if the user is a member of at least one of the required GitHub teams
//
// This rule needs the following configuration value:
// REQUIRED_GITHUB_TEAMS: GitHub team logins (format 'org-name/team-name', comma separated)
//
// Note: This rule should be used in conjunction with `add-github-orgs-to-user-meta.js`
//       Be sure to set it up to run after the first one.

function (user, context, callback) {
    if (
        // pass if provider is not github
        context.connectionStrategy !== 'github' ||
        // or if no team is configured
        !context.clientMetadata.REQUIRED_GITHUB_TEAMS
    ) {
        return callback(null, user, context);
    }

    const requiredTeams = context.clientMetadata.REQUIRED_GITHUB_TEAMS.split(',');

    // allow the login if the user belongs to any one of the required teams
    if (
        user.app_metadata && user.app_metadata.roles &&
        requiredTeams.some(r => user.app_metadata.roles.includes(r))
    ) {
        return callback(null, user, context);
    }

    return callback(new UnauthorizedError('Access denied.'));
}
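
As a side note, the matching logic in the rule above can be pulled into a plain function and tried outside Auth0. This is only a sketch: `hasRequiredTeam` is a hypothetical name, and trimming whitespace around each entry is an addition that the rule itself does not do (so `'a/x, b/y'` would not match `b/y` in the rule as written).

```javascript
// Hypothetical stand-alone version of the team check, for testing outside Auth0.
// requiredTeamsCsv: value of REQUIRED_GITHUB_TEAMS ('org-name/team-name', comma separated)
// userTeams: the list stored in app_metadata.roles
function hasRequiredTeam(requiredTeamsCsv, userTeams) {
  // Nothing configured: the rule does not restrict access.
  if (!requiredTeamsCsv) {
    return true;
  }
  // Assumption: tolerate spaces after commas and drop empty entries.
  const required = requiredTeamsCsv
    .split(',')
    .map(function (team) { return team.trim(); })
    .filter(function (team) { return team.length > 0; });
  const teams = Array.isArray(userTeams) ? userTeams : [];
  // Membership in any one required team is enough.
  return required.some(function (team) { return teams.includes(team); });
}

console.log(
  hasRequiredTeam('pangeo-data/us-central1-b-gcp', ['pangeo-data/us-central1-b-gcp'])
);
```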

The team membership is populated in the other rule "Add GitHub Organization Team Membership to Application Metadata"

// https://gravitational.com/blog/aws-github-sso/
// Uses first org in REQUIRED_GITHUB_ORGS application list?
function (user, context, callback) {
  // access token to talk to github API
  var github = user.identities.filter(function (id){
     return id.provider === 'github';
  })[0];
  // nothing to do for users who did not log in through github
  if (!github) {
    return callback(null, user, context);
  }
  var access_token = github.access_token;
  request.get({
      url: "https://api.github.com/user/teams",
      headers: {
        // use token authorization to talk to github API
        "Authorization": "token "+access_token,
        // Remember the Application name registered in github?
        // use it to set User-Agent or request will fail
        "User-Agent": "Auth0",
      }
    }, function(err, res, data){
        // always invoke the callback, otherwise the login never completes
        if (err) {
          return callback(err);
        }
        if (!data) {
          return callback(new Error('Empty response from the GitHub teams API'));
        }
        // extract github team names to array
        var github_teams = JSON.parse(data).map(function(team){
          return team.organization.login + "/" + team.slug;
        });
        // add teams to the application metadata
        user.app_metadata = user.app_metadata || {};
        // update the app_metadata that will be part of the response
        user.app_metadata.roles = github_teams;

        // persist the app_metadata update
        auth0.users.updateAppMetadata(user.user_id, user.app_metadata)
        .then(function(){
            callback(null, user, context);
        })
        .catch(function(err){
            callback(err);
        });
    });
 }
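
The parsing step in that rule can also be sketched as a separate function, so the 'org-name/team-slug' format is easy to check by hand. The name `teamNamesFromGithub` is hypothetical, and returning an empty list for anything that is not a JSON array (for example an error object from the API) is an assumption, not what the rule above does.

```javascript
// Hypothetical helper: turn the body of GET https://api.github.com/user/teams
// into the 'org-name/team-slug' strings stored in app_metadata.roles.
function teamNamesFromGithub(body) {
  let teams;
  try {
    teams = JSON.parse(body);
  } catch (e) {
    return []; // not JSON at all
  }
  // Assumption: anything that is not an array is treated as "no teams".
  if (!Array.isArray(teams)) {
    return [];
  }
  return teams
    .filter(function (team) {
      return team && team.organization && team.organization.login && team.slug;
    })
    .map(function (team) {
      return team.organization.login + '/' + team.slug;
    });
}

// Trimmed-down example body; real responses carry many more fields.
const sample = JSON.stringify([
  { slug: 'us-central1-b-gcp', organization: { login: 'pangeo-data' } },
]);
console.log(teamNamesFromGithub(sample));
```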

@TomAugspurger
Member

I just got a 400, which hopefully is the desired outcome?

400 : Bad Request
OAuth error: Access denied.

Hopefully we can customize that to tell people how to join.

@rabernat
Member

rabernat commented Jul 2, 2020

Yes, and you should now be able to log in.

@scottyhq
Member Author

scottyhq commented Jul 2, 2020

@rabernat - I like the approach of using Teams, but for the AWS hub we're using multiple github orgs. The rule as-is I think assumes everyone is under pangeo-data. I think we should add per-application rules as described here (https://auth0.com/rules/simple-user-whitelist-for-app):

  // only enforce for us-central1-b.gcp.pangeo.io
  // bypass this rule for all other apps
  if(context.clientName !== 'us-central1-b.gcp.pangeo.io'){
    return callback(null, user, context);
  }
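
One way to avoid repeating that guard in every rule would be a small wrapper, sketched below. This is an illustration only: `onlyForClient` is a hypothetical name, Auth0 rules are normally pasted as a single anonymous function so the wrapper would have to live inside the rule body, and a plain `Error` stands in for Auth0's `UnauthorizedError`, which only exists in the rules sandbox.

```javascript
// Hypothetical wrapper: run `rule` only for one application (by client name)
// and pass every other application straight through.
function onlyForClient(clientName, rule) {
  return function (user, context, callback) {
    if (context.clientName !== clientName) {
      // bypass this rule for all other apps
      return callback(null, user, context);
    }
    return rule(user, context, callback);
  };
}

// Stand-in rule that denies everyone, to make the effect visible.
function denyAll(user, context, callback) {
  return callback(new Error('Access denied.'));
}

const gcpOnly = onlyForClient('us-central1-b.gcp.pangeo.io', denyAll);

// Logs true for the GCP hub (denied) and false for the AWS hub (bypassed).
['us-central1-b.gcp.pangeo.io', 'aws-uswest2.pangeo.io'].forEach(function (name) {
  gcpOnly({}, { clientName: name }, function (err) {
    console.log(name, err instanceof Error);
  });
});
```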

@rabernat
Member

rabernat commented Jul 3, 2020

The rule as-is I think assumes everyone is under pangeo-data.

Which rule are you referring to here? My teams rule? AFAIK it does not make that assumption. The team is a combination of org-name/team-name.

@scottyhq
Member Author

scottyhq commented Jul 3, 2020

@rabernat - yes, your new rule adds all team memberships for a user to app_metadata. The problem was that REQUIRED_GITHUB_TEAMS = pangeo-data/us-central1-b-gcp was also added to the aws-uswest2.pangeo.io application, blocking people who tried to log in yesterday. I'd rather not create a pangeo-data/us-west-2-aws team for the time being, so I just removed the environment variable under the aws hub application.

@rabernat
Member

rabernat commented Jul 3, 2020

The problem was that REQUIRED_GITHUB_TEAMS = pangeo-data/us-central1-b-gcp was also added to the aws-uswest2.pangeo.io application

This must have just been a total mistake on my part. So sorry!
