
feat(bff): use 'kubeflow-userid' header to authorize BFF endpoints #599

Merged
2 commits merged into kubeflow:main on Dec 4, 2024

Conversation

@ederign (Member) commented Nov 27, 2024

Description

In Kubeflow's Central Dashboard, the kubeflow-userid header carries the user's email address for authentication purposes. When a user accesses applications through the dashboard, each request includes this header, allowing the application to identify the user.

On the BFF, we are using the kubeflow-userid header to do authorization via SubjectAccessReview on all endpoints. For all endpoints, we only authorize requests from users that can "get" and "list" Services via a ClusterRole.

One thing that we need to double-check with other people from the community (@tarilabs, please chime in) is whether, instead of a ClusterRole, we should do the SubjectAccessReview for a given namespace. I've implemented 'cluster-level' access because my understanding is that model registry access is not restricted by namespace.

So, in short, the BFF authorizes every user that has cluster-wide get/list authorization for Services.

Another important aspect is that for ALL Kubernetes API calls (including the SubjectAccessReview itself), the BFF uses the service account token (the remaining calls happen only after the SubjectAccessReview check passes).

As Kubeflow doesn't provide any bearerToken information, I've removed this piece from the codebase. I'm happy to reintroduce it as soon as we have that need, i.e. if any distribution needs to do authentication differently.
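For illustration, here is a minimal sketch of such a cluster-wide SubjectAccessReview with client-go, assuming a clientset authenticated with the BFF's service account token; the function name and wiring are illustrative, not the PR's actual code:

```go
package authz

import (
	"context"

	authv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// checkUserAccess asks the API server, via SubjectAccessReview, whether the
// user taken from the kubeflow-userid header may perform verb ("get"/"list")
// on core/v1 Services cluster-wide. The clientset is assumed to be
// authenticated with the BFF's service account token.
func checkUserAccess(ctx context.Context, clientset kubernetes.Interface, userID, verb string) (bool, error) {
	sar := &authv1.SubjectAccessReview{
		Spec: authv1.SubjectAccessReviewSpec{
			User: userID,
			ResourceAttributes: &authv1.ResourceAttributes{
				Verb:     verb,
				Resource: "services",
				// Namespace left empty => cluster-wide check.
			},
		},
	}
	resp, err := clientset.AuthorizationV1().SubjectAccessReviews().Create(ctx, sar, metav1.CreateOptions{})
	if err != nil {
		return false, err
	}
	return resp.Status.Allowed, nil
}
```

A user that is not allowed comes back with Allowed=false, which the middleware turns into the 403 "access denied" response shown in the test output below.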

In this PR:

  • clients/ui/bff/README.md: added header info and some clarifications in the README;
  • clients/ui/bff/internal/api/app.go: added a middleware that is executed on ALL endpoints to perform the SubjectAccessReview checks (see the middleware sketch after this list);
  • clients/ui/bff/internal/api/errors.go: utility function to deal with forbiddenResponse;
  • clients/ui/bff/internal/api/healthcheck_handler.go, clients/ui/bff/internal/models/health_check.go and clients/ui/bff/internal/repositories/health_check.go: added kubeflow-userid info to the healthcheck. This is the only entrypoint that doesn't do SubjectAccessReview;
  • clients/ui/bff/internal/api/middleware.go: removed all code related to bearerToken and implemented the middleware for access control;
  • clients/ui/bff/internal/integrations/http.go: removed the bearerToken header that we passed to the model registry APIs;
  • clients/ui/bff/internal/integrations/k8s.go: we now have two clients here, a ControllerRuntimeClient (cached) and a KubernetesNativeClient (for operations like SubjectAccessReview). I've also created the PerformSAR method, which is the core piece of this PR;
  • clients/ui/bff/internal/mocks/k8s_mock.go: on the mock, I've created the default RBAC on envtest for [email protected];
  • clients/ui/manifests/user-rbac/kubeflow-dashboard-rbac.yaml: the RBAC for our make kind-deployment test functionality;
  • clients/ui/scripts/deploy_kind_cluster.sh: removed the Kubernetes token in favor of warning the user about the required headers.
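
To make the middleware and the forbiddenResponse flow concrete, here is a minimal sketch of what such access-control middleware could look like, assuming a standard net/http handler chain; the App struct, the sarChecker interface, and the PerformSAR signature shown here are illustrative assumptions, not the code actually added in middleware.go and k8s.go:

```go
package middleware

import (
	"fmt"
	"net/http"
)

// sarChecker abstracts the SubjectAccessReview call made against the
// Kubernetes API (e.g. a PerformSAR-style method); the signature is assumed.
type sarChecker interface {
	PerformSAR(r *http.Request, userID string) (bool, error)
}

// App is a stand-in for the BFF application struct.
type App struct {
	k8s sarChecker
}

// writeJSONError mirrors the idea of the forbiddenResponse helper in errors.go.
func writeJSONError(w http.ResponseWriter, status int, msg string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	fmt.Fprintf(w, `{"error": {"code": "%d", "message": %q}}`, status, msg)
}

// RequireClusterServiceAccess rejects any request whose kubeflow-userid is not
// allowed to get/list Services cluster-wide.
func (app *App) RequireClusterServiceAccess(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		userID := r.Header.Get("kubeflow-userid")
		if userID == "" {
			writeJSONError(w, http.StatusBadRequest, "missing kubeflow-userid header")
			return
		}
		allowed, err := app.k8s.PerformSAR(r, userID)
		if err != nil {
			writeJSONError(w, http.StatusInternalServerError, "authorization check failed")
			return
		}
		if !allowed {
			writeJSONError(w, http.StatusForbidden, "access denied")
			return
		}
		next.ServeHTTP(w, r)
	})
}
```

The real middleware may differ in naming and error handling; this only illustrates the header check plus the SubjectAccessReview gate described above.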

How Has This Been Tested?

curl -i -H "kubeflow-userid: [email protected]" localhost:4000/api/v1/healthcheck                                           (kind-kubeflow-ui/default)
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Wed, 27 Nov 2024 22:18:50 GMT
Content-Length: 102

{
	"status": "available",
	"system_info": {
		"version": "1.0.0"
	},
	"user-id": "[email protected]"
}
curl -i -H "kubeflow-userid: [email protected]" localhost:4000/api/v1/model_registry                                        (kind-kubeflow-ui/default)
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Wed, 27 Nov 2024 22:19:26 GMT
Content-Length: 413

{
	"data": [
		{
			"name": "model-registry",
			"displayName": "Model Registry",
			"description": "Model Registry Description"
		},
		{
			"name": "model-registry-bella",
			"displayName": "Model Registry Bella",
			"description": "Model Registry Bella description"
		},
		{
			"name": "model-registry-dora",
			"displayName": "Model Registry Dora",
			"description": "Model Registry Dora description"
		}
	]
}
curl -i -H "kubeflow-userid: shouldnotwork" localhost:4000/api/v1/model_registry                                           (kind-kubeflow-ui/default)
HTTP/1.1 403 Forbidden
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Wed, 27 Nov 2024 22:19:45 GMT
Content-Length: 65

{
	"error": {
		"code": "403",
		"message": "access denied"
	}
}

Using the Header Editor Chrome Extension:

With the wrong username: [screenshot]

With the right username: [screenshot]

Merge criteria:

  • All the commits have been signed off (to pass the DCO check).
  • The commits have meaningful messages; the author will squash them after approval or, in the case of manual merges, will ask to merge with squash.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.
  • Code changes follow the kubeflow contribution guidelines.

If you have UI changes

  • The developer has added tests or explained why testing cannot be added.
  • Include any necessary screenshots or GIFs if it was a UI change.
  • Verify that UI/UX changes conform to the UX guidelines for Kubeflow.

@ederign (Member, Author) commented Nov 27, 2024

@Griffin-Sullivan, @alexcreasy, or @lucferbux: one thing that I didn't do is test with the real model registry (not the mocked one). Can you please do this when you try this PR?

I've tested with Kubernetes (not our envtest mock) but not inside the cluster (waiting for @Griffin-Sullivan's PR).

curl -i -H "kubeflow-userid: [email protected]" localhost:4000/api/v1/model_registry                                        (kind-kubeflow-ui/default)
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Wed, 27 Nov 2024 22:23:13 GMT
Content-Length: 154

{
	"data": [
		{
			"name": "model-registry-service",
			"displayName": "Kubeflow Model Registry",
			"description": "An example model registry"
		}
	]
}

Signed-off-by: Eder Ignatowicz <[email protected]>
@tarilabs (Member) commented Nov 29, 2024

> One thing that we need to double-check with other people from the community (@tarilabs, please chime in)

This is governed by the @kubeflow/wg-manifests-leads KF/Platform team, and ought to be cross-checked with them, imho.

edit: Slack channel if preferred: https://cloud-native.slack.com/archives/C073W572LA2

> my understanding is that model registry access is not restricted by namespace

Indeed, because Model Registry is meant as a platform-level service.

@kimwnasptd (Member):

Happy to help. Maybe a thread in Slack could help sort some things out faster, but I could also do a review here, if it helps.

@juliusvonkohout (Member):

Regarding "model registry access is not restricted by namespace": maybe I misunderstood something, but we definitely need a way to limit access to models. Namespaces can belong to different departments that are not allowed to access other departments' models. For me that is a core criterion for graduation.

@tarilabs (Member):

> Namespaces can belong to different departments that are not allowed to access other departments' models.

It is my understanding that if you deploy into a (user) namespace, it is considered a tenant; hence you'd have access to the registry and metadata of that instance only if you indeed have access to that ns.

It is also my understanding that if you consider Model Registry a platform service, you'd make said instance available to the whole Kubeflow scope.

@juliusvonkohout (Member) commented Dec 1, 2024

> Namespaces can belong to different departments that are not allowed to access other departments' models.

> It is my understanding that if you deploy into a (user) namespace, it is considered a tenant; hence you'd have access to the registry and metadata of that instance only if you indeed have access to that ns.

> It is also my understanding that if you consider Model Registry a platform service, you'd make said instance available to the whole Kubeflow scope.

Yes, available to all kubeflow namespaces, but with different permissions per namespace.
One shared instance with ACLs.

@tarilabs (Member) commented Dec 1, 2024

I'm not sure I fully follow you 100%, @juliusvonkohout; if so, pardon me.

The main example we have, also in the manifests, is a common Model Registry for the whole KF platform scope. This is helpful as MR is touted as a single pane of glass and a shared platform service. I think this was accepted a while back, as a platform service. I think this is a sensible default example given the scope of MR.

Of course, downstream admins/distros may opt for a different setup.
For one alternative example (not in KF/manifests), an admin may decide to use a KF ("user") namespace per Org/Team/Unit. In that case, a deployment to said ns would make MR available only to those with the appropriate RBAC and supporting config for that ns.
For another alternative example, an admin may decide that a given persona has rights to access KF namespace1 and namespace2. In that case, analogous considerations apply.

In summary, I don't see a reason to apply a different tenancy design than what is architected in the linked doc already. Hope this clarifies?

@tarilabs (Member) commented Dec 1, 2024

Also, to substantiate (in the hope it helps) with a picture what I've summarized in #599 (comment), here is a proposed example diagram:

[example diagram image]

which is my understanding of how the tenancy design from the docs is respected with respect to Model Registry backend deployment(s).

The "default" can be understood as the deployment instance we have in the examples already, for the reasons in #599 (comment) above.

Nothing stops admins/distros from having additional instances, for instance serving a different scope of their orgs at the "platform level" (here pictured as "another", not in the manifest examples).

As mentioned, we comply with the tenancy design from the docs: admins/distros can opt to have one or more instances in their designated KF ("user") namespaces. In this example diagram, since both personas are represented as having access to both t_A and t_B, they can accordingly access the MR REST service, i.e. both can also access registries X and Y.
As another example, admins/distros can opt to have one or more instances in yet another designated KF ("user") namespace, and since in this example only persona_B is represented as having access to t_B, they are the only one with access to those metadata, i.e. only they can also access registry Z.

This is with respect to the Model Registry backend. As mentioned, I encourage the discussion, as the deployment model of the UI may differ.

@juliusvonkohout (Member) commented Dec 2, 2024

@tarilabs, so as far as I understand, you plan to have no user isolation at all and no multi-tenancy. Everything is just shared and accessible/abusable from each namespace/department/user.

That is strictly against what we have so far in Kubeflow, where we have shared instances with proper multi-tenancy support (KFP, Notebooks, KServe, Volumes, Tensorboards, Katib, ...). This way we can have scalable zero-overhead namespaces.

It also sounds a bit contradictory to what is stated in the first post of this PR.

@tarilabs (Member) commented Dec 2, 2024

> @tarilabs, so as far as I understand, you plan to have no user isolation at all and no multi-tenancy. Everything is just shared and accessible/abusable from each namespace/department/user.

I'm not sure where you got this understanding?
Per my understanding, if I take the tenant namespace deployments from the example in #599 (comment), those will be isolated to the designated ns.

@juliusvonkohout (Member):

> Per my understanding, if I take the tenant namespace deployments from the example in #599 (comment), those will be isolated to the designated ns.

Per-namespace overhead is exactly what we want to prevent; that is not a shared instance.

@tarilabs (Member) commented Dec 2, 2024

> Per-namespace overhead is exactly what we want to prevent; that is not a shared instance.

I'm not following. When I have a KServe workload (an Isvc, which in the case of Raw translates directly to a Deployment), that is deployed to the designated ns, unless I recall this incorrectly (but I'm taking just 1 example) 🤔

@juliusvonkohout (Member):

> Per-namespace overhead is exactly what we want to prevent; that is not a shared instance.

> I'm not following. When I have a KServe workload (an Isvc, which in the case of Raw translates directly to a Deployment), that is deployed to the designated ns, unless I recall this incorrectly (but I'm taking just 1 example) 🤔

The InferenceService itself, yes, and for KFP the pipelines, but not the KServe or pipelines controllers/shared deployments; only the actual workload. In the end you will probably have a menu entry in the Central Dashboard sidebar and, based on the namespace you are in, you see different things. You should not be able to see things from other people's namespaces if you are not a member. One shared global "default" space, as for example shared Kubeflow Pipelines, is also fine. So based on what I have read now, we just had a misunderstanding about the shared-instance part and the per-namespace parts.

@tarilabs (Member) commented Dec 2, 2024

> So based on what I have read now, we just had a misunderstanding about the shared-instance part and the per-namespace parts.

That is my impression too; I was a bit caught by surprise based on a previous comment 😅. We've complied with the direction of the general design of KF, but we've also taken feedback based on the specific projects used/reused (mlmd, but also the MySQL topic per the other thread you noticed, but I'm digressing...).

There are definitely (as always) details to improve too, which can be discussed/planned accordingly, imho.

@ederign marked this pull request as draft December 2, 2024 17:19
@ederign (Member, Author) commented Dec 2, 2024

Created this slack thread to chat about it: https://cloud-native.slack.com/archives/C073W572LA2/p1733158941696939

@lucferbux (Contributor):

Hi there, I'm leaving the outcome of the conversation here too. We'll follow the logic we currently have in midstream, tweaking it a little bit for Kubeflow and the ability to have multi-tenancy. This is a temporary solution and we can do some follow-ups, but so far, for the 1.10 release, the plan would be (see the sketch after this list):

  1. We'll do a SelfSubjectAccessReview to see if the user has cluster-admin access to list all services.
  2. If so, we'll list all services with a label selector to filter the ones for Model Registries, and we'll disable the namespace selector since we'll be getting everything.
  3. If not:
    • We'll enable the namespace selector in Kubeflow.
    • We'll parametrize the GET call to support adding the namespace.
    • We'll create a SelfSubjectRulesReview to list the service names a user has access to in that namespace.
    • Then fetch all those services and filter by label.
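
For illustration only, here is a rough sketch of that planned flow with client-go. It assumes the BFF builds a clientset that impersonates the kubeflow-userid (so the Self* reviews are evaluated as that user) and that model registry Services carry a component=model-registry label; both are assumptions, not code or labels from this PR:

```go
package registries

import (
	"context"
	"slices"

	authv1 "k8s.io/api/authorization/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listAccessibleRegistryServices sketches the planned flow. userClient is
// assumed to impersonate the kubeflow-userid (rest.Config.Impersonate), so
// the Self* reviews run as that user.
func listAccessibleRegistryServices(ctx context.Context, userClient kubernetes.Interface, namespace string) ([]string, error) {
	// 1. Can the user list Services cluster-wide (e.g. cluster-admin)?
	ssar := &authv1.SelfSubjectAccessReview{
		Spec: authv1.SelfSubjectAccessReviewSpec{
			ResourceAttributes: &authv1.ResourceAttributes{Verb: "list", Resource: "services"},
		},
	}
	resp, err := userClient.AuthorizationV1().SelfSubjectAccessReviews().Create(ctx, ssar, metav1.CreateOptions{})
	if err != nil {
		return nil, err
	}

	if resp.Status.Allowed {
		// 2. List all Services and keep only model registries via a label
		// selector; the label key/value here is an assumption.
		svcs, err := userClient.CoreV1().Services(corev1.NamespaceAll).List(ctx,
			metav1.ListOptions{LabelSelector: "component=model-registry"})
		if err != nil {
			return nil, err
		}
		names := make([]string, 0, len(svcs.Items))
		for _, s := range svcs.Items {
			names = append(names, s.Name)
		}
		return names, nil
	}

	// 3. Otherwise, ask which Service names the user may "get" in the chosen
	// namespace; each would then be fetched and filtered by label.
	ssrr := &authv1.SelfSubjectRulesReview{
		Spec: authv1.SelfSubjectRulesReviewSpec{Namespace: namespace},
	}
	rules, err := userClient.AuthorizationV1().SelfSubjectRulesReviews().Create(ctx, ssrr, metav1.CreateOptions{})
	if err != nil {
		return nil, err
	}
	var names []string
	for _, rule := range rules.Status.ResourceRules {
		// A real implementation would also handle wildcard verbs/resources ("*").
		if slices.Contains(rule.Resources, "services") && slices.Contains(rule.Verbs, "get") {
			names = append(names, rule.ResourceNames...)
		}
	}
	return names, nil
}
```

The label selector, the impersonation setup, and the wildcard handling would need to match whatever the follow-up PR actually implements.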

@ederign (Member, Author) commented Dec 3, 2024

@lucferbux thanks! I'll implement it this way.

@lucferbux (Contributor) left a review comment:

Some nits; we can wait till we have the other enhancements, or just merge this and do a follow-up.

```
# GET /v1/healthcheck
curl -i localhost:4000/api/v1/healthcheck
curl -i -H "kubeflow-userid: [email protected]" localhost:4000/api/v1/healthcheck
```

We plan to get the Swagger spec automatically generated, but can you tweak the Swagger spec to add that info (meaning that we need that header)? Try securitySchemes.

```
@@ -32,6 +32,8 @@ npm run build

This is the default context for running a local UI. Make sure you build the project using the instructions above prior to running the command below.

You will need to inject your requests with a kubeflow-userid header for authorization purposes. For example, you can use the [Header Editor](https://chromewebstore.google.com/detail/eningockdidmgiojffjmkdblpjocbhgh) extension in Chrome to set the kubeflow-userid header to [email protected].
```

For the standalone app there's an enhancement I'm thinking of: mocking a login so it can be injected in the header. I can refine that later, but this is just fine.

```
@@ -0,0 +1,21 @@
apiVersion: rbac.authorization.k8s.io/v1
```

A couple of things for this:

  1. admin-rbac.yaml should be a base file now that RBAC by serviceaccount is mandatory.
  2. So user-rbac should only have kubeflow-dashboard-rbac.yaml (I would rename it to something like default-user-role.yaml or something like that).
  3. If you want to be pro-kustomize, just change kustomization.yaml, adding - ../base as a resource, so this is a proper kustomize overlay.
  4. If so, change the deploy script to only deploy the overlay and everything will be deployed.

```
- admin-rbac.yaml
```

Just what I commented in the other review.


```
# Step 5: Port-forward the service
echo "Port-fowarding Model Registry UI..."
echo "Port-forwarding Model Registry UI..."
```

Wow I thought I fixed it, thanks.

@ederign marked this pull request as ready for review December 4, 2024 16:10
@lucferbux (Contributor):
/approve

@lucferbux (Contributor):
/lgtm


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lucferbux

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@lucferbux (Contributor):

We can fix those nits in a follow-up PR.

@google-oss-prow bot merged commit ea1afd2 into kubeflow:main Dec 4, 2024
16 checks passed
@ederign mentioned this pull request Dec 9, 2024