
feat(bff): use 'kubeflow-userid' header to authorize BFF endpoints #599

Merged
2 commits merged into kubeflow:main on Dec 4, 2024

Conversation

@ederign (Member) commented Nov 27, 2024

Description

In Kubeflow's Central Dashboard, the kubeflow-userid header carries the user's email address for authentication purposes. When a user accesses applications through the dashboard, each request includes this header, allowing the application to identify the user.

On the BFF, we are using the kubeflow-userid header to do authorization via SubjectAccessReview on all endpoints. For all endpoints, we only authorize requests from users that can "get" and "list" Services via a ClusterRole.

One thing that we need to double-check with other people from the community (@tarilabs, please chime in) is whether, instead of a ClusterRole, we should do the SubjectAccessReview for a given namespace. I've implemented 'cluster-level' access because my understanding is that model registry access is not restricted by namespace.

So, in short, the BFF authorizes every user that has cluster-wide get/list authorization for Services.

Another important aspect is that for ALL Kubernetes API calls (including the SubjectAccessReview itself), the BFF uses the service account token (the remaining calls happen only after the SubjectAccessReview check passes).

As Kubeflow doesn't provide any bearerToken information, I've removed this piece from the codebase. I'm happy to reintroduce it as soon as we have that need, i.e. if any distribution needs to do authentication differently.
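For illustration, here is a minimal sketch of such a cluster-wide SubjectAccessReview with client-go, assuming a clientset authenticated with the BFF's service account token; the function name and wiring are illustrative, not the PR's actual code:

```go
package authz

import (
	"context"

	authv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// checkUserAccess asks the API server, via SubjectAccessReview, whether the
// user taken from the kubeflow-userid header may perform verb ("get"/"list")
// on core/v1 Services cluster-wide. The clientset is assumed to be
// authenticated with the BFF's service account token.
func checkUserAccess(ctx context.Context, clientset kubernetes.Interface, userID, verb string) (bool, error) {
	sar := &authv1.SubjectAccessReview{
		Spec: authv1.SubjectAccessReviewSpec{
			User: userID,
			ResourceAttributes: &authv1.ResourceAttributes{
				Verb:     verb,
				Resource: "services",
				// Namespace left empty => cluster-wide check.
			},
		},
	}
	resp, err := clientset.AuthorizationV1().SubjectAccessReviews().Create(ctx, sar, metav1.CreateOptions{})
	if err != nil {
		return false, err
	}
	return resp.Status.Allowed, nil
}
```

A user that is not allowed comes back with Allowed=false, which the middleware turns into the 403 "access denied" response shown in the test output below.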

In this PR:

  • clients/ui/bff/README.md: added header info and some clarifications in the README;
  • clients/ui/bff/internal/api/app.go: added a middleware that is executed on ALL endpoints to perform the SubjectAccessReview checks (see the middleware sketch after this list);
  • clients/ui/bff/internal/api/errors.go: utility function to deal with forbiddenResponse;
  • clients/ui/bff/internal/api/healthcheck_handler.go, clients/ui/bff/internal/models/health_check.go and clients/ui/bff/internal/repositories/health_check.go: added kubeflow-userid info to the healthcheck. This is the only entrypoint that doesn't do SubjectAccessReview;
  • clients/ui/bff/internal/api/middleware.go: removed all code related to bearerToken and implemented the middleware for access control;
  • clients/ui/bff/internal/integrations/http.go: removed the bearerToken header that we passed to the model registry APIs;
  • clients/ui/bff/internal/integrations/k8s.go: we now have two clients here, a ControllerRuntimeClient (cached) and a KubernetesNativeClient (for operations like SubjectAccessReview). I've also created the PerformSAR method, which is the core piece of this PR;
  • clients/ui/bff/internal/mocks/k8s_mock.go: on the mock, I've created the default RBAC on envtest for [email protected];
  • clients/ui/manifests/user-rbac/kubeflow-dashboard-rbac.yaml: the RBAC for our make kind-deployment test functionality;
  • clients/ui/scripts/deploy_kind_cluster.sh: removed the Kubernetes token in favor of warning the user about the required headers.
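
To make the middleware and the forbiddenResponse flow concrete, here is a minimal sketch of what such access-control middleware could look like, assuming a standard net/http handler chain; the App struct, the sarChecker interface, and the PerformSAR signature shown here are illustrative assumptions, not the code actually added in middleware.go and k8s.go:

```go
package middleware

import (
	"fmt"
	"net/http"
)

// sarChecker abstracts the SubjectAccessReview call made against the
// Kubernetes API (e.g. a PerformSAR-style method); the signature is assumed.
type sarChecker interface {
	PerformSAR(r *http.Request, userID string) (bool, error)
}

// App is a stand-in for the BFF application struct.
type App struct {
	k8s sarChecker
}

// writeJSONError mirrors the idea of the forbiddenResponse helper in errors.go.
func writeJSONError(w http.ResponseWriter, status int, msg string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	fmt.Fprintf(w, `{"error": {"code": "%d", "message": %q}}`, status, msg)
}

// RequireClusterServiceAccess rejects any request whose kubeflow-userid is not
// allowed to get/list Services cluster-wide.
func (app *App) RequireClusterServiceAccess(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		userID := r.Header.Get("kubeflow-userid")
		if userID == "" {
			writeJSONError(w, http.StatusBadRequest, "missing kubeflow-userid header")
			return
		}
		allowed, err := app.k8s.PerformSAR(r, userID)
		if err != nil {
			writeJSONError(w, http.StatusInternalServerError, "authorization check failed")
			return
		}
		if !allowed {
			writeJSONError(w, http.StatusForbidden, "access denied")
			return
		}
		next.ServeHTTP(w, r)
	})
}
```

The real middleware may differ in naming and error handling; this only illustrates the header check plus the SubjectAccessReview gate described above.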

How Has This Been Tested?

curl -i -H "kubeflow-userid: [email protected]" localhost:4000/api/v1/healthcheck                                           (kind-kubeflow-ui/default)
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Wed, 27 Nov 2024 22:18:50 GMT
Content-Length: 102

{
	"status": "available",
	"system_info": {
		"version": "1.0.0"
	},
	"user-id": "[email protected]"
}
curl -i -H "kubeflow-userid: [email protected]" localhost:4000/api/v1/model_registry                                        (kind-kubeflow-ui/default)
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Wed, 27 Nov 2024 22:19:26 GMT
Content-Length: 413

{
	"data": [
		{
			"name": "model-registry",
			"displayName": "Model Registry",
			"description": "Model Registry Description"
		},
		{
			"name": "model-registry-bella",
			"displayName": "Model Registry Bella",
			"description": "Model Registry Bella description"
		},
		{
			"name": "model-registry-dora",
			"displayName": "Model Registry Dora",
			"description": "Model Registry Dora description"
		}
	]
}
curl -i -H "kubeflow-userid: shouldnotwork" localhost:4000/api/v1/model_registry                                           (kind-kubeflow-ui/default)
HTTP/1.1 403 Forbidden
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Wed, 27 Nov 2024 22:19:45 GMT
Content-Length: 65

{
	"error": {
		"code": "403",
		"message": "access denied"
	}
}

Using the Header Editor Chrome Extension:

With the wrong username: [screenshot]

With the right username: [screenshot]

Merge criteria:

  • All the commits have been signed off (to pass the DCO check).
  • The commits have meaningful messages; the author will squash them after approval or, in the case of manual merges, will ask to merge with squash.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.
  • Code changes follow the kubeflow contribution guidelines.

If you have UI changes

  • The developer has added tests or explained why testing cannot be added.
  • Include any necessary screenshots or GIFs if it was a UI change.
  • Verify that UI/UX changes conform to the UX guidelines for Kubeflow.

@ederign (Member, Author) commented Nov 27, 2024

@Griffin-Sullivan, @alexcreasy, or @lucferbux: one thing that I didn't do is test with the real model registry (not the mocked one). Can you please do this when you try this PR?

I've tested with Kubernetes (not our envtest mock) but not inside the cluster (waiting for @Griffin-Sullivan's PR).

curl -i -H "kubeflow-userid: [email protected]" localhost:4000/api/v1/model_registry                                        (kind-kubeflow-ui/default)
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Wed, 27 Nov 2024 22:23:13 GMT
Content-Length: 154

{
	"data": [
		{
			"name": "model-registry-service",
			"displayName": "Kubeflow Model Registry",
			"description": "An example model registry"
		}
	]
}

Signed-off-by: Eder Ignatowicz <[email protected]>
@tarilabs (Member) commented Nov 29, 2024

> One thing that we need to double-check with other people from the community (@tarilabs, please chime in)

This is governed by the @kubeflow/wg-manifests-leads KF/Platform team, and ought to be cross-checked with them, imho.

edit: Slack channel if preferred: https://cloud-native.slack.com/archives/C073W572LA2

> my understanding is that model registry access is not restricted by namespace

Indeed, because Model Registry is meant as a platform-level service.

@kimwnasptd (Member):

Happy to help. Maybe a thread in Slack could help sort some things out faster, but I could also do a review here, if it helps.

@juliusvonkohout (Member):

Regarding "model registry access is not restricted by namespace": maybe I misunderstood something, but we definitely need a way to limit access to models. Namespaces can belong to different departments that are not allowed to access other departments' models. For me that is a core criterion for graduation.

@tarilabs (Member):

> Namespaces can belong to different departments that are not allowed to access other departments' models.

It is my understanding that if you deploy into a (user) namespace, it is considered a tenant; hence you'd have access to the registry and metadata of that instance only if you indeed have access to that ns.

It is also my understanding that if you consider Model Registry a platform service, you'd make said instance available to the whole Kubeflow scope.

@juliusvonkohout (Member) commented Dec 1, 2024

> Namespaces can belong to different departments that are not allowed to access other departments' models.

> It is my understanding that if you deploy into a (user) namespace, it is considered a tenant; hence you'd have access to the registry and metadata of that instance only if you indeed have access to that ns.

> It is also my understanding that if you consider Model Registry a platform service, you'd make said instance available to the whole Kubeflow scope.

Yes, available to all kubeflow namespaces, but with different permissions per namespace.
One shared instance with ACLs.

@tarilabs (Member) commented Dec 1, 2024

I'm not sure I fully follow you 100%, @juliusvonkohout; if so, pardon me.

The main example we have, also in the manifests, is a common Model Registry for the whole KF platform scope. This is helpful as MR is touted as a single pane of glass and a shared platform service. I think this was accepted a while back, as a platform service. I think this is a sensible default example given the scope of MR.

Of course, downstream admins/distros may opt for a different setup.
For one alternative example (not in KF/manifests), an admin may decide to use a KF ("user") namespace per Org/Team/Unit. In that case, a deployment to said ns would make MR available only to those with the appropriate RBAC and supporting config for that ns.
For another alternative example, an admin may decide that a given persona has rights to access KF namespace1 and namespace2. In that case, analogous considerations apply.

In summary, I don't see a reason to apply a different tenancy design than what is architected in the linked doc already. Hope this clarifies?

@tarilabs (Member) commented Dec 1, 2024

Also, to substantiate (in the hope it helps) with a picture what I've summarized in #599 (comment), here is a proposed example diagram:

[example diagram image]

which is my understanding of how the tenancy design from the docs is respected with respect to Model Registry backend deployment(s).

The "default" can be understood as the deployment instance we have in the examples already, for the reasons in #599 (comment) above.

Nothing stops admins/distros from having additional instances, for instance serving a different scope of their orgs at the "platform level" (here pictured as "another", not in the manifest examples).

As mentioned, we comply with the tenancy design from the docs: admins/distros can opt to have one or more instances in their designated KF ("user") namespaces. In this example diagram, since both personas are represented as having access to both t_A and t_B, they can accordingly access the MR REST service, i.e. both can also access registries X and Y.
As another example, admins/distros can opt to have one or more instances in yet another designated KF ("user") namespace, and since in this example only persona_B is represented as having access to t_B, they are the only one with access to those metadata, i.e. only they can also access registry Z.

This is with respect to the Model Registry backend. As mentioned, I encourage the discussion, as the deployment model of the UI may differ.

@juliusvonkohout (Member) commented Dec 2, 2024

@tarilabs, so as far as I understand, you plan to have no user isolation at all and no multi-tenancy. Everything is just shared and accessible/abusable from each namespace/department/user.

That is strictly against what we have so far in Kubeflow, where we have shared instances with proper multi-tenancy support (KFP, Notebooks, KServe, Volumes, Tensorboards, Katib, ...). This way we can have scalable zero-overhead namespaces.

It also sounds a bit contradictory to what is stated in the first post of this PR.

@tarilabs (Member) commented Dec 2, 2024

> @tarilabs, so as far as I understand, you plan to have no user isolation at all and no multi-tenancy. Everything is just shared and accessible/abusable from each namespace/department/user.

I'm not sure where you got this understanding?
Per my understanding, if I take the tenant namespace deployments from the example in #599 (comment), those will be isolated to the designated ns.

@juliusvonkohout (Member):

> Per my understanding, if I take the tenant namespace deployments from the example in #599 (comment), those will be isolated to the designated ns.

Per-namespace overhead is exactly what we want to prevent; that is not a shared instance.

@tarilabs (Member) commented Dec 2, 2024

> Per-namespace overhead is exactly what we want to prevent; that is not a shared instance.

I'm not following. When I have a KServe workload (an Isvc, which in the case of Raw translates directly to a Deployment), that is deployed to the designated ns, unless I recall this incorrectly (but I'm taking just 1 example) 🤔

@juliusvonkohout (Member):

> Per-namespace overhead is exactly what we want to prevent; that is not a shared instance.

> I'm not following. When I have a KServe workload (an Isvc, which in the case of Raw translates directly to a Deployment), that is deployed to the designated ns, unless I recall this incorrectly (but I'm taking just 1 example) 🤔

The InferenceService itself, yes, and for KFP the pipelines, but not the KServe or pipelines controllers/shared deployments; only the actual workload. In the end you will probably have a menu entry in the Central Dashboard sidebar and, based on the namespace you are in, you see different things. You should not be able to see things from other people's namespaces if you are not a member. One shared global "default" space, as for example shared Kubeflow Pipelines, is also fine. So based on what I have read now, we just had a misunderstanding about the shared-instance part and the per-namespace parts.

@tarilabs (Member) commented Dec 2, 2024

> So based on what I have read now, we just had a misunderstanding about the shared-instance part and the per-namespace parts.

That is my impression too; I was a bit caught by surprise based on a previous comment 😅. We've complied with the direction of the general design of KF, but we've also taken feedback based on the specific projects used/reused (mlmd, but also the MySQL topic per the other thread you noticed, but I'm digressing...).

There are definitely (as always) details to improve too, which can be discussed/planned accordingly, imho.

@ederign marked this pull request as draft December 2, 2024 17:19
@ederign (Member, Author) commented Dec 2, 2024

Created this slack thread to chat about it: https://cloud-native.slack.com/archives/C073W572LA2/p1733158941696939

@lucferbux (Contributor):

Hi there, I'm leaving the outcome of the conversation here too. We'll follow the logic we currently have in midstream, tweaking it a little bit for Kubeflow and the ability to have multi-tenancy. This is a temporary solution and we can do some follow-ups, but so far, for the 1.10 release, the plan would be (see the sketch after this list):

  1. We'll do a SelfSubjectAccessReview to see if the user has cluster-admin access to list all services.
  2. If so, we'll list all services with a label selector to filter the ones for Model Registries, and we'll disable the namespace selector since we'll be getting everything.
  3. If not:
    • We'll enable the namespace selector in Kubeflow.
    • We'll parametrize the GET call to support adding the namespace.
    • We'll create a SelfSubjectRulesReview to list the service names a user has access to in that namespace.
    • Then fetch all those services and filter by label.
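
For illustration only, here is a rough sketch of that planned flow with client-go. It assumes the BFF builds a clientset that impersonates the kubeflow-userid (so the Self* reviews are evaluated as that user) and that model registry Services carry a component=model-registry label; both are assumptions, not code or labels from this PR:

```go
package registries

import (
	"context"
	"slices"

	authv1 "k8s.io/api/authorization/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listAccessibleRegistryServices sketches the planned flow. userClient is
// assumed to impersonate the kubeflow-userid (rest.Config.Impersonate), so
// the Self* reviews run as that user.
func listAccessibleRegistryServices(ctx context.Context, userClient kubernetes.Interface, namespace string) ([]string, error) {
	// 1. Can the user list Services cluster-wide (e.g. cluster-admin)?
	ssar := &authv1.SelfSubjectAccessReview{
		Spec: authv1.SelfSubjectAccessReviewSpec{
			ResourceAttributes: &authv1.ResourceAttributes{Verb: "list", Resource: "services"},
		},
	}
	resp, err := userClient.AuthorizationV1().SelfSubjectAccessReviews().Create(ctx, ssar, metav1.CreateOptions{})
	if err != nil {
		return nil, err
	}

	if resp.Status.Allowed {
		// 2. List all Services and keep only model registries via a label
		// selector; the label key/value here is an assumption.
		svcs, err := userClient.CoreV1().Services(corev1.NamespaceAll).List(ctx,
			metav1.ListOptions{LabelSelector: "component=model-registry"})
		if err != nil {
			return nil, err
		}
		names := make([]string, 0, len(svcs.Items))
		for _, s := range svcs.Items {
			names = append(names, s.Name)
		}
		return names, nil
	}

	// 3. Otherwise, ask which Service names the user may "get" in the chosen
	// namespace; each would then be fetched and filtered by label.
	ssrr := &authv1.SelfSubjectRulesReview{
		Spec: authv1.SelfSubjectRulesReviewSpec{Namespace: namespace},
	}
	rules, err := userClient.AuthorizationV1().SelfSubjectRulesReviews().Create(ctx, ssrr, metav1.CreateOptions{})
	if err != nil {
		return nil, err
	}
	var names []string
	for _, rule := range rules.Status.ResourceRules {
		// A real implementation would also handle wildcard verbs/resources ("*").
		if slices.Contains(rule.Resources, "services") && slices.Contains(rule.Verbs, "get") {
			names = append(names, rule.ResourceNames...)
		}
	}
	return names, nil
}
```

The label selector, the impersonation setup, and the wildcard handling would need to match whatever the follow-up PR actually implements.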

@ederign (Member, Author) commented Dec 3, 2024

@lucferbux thanks! I'll implement it this way.

@lucferbux (Contributor) left a review comment:

Some nits; we can wait till we have the other enhancements, or just merge this and do a follow-up.

```
# GET /v1/healthcheck
curl -i localhost:4000/api/v1/healthcheck
curl -i -H "kubeflow-userid: [email protected]" localhost:4000/api/v1/healthcheck
```

We plan to get the Swagger spec automatically generated, but can you tweak the Swagger spec to add that info (meaning that we need that header)? Try securitySchemes.

```
@@ -32,6 +32,8 @@ npm run build

This is the default context for running a local UI. Make sure you build the project using the instructions above prior to running the command below.

You will need to inject your requests with a kubeflow-userid header for authorization purposes. For example, you can use the [Header Editor](https://chromewebstore.google.com/detail/eningockdidmgiojffjmkdblpjocbhgh) extension in Chrome to set the kubeflow-userid header to [email protected].
```

For the standalone app there's an enhancement I'm thinking of: mocking a login so it can be injected in the header. I can refine that later, but this is just fine.

```
@@ -0,0 +1,21 @@
apiVersion: rbac.authorization.k8s.io/v1
```

A couple of things for this:

  1. admin-rbac.yaml should be a base file now that RBAC by serviceaccount is mandatory.
  2. So user-rbac should only have kubeflow-dashboard-rbac.yaml (I would rename it to something like default-user-role.yaml or something like that).
  3. If you want to be pro-kustomize, just change kustomization.yaml, adding - ../base as a resource, so this is a proper kustomize overlay.
  4. If so, change the deploy script to only deploy the overlay and everything will be deployed.

```
- admin-rbac.yaml
```

Just what I commented in the other review.


```
# Step 5: Port-forward the service
echo "Port-fowarding Model Registry UI..."
echo "Port-forwarding Model Registry UI..."
```

Wow I thought I fixed it, thanks.

@ederign marked this pull request as ready for review December 4, 2024 16:10
@lucferbux (Contributor):
/approve

@lucferbux (Contributor):
/lgtm


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lucferbux

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@lucferbux (Contributor):

We can fix those nits in a follow-up PR.

@google-oss-prow bot merged commit ea1afd2 into kubeflow:main Dec 4, 2024
16 checks passed
@ederign mentioned this pull request Dec 9, 2024