feat(mvp): enabling service mesh for ODH #818
Conversation
I'm not doing a full review, I just checked the description of the PR and I would like to better understand this point. Can you please clarify the implication of this? For example, is it possible to create two different KfDefs?
@danielezonca - The implication of this point was to call out the limitation that, when using this PR/manifests, the KfDef can only be created in the `opendatahub` namespace. As for whether two different KfDefs can be created:
I believe it is possible (could be wrong on this; experts, please correct me if so), although it is worth noting that currently the service mesh integration does not support multiple KfDefs in one cluster.
@danielezonca is it a requirement or a supported scenario? I am wondering if you can have multiple controllers supporting this, as you will end up with e.g. the notebook controller deployed for both namespaces but watching for the same resources cluster-wide.
Yes, I think this is usually up to the user, so we will probably need to find some alternative solution.
I think it is supported and expected to work, but I will let others confirm because I'm not 100% sure :)
I was following the instructions in enabling_ossm.md on a fresh cluster. It fails with the following status in the KfDef:
Status:
  Conditions:
    Last Update Time:  2023-06-07T23:29:05Z
    Reason:            (kubeflow.error): Code 500 with message: kfApp Apply failed for kustomize: (kubeflow.error): Code 500 with message: error evaluating kustomization manifest for control-plane: error mapping rest config no matches for kind "ServiceMeshControlPlane" in version "maistra.io/v2"
    Status:            True
    Type:              Degraded
    Last Update Time:  2023-06-07T23:29:05Z
    Reason:            Kubeflow Deployment completed
    Status:            True
    Type:              Available
EDIT: For whatever reason, I was running the createSubscription commands in a different order. I first ran operator-sdk run bundle quay.io/cgarriso/opendatahub-operator-bundle:dev-0.0.2 --namespace openshift-operators --timeout 5m0s, and then I ran createSubscription "servicemeshoperator". I ended up with an error message in the OpenShift console. So, for now, invoking operator-sdk run should be the last step, to allow proper installation of the other things.
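For reference, a rough sketch of the ordering that worked (assuming the createSubscription helper from enabling_ossm.md is already defined in the current shell):

# install the Service Mesh operator (and any other subscriptions) first
createSubscription "servicemeshoperator"

# run the ODH operator bundle last, once the other operators are in place
operator-sdk run bundle quay.io/cgarriso/opendatahub-operator-bundle:dev-0.0.2 \
  --namespace openshift-operators --timeout 5m0s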
EDIT: After everything is installed and up, when trying to access the dashboard I just see a blank page. Authorino logs have:
{"level":"info","ts":1686182958.5667002,"logger":"authorino.service.auth","msg":"incoming authorization request","request id":"5be7790e-3986-9ff7-97d2-4ebbb4f2149b","object":{"source":{"address":{"Address":{"SocketAddress":{"address":"10.128.10.19","PortSpecifier":{"PortValue":34824}}}}},"destination":{"address":{"Address":{"SocketAddress":{"address":"10.128.8.105","PortSpecifier":{"PortValue":8443}}}},"principal":"apps-crc.testing"},"request":{"http":{"id":"5be7790e-3986-9ff7-97d2-4ebbb4f2149b","method":"GET","path":"/","host":"opendatahub.apps.ehernand-tsdemo.osapps.com","scheme":"https"}}}}
{"level":"info","ts":1686182958.5893471,"logger":"authorino.service.auth","msg":"outgoing authorization response","request id":"5be7790e-3986-9ff7-97d2-4ebbb4f2149b","authorized":false,"response":"PERMISSION_DENIED","object":{"code":7,"message":"Unauthorized"}}
Maybe the principal not matching the host in the request is an issue?
Thanks for the review @israel-hdez. As for the Authorino logs, I will have a look; I haven't seen this one before. Perhaps you could share a bit more detail about your exact environment? That might help me reproduce it.
It was a fresh OSD cluster. I was following the instructions and had no luck.
Update

With 8ca7a6d we changed the approach of overwriting upstream settings.

How to use it

Sample `KfDef` manifest:

apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  name: odh-mesh
spec:
  applications:
    - kustomizeConfig:
        parameters:
          - name: namespace
            value: istio-system
        repoRef:
          name: manifests
          path: service-mesh/control-plane
      name: control-plane
    - kustomizeConfig:
        overlays:
          - service-mesh
        repoRef:
          name: manifests
          path: odh-common
      name: odh-common
    - kustomizeConfig:
        overlays:
          - service-mesh
          - dev
        repoRef:
          name: manifests
          path: odh-dashboard
      name: odh-dashboard
    - kustomizeConfig:
        overlays:
          - service-mesh
        repoRef:
          name: manifests
          path: odh-notebook-controller
      name: odh-notebook-controller
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: odh-project-controller
      name: odh-project-controller
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: notebook-images
      name: notebook-images
    - kustomizeConfig:
        parameters:
          - name: namespace
            value: auth-provider
        repoRef:
          name: manifests
          path: service-mesh/authorino
      name: authorino
  repos:
    - name: manifests
      uri: https://github.com/maistra/odh-manifests/tarball/service-mesh-integration
  version: service-mesh-integration

Important

Loose ends (pretty much all that is mentioned in the PR description), but worth noting are:
About manifests structure

One important thing for all the manifest files which are going to be used in this setup: please take note of the comments.
@harshad16 PTAL #818 (comment)
feat(mvp): enabling service mesh for ODH
Co-authored-by: Bartosz Majsak <[email protected]>
Co-authored-by: Cameron Garrison <[email protected]>
Co-authored-by: Wambugu “Innocent” Kironji <[email protected]>
re-add mistakenly removed cert secret
move secret to correct kustomization section
fix jq download link
Co-authored-by: Cameron Garrison <[email protected]>
Change from creating hmac and token secrets for oauth2 secrets
e3c1680 to 5a813fd
They are optionally fetched from a config map; otherwise the controller uses defaults.
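For illustration only, such a config map could look roughly like this; the ConfigMap name and keys below are hypothetical placeholders, not necessarily the names the controller actually reads:

apiVersion: v1
kind: ConfigMap
metadata:
  name: service-mesh-config        # hypothetical name
  namespace: opendatahub
data:
  MESH_NAMESPACE: istio-system     # hypothetical key: namespace of the control plane
  CONTROL_PLANE_NAME: basic        # hypothetical key: name of the SMCP to use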
@LaVLaS Someone in the ODH Slack once told me this is not supported. I don't know why exactly, but I suppose having multiple instances of central components like the notebook controller, the odh-notebook-controller, or the dashboard could create problems. Of course, if that is so, then the question is why the opendatahub-operator in the openshift-operators namespace is not running in "singleNamespace" mode, but in "allNamespaces" mode when it comes to e.g. KfDef evaluation.
Review in progress
A small change suggested for the time being.
With c095a34 we do not have namespaces hardcoded anymore.
In specific cluster setups (e.g. ROSA), the audience for token review is not configured to the usual default.
/cc @harshad16 Related ticket: https://issues.redhat.com/browse/OSSM-4356
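One way to check which audience a given cluster actually issues (a side note, not part of this PR) is to decode the aud claim of a freshly created service account token:

# Sketch: print the audience(s) of a service account token on the current cluster.
# Requires oc 4.11+ (for `oc create token`) and python3; the SA/namespace are just examples.
TOKEN=$(oc create token default -n default)
python3 - "$TOKEN" <<'EOF'
import base64, json, sys
payload = sys.argv[1].split(".")[1]
payload += "=" * (-len(payload) % 4)   # restore stripped base64 padding
print(json.loads(base64.urlsafe_b64decode(payload)).get("aud"))
EOF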
One downside of the current approach is that, instead of applying patches to the SMCP, we are installing a new control plane. This won't work for everyone; e.g. on ROSA you might need to set additional options. We are changing this approach to apply only the patches that are relevant for us, starting from the assumption that you already have an SMCP installed. Related ticket: https://issues.redhat.com/browse/OSSM-4359
In 313cba2 @israel-hdez brings basic support of ModelMesh.
* add ds proj migration to init job
In 312c623 we introduce migrating pre-existing data science projects into the mesh in our init-job.
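For context, enrolling an existing project into the mesh essentially means creating a ServiceMeshMember in that namespace; a minimal sketch, where the project namespace and the control plane name "basic" are placeholders:

apiVersion: maistra.io/v1
kind: ServiceMeshMember
metadata:
  name: default                 # Maistra expects this resource to be named "default"
  namespace: my-ds-project      # placeholder: the pre-existing data science project
spec:
  controlPlaneRef:
    name: basic                 # placeholder: name of the SMCP to join
    namespace: istio-system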
Co-authored-by: Cameron Garrison <[email protected]>
c99a385 changes the approach to patching the SMCP instead of creating one. Therefore you should install an SMCP for your particular cluster setup first (we provided a basic one in the doc) before applying the KfDef.
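As an illustration of the new flow (not the exact patch shipped here), an already-installed SMCP can then be adjusted in place, for example:

# Sketch only: merge-patch an existing SMCP named "basic" in istio-system.
# The patched field is just an example, not the actual change applied by the manifests.
oc patch smcp basic -n istio-system --type=merge \
  -p '{"spec":{"addons":{"grafana":{"enabled":false}}}}'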
f47da8f addresses a concern related to the use of a wildcard "*" in the hosts field of the ODH-dashboard's Gateway and VirtualService resources. By using the wildcard, traffic from unintended sources could potentially be routed to our services. This is now narrowed to the host of the dashboard instead.
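For illustration, the resulting Gateway takes roughly this shape; the host value below is a placeholder for the actual dashboard route host resolved from the cluster:

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: odh-gateway              # placeholder name
  namespace: opendatahub
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 8080
        name: http
        protocol: HTTP
      hosts:
        - opendatahub.apps.example.com   # narrowed from "*" to the dashboard host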
@bartoszmajsak This deployment of Service Mesh with the dashboard worked as intended. I did have an issue with the dashboard being able to spawn a notebook, but I haven't had a chance to check any logs to see where the issue is located. Outside of that, we should be free to merge this into the feature branch and iterate from there for anyone needing to do a dev deployment of ODH with Service Mesh. The current layout of the manifests, initJob, and code changes provides a good template for what we need to port over to the new odh operator codebase.
Thanks for validating it @LaVLaS.
We will have a look at this. Can you provide more details on how exactly you tried to spawn a notebook and what the symptoms were?
@harshad16 I would like to "rebase and merge" this work, but since you said that your review is in progress I wanted to check if there's more feedback you could share.
/lgtm
Thanks for waiting on my review
In the next PR, we can switch to the image based on notebook-controller:
https://github.com/opendatahub-io/kubeflow/releases/tag/v1.6.1-3
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: LaVLaS. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
0069472 into opendatahub-io:service-mesh-integration
* feat(mvp): enabling service mesh for ODH
* chore: re-add mistakenly removed cert secret (#11)
* fix: move secret to correct kustomization section (opendatahub-io#14)
* update OWNERS, add aslak
* fix: update jq download link (#15)
* use service-mesh-integration branch for kfdef
* fix: corrects kfdef file name in the docs
* fix(manifests): switches to use patches and overlays (opendatahub-io#18)
* fix: create hmac and token secrets as secrets rather than cfgmap (opendatahub-io#19)
* chore(init-job): adds annotation to root namespace (#21)
* feat: adds service-mesh env variables
* chore: creates service-mesh specific OWNERS file
* fix: resolves ns in init-job (#22)
* fix: extracts port and raw host (opendatahub-io#23)
* fix(job): defaults OAUTH_PORT to 443 if not defined in issuer
* chore: removes obsolete route (#24)
* feat: basic support of ModelMesh (opendatahub-io#17)
* feat: migrate existing DS project namespaces to enable SM (#25)
* feat: patches SMCP instead of creating one (#26)
* fix: specifies host in ODH dashboard gateway and virtualservice (opendatahub-io#27)

Co-authored-by: Bartosz Majsak <[email protected]>
Co-authored-by: Edgar Hernández <[email protected]>
Description
This PR implements our MVP of enabling service mesh with ODH, using Authorino combined with service mesh to facilitate authorization to ODH components.
To deploy ODH with Service Mesh, follow along with our instructions found in enabling_ossm.md to find a reference KfDef along with other steps.

Notes:
- Current implementation relies on some hardcoded namespaces: using `opendatahub` for the KfDef namespace, and `istio-system` + `auth-provider` for the OSSM SMCP and Authorino resources respectively. Ideally we would not be hardcoding `opendatahub`; we are looking for suggestions on how to avoid this.
- We use an initJob to dynamically create resources based on cluster configuration and to allow for secret propagation between requisite resources. In the future, this will ideally be implemented by a more sophisticated controller/control plane to create and reconcile these resources.
- We do not support a migration path for pre-existing ODH deployments yet.
- We do not use the opendatahub-operator `secret-generator` to create our `OauthClient` resource, as we need the secret to be propagated to another resource. In the future, we may extend the `secret-generator` to perform this functionality rather than handle it in our initJob or in a controller. Is this something that folks well-versed in the operator think could be possible?
- We configure our service mesh deployment to use `autoInject`, injecting to all pods found in the mesh namespaces by default for convenience. This may cause problems if future components not yet considered should not be in the mesh, so this may be something to revisit.
- The `istio-system` and `auth-provider` namespaces generated by the KfDef creation are not deleted by the operator upon KfDef deletion. This could be something to address in the future ODH operator rewrite.

How Has This Been Tested?
Tested by creating our KfDef in CRC clusters.
- Verified that the control plane is created in `istio-system`, that minimal ODH components are created in `opendatahub` and have istio-proxy injected, and that the authorino pod is created in `auth-provider` and is injected with the istio-proxy container.
- Verified that the `ServiceMeshMember` resource and requisite service mesh annotations are added to the namespace.

Merge criteria: