Fix: Ensure odh-model-controller Deployment Waits for ConfigMap #361
Signed-off-by: mtrujillo <[email protected]>
Hi @trujillm. Thanks for your PR. I'm waiting for an opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@trujillm I know this is a draft, but could you please add unit tests?
@@ -105,7 +105,6 @@ spec:
   configMapKeyRef:
     name: odh-model-controller-parameters
     key: nim-state
-    optional: true
Shouldn't we keep it as optional?
I mean, os.Getenv is fine if the env var is not set (with the suggested change); however, if for some reason it is not set during startup, it would prevent odh from starting.
@spolti I believe the intent is to block the Deployment until the ConfigMap is ready, which I believe is the expected behavior when the optional value is left at its default of false.
> shouldn't we keep it as optional?
> I mean, the os.GetEnv is ok if the env is not set( with the suggested change)
We have two different default behaviours: for IBM the default is "removed", and for everyone else it's "managed". So we need the ConfigMap to be configured properly by the operator-controller, and we cannot decide on a default value without the ConfigMap.
> however, if for some reason it is not set in during the startup, it would prevent odh to start.
Is this feasible? Both the Deployment and the ConfigMap are created by the same kustomization file. The ConfigMap is used, and probably required, by other parts too.
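A minimal Go sketch of the "no default without the ConfigMap" position, assuming the controller reads the value from an environment variable named NIM_STATE (the name used in the PR description) and that "managed" and "removed" are the only accepted values. The function name requireNimState and the injected lookup parameter are hypothetical and are not taken from the odh-model-controller code base.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// requireNimState returns the NIM state without falling back to a default.
// lookup has the same shape as os.LookupEnv so it can be replaced in tests.
func requireNimState(lookup func(string) (string, bool)) (string, error) {
	value, ok := lookup("NIM_STATE") // assumed variable name
	if !ok {
		return "", errors.New("NIM_STATE is not set; the ConfigMap must provide it")
	}
	switch value {
	case "managed", "removed": // assumed set of valid values
		return value, nil
	default:
		return "", fmt.Errorf("unexpected NIM_STATE value %q", value)
	}
}

func main() {
	state, err := requireNimState(os.LookupEnv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("NIM state:", state)
}
```

With a non-optional configMapKeyRef, the kubelet should not start the container until the referenced key exists, so the error path in this sketch would only act as a defensive check.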
> for IBM the default is "removed",
How is this achieved?
@TomerFi I understand why you chose "removed" for the condition. I am OK with it.
However, I have a question about the IBM case: how can you set the default value to "removed"?
The ConfigMap manifests will reside in the RHOAI operator, so they cannot be edited after release.
I imagine the pipelines running for IBM set DataScienceCluster.spec.kserve.nim.managedState to "removed"; this gets translated into the params.env before executing the kustomization file that creates both the ConfigMap and the Deployment.
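The translation step described in this comment could look roughly like the Go sketch below. It is only an illustration of the idea: the function name paramsEnvLine is hypothetical, and it is an assumption that params.env uses the same key, nim-state, as the ConfigMap key shown in the diff and that only "managed" and "removed" are valid.

```go
package main

import (
	"fmt"
	"strings"
)

// paramsEnvLine renders one KEY=VALUE line for a params.env file.
// The key "nim-state" mirrors the ConfigMap key in the diff; whether the
// operator writes exactly this line is an assumption.
func paramsEnvLine(managedState string) (string, error) {
	state := strings.ToLower(strings.TrimSpace(managedState))
	if state != "managed" && state != "removed" {
		return "", fmt.Errorf("unsupported managedState %q", managedState)
	}
	return "nim-state=" + state, nil
}

func main() {
	line, err := paramsEnvLine("Removed")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(line) // prints "nim-state=removed"
}
```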
If the Opendatahub operator has the logic, then it makes sense.
Reading back my comment, I think I was unclear. When I wrote "We have two different default behaviours", of course, we don't actually have two defaults; sorry about that. My point was that we can't decide what default value to use, hence we must rely on the ConfigMap, which always has the correct value for us.
Signed-off-by: mtrujillo <[email protected]>
Signed-off-by: mtrujillo <[email protected]>
/retest
@trujillm: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Not related to this PR, but I have a question about the IBM case.
/rerun-all
/ok-to-test
/retest
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: Jooho, trujillm
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Merged commit 7806732 into opendatahub-io:incubating
…datahub-io#361)
* initial commit to fix NIM and deployment issues
Signed-off-by: mtrujillo <[email protected]>
* update to set default on nimState
Signed-off-by: mtrujillo <[email protected]>
* updated optinal to false
Signed-off-by: mtrujillo <[email protected]>
---------
Signed-off-by: mtrujillo <[email protected]>
(cherry picked from commit 7806732)

…datahub-io#361)
* initial commit to fix NIM and deployment issues
Signed-off-by: mtrujillo <[email protected]>
* update to set default on nimState
Signed-off-by: mtrujillo <[email protected]>
* updated optinal to false
Signed-off-by: mtrujillo <[email protected]>
---------
Signed-off-by: mtrujillo <[email protected]>

…datahub-io#361)
Signed-off-by: mtrujillo <[email protected]>

…datahub-io#361)
* initial commit to fix NIM and deployment issues
Signed-off-by: Marcus Trujillo <[email protected]>
* update to set default on nimState
Signed-off-by: Marcus Trujillo <[email protected]>
* updated optinal to false
Signed-off-by: Marcus Trujillo <[email protected]>
---------
Signed-off-by: Marcus Trujillo <[email protected]>
(cherry picked from commit 7806732)

* fix: added accept header to nim manifest pull requests accepting oci (#362)
  * fix: added accept header to nim manifest pull requests accepting oci
  Signed-off-by: Tomer Figenblat <[email protected]>
  * chore: removed left-over debugging message
  Signed-off-by: Tomer Figenblat <[email protected]>
  ---------
  Signed-off-by: Tomer Figenblat <[email protected]>
  (cherry picked from commit f7fdf0a)
* Fix: Ensure odh-model-controller Deployment Waits for ConfigMap (#361)
  * initial commit to fix NIM and deployment issues
  Signed-off-by: Marcus Trujillo <[email protected]>
  * update to set default on nimState
  Signed-off-by: Marcus Trujillo <[email protected]>
  * updated optinal to false
  Signed-off-by: Marcus Trujillo <[email protected]>
  ---------
  Signed-off-by: Marcus Trujillo <[email protected]>
  (cherry picked from commit 7806732)
* fix: nim containers terminated prematurely (#363)
  Signed-off-by: Tomer Figenblat <[email protected]>
  Co-authored-by: Mikhail Mikhailitchenko <[email protected]>
  (cherry picked from commit c92b19f)
* chore: added gocyclo no linting comment on the main function
  Signed-off-by: Tomer Figenblat <[email protected]>
---------
Signed-off-by: Tomer Figenblat <[email protected]>
Co-authored-by: Marcus Trujillo <[email protected]>
Fix: Ensure odh-model-controller Deployment Waits for ConfigMap & Adjust NIM_STATE Handling

Issue
odh-model-controller creates both the Deployment and the ConfigMap.
The Deployment sets optional: true for the ConfigMap, meaning it can start before the ConfigMap is available.
This can result in a missing NIM_STATE environment variable, causing the backend to ignore Accounts.

Fix 1: Ensure ConfigMap Availability
Set optional: false for the ConfigMap in the Deployment.

Fix 2: Adjust NIM_STATE Handling
NIM_STATE.
If NIM_STATE is not specified, it defaults to enabled instead of off.

Impact
Ensures consistent deployment behavior.
Prevents the backend from ignoring Accounts due to missing NIM_STATE.
Enables NIM by default when not explicitly set.

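The defaulting described under Fix 2 can be sketched in Go as follows. This is a hedged illustration, not the code from the PR: the function name nimEnabled is hypothetical, and it is an assumption that "removed" is the only value that disables NIM.

```go
package main

import (
	"fmt"
	"os"
)

// nimEnabled reports whether NIM should be treated as enabled.
// set is false when the variable is absent; in that case the sketch
// follows the PR description and defaults to enabled.
func nimEnabled(value string, set bool) bool {
	if !set {
		return true
	}
	return value != "removed" // assumed: only "removed" disables NIM
}

func main() {
	value, set := os.LookupEnv("NIM_STATE")
	fmt.Println("NIM enabled:", nimEnabled(value, set))
}
```

Using os.LookupEnv rather than os.Getenv lets the sketch distinguish an unset variable from one that is set to an empty string.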
The commits are squashed in a cohesive manner and have meaningful messages.
Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
The developer has manually tested the changes and verified that the changes work