Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Show KServe defaultDeploymentMode in DSC status #1465

Merged

Conversation

grdryn
Copy link
Member

@grdryn grdryn commented Dec 29, 2024

Description

JIRA: https://issues.redhat.com/browse/RHOAIENG-16240

The dashboard would like to be able to read this value, and should
only read it from the status.

The value for the status is always read from the KServe ConfigMap,
rather than ever assuming that what is set in the DSC, or what the
operator defaults to, is correct: this is because that ConfigMap can
be annotated to be unmanaged by the operator, in which case a user may
have changed the value to something else, and the status should
reflect this reality.

How Has This Been Tested?

  • Manually during development
    • including setting the ConfigMap to not be managed, and changing the value from what the operator would have set, and seeing that reflected in the Kserve and DSC CRs
  • e2e test has been updated to check for the field

Merge criteria

  • You have read the contributors guide.
  • Commit messages are meaningful - have a clear and concise summary and detailed explanation of what was changed and why.
  • Pull Request contains a description of the solution, a link to the JIRA issue, and to any dependent or related Pull Request.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

@grdryn
Copy link
Member Author

grdryn commented Dec 30, 2024

/test images

@grdryn grdryn force-pushed the RHOAIENG-16240_status branch from 4679db4 to 3635ba3 Compare December 30, 2024 00:54
@grdryn
Copy link
Member Author

grdryn commented Dec 30, 2024

/retest

2 similar comments
@grdryn
Copy link
Member Author

grdryn commented Dec 31, 2024

/retest

@grdryn
Copy link
Member Author

grdryn commented Jan 1, 2025

/retest

@biswassri
Copy link
Contributor

biswassri commented Jan 2, 2025

@grdryn Is this PR a pre-requisite or a follow-up PR to #1455? Both are marked with the same JIRA. I just read from the linked JIRA comments that there could be an overlap.

@grdryn
Copy link
Member Author

grdryn commented Jan 2, 2025

@biswassri I think #1455 is a misunderstanding of what's needed. I've asked about it in the jira issue.

@grdryn
Copy link
Member Author

grdryn commented Jan 2, 2025

/retest

@grdryn grdryn force-pushed the RHOAIENG-16240_status branch from 3635ba3 to 64128f9 Compare January 2, 2025 20:07
@grdryn
Copy link
Member Author

grdryn commented Jan 2, 2025

/retest

ray and kueue e2e test flakes

@grdryn
Copy link
Member Author

grdryn commented Jan 2, 2025

/retest

ds pipelines test flake 🤔

@@ -157,6 +157,7 @@ func (s *componentHandler) NewComponentReconciler(ctx context.Context, mgr ctrl.
WithAction(deploy.NewAction(
deploy.WithCache(),
)).
WithAction(setStatusFields).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure 🤔 it looks like that will get called only during the reconciliation in the DSC controller, right? I think we should also have this set in the Kserve CR status for consistency, and to do that I think it needs to be updated here. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmmmmm, why we do not add the "set status" in customizeKserveConfigMap() directly?
e.g afterinferenceServiceConfigMap.Data[DeployConfigName] = string(deployDataBytes)
then we can have a default value "Serverless" for defaultDeploymentMode , only if deployData.DefaultDeploymentMode != string(defaultmode) { it sets a different value.

--
tbh, i am not fond of setting component's status in DSC.
if possible, it should only be in the kserve CR 's .status. or we are duplicating all things in both places.
e.g modelreg namespace is only in the DSC .spec and ModelReg CR .spec , but not DSC .status.
it would be good for dashboard to check component CR to get each's .status.
but this is out of the scope for this PR, if the requirment is to get it done in DSC .status.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we do not add the "set status" in customizeKserveConfigMap() directly?

The main reason that I chose to keep this separate, and after the deploy action, is that we want to read the ConfigMap from the cluster rather than relying on it from the resources list, because the user may decide to manage it themselves, and change the value from what the operator would have set it to (the value that would be in the resources list). Doing it after the deploy action should mean that the ConfigMap at least always exists on the cluster at this point. If there wasn't that edge case of unmanaged configmap, then it would make sense to do it in customizeKserveConfigMap().

it would be good for dashboard to check component CR to get each's .status.

Yeah maybe, I'm not sure what's best there. 🤔 @lburgazzoli do you have any thoughts on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me get back from PTO :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lburgazzoli ah apologies 🙈 in any case I think this is not a blocking conversation 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't worry 😉

controllers/components/kserve/kserve_controller_actions.go Outdated Show resolved Hide resolved
@grdryn grdryn force-pushed the RHOAIENG-16240_status branch from 64128f9 to e0bf6e4 Compare January 3, 2025 17:07
JIRA: https://issues.redhat.com/browse/RHOAIENG-16240

The dashboard would like to be able to read this value, and should
only read it from the status.

The value for the status is always read from the KServe ConfigMap,
rather than ever assuming that what is set in the DSC, or what the
operator defaults to, is correct: this is because that ConfigMap can
be annotated to be unmanaged by the operator, in which case a user may
have changed the value to something else, and the status should
reflect this reality.
@grdryn grdryn force-pushed the RHOAIENG-16240_status branch from e0bf6e4 to e051140 Compare January 3, 2025 17:13
Copy link

openshift-ci bot commented Jan 3, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: asanzgom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Jan 3, 2025
@grdryn
Copy link
Member Author

grdryn commented Jan 3, 2025

/retest

kueue test flake (I guess these flakes are timeouts)

@grdryn
Copy link
Member Author

grdryn commented Jan 3, 2025

/retest

training operator this time. 🔍 we should try to figure out why all of these tests are failing intermittently now.

@grdryn
Copy link
Member Author

grdryn commented Jan 3, 2025

/retest

Ray this time. It often seems to be at the "Validate_Disabling_<Name>_Component" check.

 === RUN   TestOdhOperator/validate_installation_of_ray_component/e2e-test-dsc/Validate_Disabling_Ray_Component
    ray_test.go:75: 
        	Error Trace:	/go/src/github.com/opendatahub-io/opendatahub-operator/tests/e2e/ray_test.go:75
        	Error:      	Received unexpected error:
        	            	component kuberay-operator deployment rayis not ready
        	Test:       	TestOdhOperator/validate_installation_of_ray_component/e2e-test-dsc/Validate_Disabling_Ray_Component
        	Messages:   	error testing ray component enabled field 

@grdryn
Copy link
Member Author

grdryn commented Jan 4, 2025

/retest

data science pipelines this time:

 === RUN   TestOdhOperator/validate_installation_of_datasciencepipelines_component/e2e-test-dsc/Validate_Disabling_DataSciencePipelines_Component
    datasciencepipelines_test.go:81: 
        	Error Trace:	/go/src/github.com/opendatahub-io/opendatahub-operator/tests/e2e/datasciencepipelines_test.go:81
        	Error:      	Received unexpected error:
        	            	component DataSciencePipelines is disabled, should not have deployments in NS opendatahub any more
        	Test:       	TestOdhOperator/validate_installation_of_datasciencepipelines_component/e2e-test-dsc/Validate_Disabling_DataSciencePipelines_Component
        	Messages:   	error testing DataSciencePipelines component enabled field

@grdryn grdryn mentioned this pull request Jan 4, 2025
5 tasks
@lburgazzoli
Copy link
Contributor

@grdryn I do have some idea about the reason of sic failures, but I need to validate them. I'll do so on Tuesday when back

@grdryn
Copy link
Member Author

grdryn commented Jan 6, 2025

/retest

data science pipelines again:

 === RUN   TestOdhOperator/validate_installation_of_datasciencepipelines_component/e2e-test-dsc/Validate_Disabling_DataSciencePipelines_Component
    datasciencepipelines_test.go:81: 
        	Error Trace:	/go/src/github.com/opendatahub-io/opendatahub-operator/tests/e2e/datasciencepipelines_test.go:81
        	Error:      	Received unexpected error:
        	            	component DataSciencePipelines is disabled, should not have deployments in NS opendatahub any more
        	Test:       	TestOdhOperator/validate_installation_of_datasciencepipelines_component/e2e-test-dsc/Validate_Disabling_DataSciencePipelines_Component
        	Messages:   	error testing DataSciencePipelines component enabled field

@openshift-merge-bot openshift-merge-bot bot merged commit 326e4fc into opendatahub-io:main Jan 6, 2025
8 checks passed
@grdryn grdryn deleted the RHOAIENG-16240_status branch January 6, 2025 12:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants