-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement model registry inference service reconciliation #135
Implement model registry inference service reconciliation #135
Conversation
Skipping CI for Draft Pull Request. |
837a40e
to
15935b0
Compare
15935b0
to
5d1c108
Compare
As agreed we decided to implement this workflow where the model-controller has a more "passive" behavior with respect to the model registry reconcilication. As described in the PR description, the new reconciler will monitor the
PS: all commits must be squashed as soon as the PR is approved or directly during the merge |
5d1c108
to
6c60f51
Compare
namespace: default | ||
labels: | ||
"mr-inference-service-id": "4" | ||
finalizers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This finalizer will be added to the isvc created by mr controller? The finalizer of the isvc will not be removed by mr controller, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmm not exactly, the finalizer will be added my the odh-model-controller itself (see https://github.com/opendatahub-io/odh-model-controller/pull/135/files#diff-e628c4483237f3dba685d802623c535845221a01b2bd2e3782a82ab9df48320aR120-R129) and then it will be removed by the model controller as well during the deletion.
The ISVC are going to be created by the dashboard as it is right now
Signed-off-by: Andrea Lamparelli <[email protected]>
Signed-off-by: Andrea Lamparelli <[email protected]>
Signed-off-by: Andrea Lamparelli <[email protected]>
Signed-off-by: Andrea Lamparelli <[email protected]>
Signed-off-by: Andrea Lamparelli <[email protected]>
Signed-off-by: Andrea Lamparelli <[email protected]>
Signed-off-by: Andrea Lamparelli <[email protected]>
Signed-off-by: Andrea Lamparelli <[email protected]>
Signed-off-by: Andrea Lamparelli <[email protected]>
29f6c8c
to
e368740
Compare
In the scope of testing of Model Registry in openshift-ci: - make a Shell script which invokes some REST calls to MR, - so to make sure the REST endpoint is responsive, - then create a K8s ISVC on the cluster, - and display the MR InferenceService entities. Later, in a subsequent issue/PR, once: opendatahub-io/odh-model-controller#135 is merged, the last bulletpoint can be automated and placed under test in the final part of this script so to make sure the K8s ISVC on the cluster reflected as a precise MR InferenceService entity.
Co-authored-by: Edgar Hernández <[email protected]> Signed-off-by: Andrea Lamparelli <[email protected]>
e368740
to
c9cbe07
Compare
Co-authored-by: Edgar Hernández <[email protected]> Signed-off-by: Andrea Lamparelli <[email protected]>
Signed-off-by: Andrea Lamparelli <[email protected]>
Signed-off-by: Andrea Lamparelli <[email protected]>
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lampajr I'm good with the changes.
I haven't tried this live. I'll try it on my Friday.
I think the only pending comment is the one from Vedant about the manifests. Let me know when you are ready. If I don't see any issue in my live testing, I'll merge once you confirm this is ready to go.
In the scope of testing of Model Registry in openshift-ci: - make a Shell script which invokes some REST calls to MR, - so to make sure the REST endpoint is responsive, - then create a K8s ISVC on the cluster, - and display the MR InferenceService entities. Later, in a subsequent issue/PR, once: opendatahub-io/odh-model-controller#135 is merged, the last bulletpoint can be automated and placed under test in the final part of this script so to make sure the K8s ISVC on the cluster reflected as a precise MR InferenceService entity.
Signed-off-by: Andrea Lamparelli <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tried this live. I'll try it on my Friday.
I think the only pending comment is the one from Vedant about the manifests. Let me know when you are ready. If I don't see any issue in my live testing, I'll merge once you confirm this is ready to go.
Thsnk @israel-hdez I hope I added all details with regard to "how to test" this feature in a real cluster.
@VedantMahabaleshwarkar I agree with all your comments and I think I was able to simplify the dev
manifests with my latest commit but unfortunately I wasn't able to fix the crd/external
issue - I mean I couldn't find a way to include all those CRDs just in the overlays/dev
because of
Error: accumulating resources: 2 errors occurred:
* accumulateFile error: "accumulating resources from '../../crd/external/serving.kserve.io_inferenceservices.yaml': security; file '/home/alampare/repos/odh-model-controller/config/crd/external/serving.kserve.io_inferenceservices.yaml' is not in or below '/home/alampare/repos/odh-model-controller/config/overlays/dev'"
* loader.New error: "error loading ../../crd/external/serving.kserve.io_inferenceservices.yaml with git: url lacks host: ../../crd/external/serving.kserve.io_inferenceservices.yaml, dir: got file 'serving.kserve.io_inferenceservices.yaml', but '/home/alampare/repos/odh-model-controller/config/crd/external/serving.kserve.io_inferenceservices.yaml' must be a directory to be a root, get: invalid source string: ../../crd/external/serving.kserve.io_inferenceservices.yaml"
on the other hand I do not see where those crd/external/kustomization
is used, hence I don't see why keeping all those CRDs enable would be an issue. Is it referenced from other repos?
- monitoring.coreos.com_servicemonitors.yaml | ||
- maistra.io_servicemeshmemberrolls.yaml | ||
- maistra.io_servicemeshmembers.yaml | ||
- telemetry.istio.io_telemetries.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would agree with you @VedantMahabaleshwarkar but I don't know how to reference to crd/external
from the overlays/dev
without directly changing crd/external/kustomization.yaml
- I tried to directly add all CRDs into the overlays/dev/kustomization
but I get:
Error: accumulating resources: 2 errors occurred:
* accumulateFile error: "accumulating resources from '../../crd/external/serving.kserve.io_inferenceservices.yaml': security; file '/home/alampare/repos/odh-model-controller/config/crd/external/serving.kserve.io_inferenceservices.yaml' is not in or below '/home/alampare/repos/odh-model-controller/config/overlays/dev'"
* loader.New error: "error loading ../../crd/external/serving.kserve.io_inferenceservices.yaml with git: url lacks host: ../../crd/external/serving.kserve.io_inferenceservices.yaml, dir: got file 'serving.kserve.io_inferenceservices.yaml', but '/home/alampare/repos/odh-model-controller/config/crd/external/serving.kserve.io_inferenceservices.yaml' must be a directory to be a root, get: invalid source string: ../../crd/external/serving.kserve.io_inferenceservices.yaml"
any idea?
* add Shell script openshift-ci make some REST call to MR In the scope of testing of Model Registry in openshift-ci: - make a Shell script which invokes some REST calls to MR, - so to make sure the REST endpoint is responsive, - then create a K8s ISVC on the cluster, - and display the MR InferenceService entities. Later, in a subsequent issue/PR, once: opendatahub-io/odh-model-controller#135 is merged, the last bulletpoint can be automated and placed under test in the final part of this script so to make sure the K8s ISVC on the cluster reflected as a precise MR InferenceService entity. * omit OCP_CLUSTER_NAME to be valorized once on openshift-ci
Hi @israel-hdez @VedantMahabaleshwarkar , thanks a lot for all your suggestions!! I think I was able to address all of them, I also tested again in a real cluster and it worked for me (following the instructions I provided in the PR description). |
/hold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to try this and runs fine.
This is low risk because it is turned off by default. So, I'll merge.
I left one comment for your consideration for later improvement.
} | ||
Expect(grpcPort).ToNot(BeNil()) | ||
|
||
mlmdAddr = fmt.Sprintf("localhost:%d", *grpcPort) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to change the code because of this connection to localhost.
This is somewhat specific to Kind. I was able to workaround by doing a port-forward, but given the tests re-deploy the registry, the port-forward needs to be re-opened before each test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, thanks for pointing this out! I will keep track of this in a separate issue 💪
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: israel-hdez, lampajr The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Fixes https://github.com/opendatahub-io/model-registry/issues/249
Description
Implement the model registry InferenceService reconciliation, the implemented workflow is the opposite of the one proposed in #124. Here the direction goes from
Cluster
toModelRegistry
.The new reconciler will monitor
InferenceService
CRs having pre-definedlabels
, based on thoselabels
will sync the model registry by keeping track of every deployment that occurred in the cluster.Then will update the
InferenceService
CR by linking it to the model registry record using a specificlabel
.How Has This Been Tested?
e2e tests
Added e2e tests under
test/e2e/
folder, currently I am running those tests using Kind cluster:kind create cluster --config test/config/kind-e2e-config.yaml
IMG=quay.io/$USER/odh-model-controller:$(git rev-parse HEAD)
make IMG=$IMG docker-build
kind load docker-image $IMG
make IMG=${IMG} deploy-dev
make e2e-test
manual test
Setup cluster
By default this new model-registry for model-serving reconciler/controller is disabled in
odh-model-controller
, in order to properly enable it you should just start the controller with--model-registry-enabled
flag.https://github.com/lampajr/odh-model-controller/tarball/lampajr20231219_gh249_reconciler_from_isvc_test
already contains this change in the configuration.git clone https://github.com/opendatahub-io/model-registry-operator.git
make KUBECTL=oc install
make KUBECTL=oc deploy
Setup DS project
demo-model-registry-20240104
kustomize build config/samples | oc apply -f -
(make sureoc
is pointing to the data science project create in step 1.)model-server
Upload models
For the sake of simplicity let's upload some
onnx
files in the local minio instance (bucket name ismodels
):models/mnist/v12/mnist-12.onnx
models/mnist/v8/mnist-8.onnx
Fill up the model registry
:8080
of the model registry proxyMR_HOSTNAME="<endpoint>"
You can use the following curls to inspect model registry content:
Test model controller workflow
As soon as you create an
InferenceService
CR in the project, the odh-model-controller will create theServingEnvironment
in the model registry having thename == namespace
(i.e., DS project), you can inspect its ID:InferenceService
with proper links to the registered model (IDs
might be different):modelregistry.opendatahub.io/registered-model-id
identifies the registered model we would like to deploymodelregistry.opendatahub.io/model-version-id
identifies the specific version to deploy, if omitted (as in this case) let's deploy the latest version for that registered modelAt this point, the odh-model-controller will monitor
InferenceService
CRs and based on theirlabels
will create the corresponding records in model registryTo check the model registry you can run:
Expected output:
Check the ISVC has been correctly linked to the newly created record in model registry:
Expected output (note
modelregistry.opendatahub.io/inference-service-id
label):InferenceService
(you can do this using the ODH dashboard as well)Check record in model registry is correctly update, note the
UNDEPLOYED
state:Expected output:
Merge criteria: