Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Create cert Secret and update KServe local gateway #221

Merged
merged 16 commits into from
Jun 27, 2024

Conversation

Jooho
Copy link
Contributor

@Jooho Jooho commented Jun 12, 2024

Description:

This is one of tasks for Private Endpoint Tasks (opendatahub-io/kserve#371)

This PR support the following:

  • add Serving Cert annotation to runtime Service
  • create the same secret that was created by Serving cert in the istio-system
  • update kserve local gateway for the isvc

Jira: https://issues.redhat.com/browse/RHOAIENG-7921

(NOTE) Loopy was tested on fedora only.

Test

Env setup

  • Using source

    mkdir /tmp/test
    cd /tmp/test
    git clone [email protected]:Jooho/loopy.git
    cd loopy
    make init  
    
  • Using docker image

docker run --name loopy -it  quay.io/jooholee/loopy:latest /bin/bash

Create a cluster information script

cd loopy

cat cluster_info.sh
CLUSTER_CONSOLE_URL=https://console-openshift-console.XXXX
CLUSTER_API_URL=https://api.XXX:443
CLUSTER_ADMIN_ID=admin
CLUSTER_ADMIN_PW=password
CLUSTER_TOKEN=sha256~XXX
CLUSTER_TYPE=ROSA

Install Kserve with a new manifests

./loopy playbooks run  odh-stable-install-kserve-serverless-on-existing-cluster -p CUSTOM_KSERVE_MANIFESTS=https://github.com/jooho/kserve/tarball/test-manifest  -p CUSTOM_ODH_MODEL_CONTROLLER_MANIFESTS=https://github.com/jooho/odh-model-controller/tarball/test-manifests -vvv -i ./cluster.sh

Deploy sample sklearn model

./loopy units run test-kserve-sklearn-v2-iris-role -vvv -i ./cluster_info.sh 

Checks

  1. Verify if runtime Service add serving cert annotation
oc get service sklearn-example-isvc-iris-v2-rest -ojsonpath='{.metadata.annotations}'
  1. Verify if Serving Cert Secret is created
oc get secret sklearn-example-isvc-iris-v2-rest
  1. Verify if Kserve gateway have the Server
kubectl get gateway kserve-local-gateway -n istio-system -o jsonpath='{.spec.servers[?(@.port.name=="sklearn-example-isvc-iris-v2-rest")]}'

Copy link
Contributor

openshift-ci bot commented Jun 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Jooho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Jooho
Copy link
Contributor Author

Jooho commented Jun 14, 2024

/test unit

1 similar comment
@Jooho
Copy link
Contributor Author

Jooho commented Jun 14, 2024

/test unit

}

func (r *KserveGatewayReconciler) Delete(ctx context.Context, log logr.Logger, isvc *kservev1beta1.InferenceService) error {
log.V(1).Info(fmt.Sprintf("Deleting serving cert secret(%s) in the namespace(%s)", isvc.Name, isvc.Namespace))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be debug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I referenced the other reconcile and they were using info so I followed it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, but, on normal operation the fewer info messages we have the better.
IMHO, this is one example of informational messages that could be set to debug only.

return err
}

log.V(1).Info(fmt.Sprintf("Deleting the Server(%s) from KServe local gateway in the istio-system namespace", isvc.Name))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

controllers/utils/init.go Outdated Show resolved Hide resolved
@Jooho Jooho force-pushed the internal_hostname_gateway branch 2 times, most recently from 5e76504 to 04dc6dd Compare June 19, 2024 13:56
Signed-off-by: jooho lee <[email protected]>
@Jooho Jooho force-pushed the internal_hostname_gateway branch from e81094f to 5390771 Compare June 21, 2024 14:16
Signed-off-by: jooho lee <[email protected]>
Copy link
Contributor

@VedantMahabaleshwarkar VedantMahabaleshwarkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the MutatingWebhook needs patches similar to what we have for the ValidatingWebhook.
field patch and webhook patch

go.mod Outdated Show resolved Hide resolved
controllers/inferenceservice_controller.go Outdated Show resolved Hide resolved
controllers/webhook/kserve_service_mutator.go Outdated Show resolved Hide resolved
Comment on lines 41 to 43
var meshNamespace string
var destSecretName string
var portName string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are globals... I wonder if it won't lead to race conditions...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense. I changed destSecretName,portName but I believe meshNamespace should be ok because it is always the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

controllers/reconcilers/kserve_isvc_gateway_reconciler.go Outdated Show resolved Hide resolved
config/webhook/manifests.yaml Outdated Show resolved Hide resolved
@Jooho Jooho force-pushed the internal_hostname_gateway branch 3 times, most recently from 6160559 to 038c090 Compare June 22, 2024 08:39
@Jooho Jooho force-pushed the internal_hostname_gateway branch from 038c090 to 1de88b0 Compare June 22, 2024 12:26
@Jooho
Copy link
Contributor Author

Jooho commented Jun 22, 2024

@israel-hdez FYI, I removed mutating webhook and re-add isvc_service_cert_reconciler because it turned out kserve controller rollback the annotation.

@Jooho
Copy link
Contributor Author

Jooho commented Jun 24, 2024

/retest

Copy link
Contributor

@israel-hdez israel-hdez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding some additional comment. It may be easy to just reply so that I understand better.

If you, by chance, would push some code changes, I saw some code commented in _test.go files. It would be good if you can clean-up them.

if service.Annotations == nil {
service.Annotations = make(map[string]string)
}
service.Annotations = desiredService.Annotations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that this code is back, the old comment still stands...

Comment on lines +79 to +80
log.V(1).Info(fmt.Sprintf("Waiting for the creation of the serving certificate Secret(%s) in %s namespace", isvc.Name, isvc.Namespace))
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like my comment in the kserve_isvc_service_cert_reconciler.go around here, you may want a Watch on the controller.

...it happens that, typically, the model load takes more time than the provisioning of the certificate. Thus, the model status update would lead to also reconciling the gateway and the svc. So, skipping the Watch may sound fine, but it is still safer to add it (in case you find how). The motivation is the same as past review: I'm not sure if it is safe to make assumptions on timing/order of unrelated events.

Comment on lines +247 to +252
// Remove old secret if src secret is updated
if preDestSecret != nil {
if err := r.client.Delete(ctx, preDestSecret); err != nil {
return err
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartoszmajsak @rcernich This will delete a no longer useful TLS secret and create a new update one with the same name (e.g. at rotation).

Do you know if the ingress gateway will reload it without any changes to the Gateway? Or does it need to have a new name, and the Gateway must be updated to effectively reload it?

// Recreate copied secrt when src secret is updated
if !reflect.DeepEqual(srcCertSecret.Data, copiedCertSecret.Data) {
log.V(1).Info(fmt.Sprintf("Recreating for serving certificate Secret(%s) in %s namespace", copiedCertSecret.Name, meshNamespace))
if err := r.copyServingCertSecretFromIsvcNamespace(ctx, srcCertSecret, copiedCertSecret); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the Watch on the controller, I'm failing to understand how this will catch the certificate rotation, because looks like none of the watched & owned resources will be updated and a reconcile won't be triggered.

I think you should add these lines, over here so that units can catch this (in case there is an issue).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I commented from here, watch can not catch the Secret because it does not have the specific level. I need to think more to solve this rotation part but at the moment, Watch is not the right approach.

@Jooho Jooho merged commit 2ea55ef into opendatahub-io:main Jun 27, 2024
4 of 5 checks passed
@Jooho
Copy link
Contributor Author

Jooho commented Jun 27, 2024

This PR was somehow merged accidentally so I created another PR to follow up (#229)

@Jooho Jooho mentioned this pull request Jun 27, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants