Change behavior when deleting default SriovNetworkNodePolicy and SriovOperatorConfig #566
This commit changes the behavior of the webhook when deleting the default SriovNetworkNodePolicy and SriovOperatorConfig. This change is needed so that we can smoothly uninstall Helm releases that have the webhooks enabled. Without this change, when running `helm uninstall`, it fails because the webhook doesn't allow deletion of such resources mentioned above. Since these Webhook Configurations resources are deployed via the controller itself and not Helm, it's much more difficult to handle the lifecycle of them via Helm in the current state. Instead, it's easier to send a warning that these resources should not be deleted. Signed-off-by: Vasilis Remmas <[email protected]>
In Helm I see we only create the default policy, so during uninstall I expect just that to fail. The cluster would remain with the SriovOperatorConfig object. Is that not the case? Generally, for both objects the relevant controller will first re-create the default object before proceeding with reconcile. As an alternative, we can create the default objects via Helm in a post-install hook and remove them in pre-delete hooks.
Adding @zeeke, as he is also working on some issues where the namespace is removed and objects get stuck because the webhook pod is already deleted.
Not a Helm expert, but I'm OK with deleting the default policy, as it gets recreated by the controller.
In theory, this problem (and others I'm having by deleting the operator's namespace) should have been addressed by the shutdown procedure:
But it seems to not work for reasons that aren't very clear to me yet.
LGTM
Yes that's the case. If Helm faces an error when deleting an object it manages, the helm release will keep lingering in the cluster in errored state. So, the uninstall is not clean and may leave objects lingering in the cluster.
👍
I think the removal won't work in that case. We will need to bypass the webhook to do so, and this means either of the following (and there might be more):
I think what this PR does is the simplest thing we can do today without too much of an impact, especially if the operator itself will recreate the objects anyway.
I agree. I think the creation of these default objects is something we would need to discuss, but for now let's not tie the two together. I'm OK with this change.
Requires and is based on: #709

This PR adds support for auto generated cert-manager certificates when user enables the SR-IOV Network Operator Admission Controllers via the Helm value `sriov-network-operator.operator.admissionControllers.enabled`.

This PR won't work until:
* a new image of SR-IOV Network Operator is published and the following value is updated https://github.com/vasrem/network-operator/blob/f125b8a67772fc31af6ced0b24c9249531e5e542/deployment/network-operator/values.yaml#L146
* a new image of SR-IOV Network Operator Webhook is published and the following value is updated https://github.com/vasrem/network-operator/blob/f125b8a67772fc31af6ced0b24c9249531e5e542/deployment/network-operator/values.yaml#L152
* This is needed to ensure smooth `helm uninstall` operation. Depends on k8snetworkplumbingwg/sriov-network-operator#566.
Opening this PR for discussion, as it looks to me like the simplest way forward. I'm not sure how bad it is to delete the default CRs, or whether warning the user instead of blocking the deletion is sufficient.
This commit changes the behavior of the webhook when deleting the default `SriovNetworkNodePolicy` and `SriovOperatorConfig`. This change is needed so that we can smoothly uninstall Helm releases that have the webhooks enabled.

Without this change, when running `helm uninstall`, it fails because the webhook doesn't allow deletion of such resources mentioned above.

Since these Webhook Configurations resources are deployed via the controller itself and not Helm, it's much more difficult to handle the lifecycle of them via Helm in the current state. Instead, it's easier to send a warning that these resources should not be deleted.
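To make the change concrete, here is a minimal sketch of the "warn instead of deny" idea described above. It is not the operator's actual webhook code: the function `validateDelete`, its signature, and the constants `defaultPolicyName` and `defaultConfigName` (both assumed to be `"default"`) are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// Assumed names of the default objects; illustrative only, the
// operator's real constants may differ.
const (
	defaultPolicyName = "default"
	defaultConfigName = "default"
)

// validateDelete is a hypothetical stand-in for the webhook's delete check.
// It reports whether the delete request is allowed, plus any warnings that
// should be surfaced to the user.
func validateDelete(kind, name string) (allowed bool, warnings []string) {
	isDefault := (kind == "SriovNetworkNodePolicy" && name == defaultPolicyName) ||
		(kind == "SriovOperatorConfig" && name == defaultConfigName)
	if isDefault {
		// Old behavior (as described in this PR): reject the request here.
		// New behavior: allow it, but attach a warning.
		warnings = append(warnings,
			fmt.Sprintf("default %s %q should not be deleted", kind, name))
	}
	return true, warnings
}

func main() {
	allowed, warnings := validateDelete("SriovNetworkNodePolicy", "default")
	fmt.Println("allowed:", allowed, "warnings:", warnings)
}
```

With this shape, a delete issued by `helm uninstall` is no longer blocked, while a user deleting a default object by hand still sees a message telling them not to.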
Testing
Used Case 5 of #561 and managed to install and uninstall without any issue.
`helm uninstall` log looks like this: