Conversion webhook incompatible with Flux GitOps management of nodepool manifests #6867
Comments
We had a very similar issue after upgrading from |
From what I can tell, these are not related issues. The webhook endpoint in the author's error is correctly referencing the I'm receiving the same issue in my cluster, controlled through FluxCD and using the karpenter and karpenter-crd Helm charts. I've also tried updating my NodePool manifest to match exactly what was converted on the cluster, to no avail. |
Hi, Looking at the log of the karpenter pods, I received the very same error pointed out in the issue description. When running the following:
Everything works as expected, but nothing is printed in the karpenter pods logs. |
@booleanbetrayal and @cmartintlh you are both right. #6847 is more relevant to what we are facing! Thanks for pointing that out 🙏🏼 |
Looking at the calls the API server makes to the conversion webhooks during a server-side apply, I see that an empty NodePool spec is being sent to the conversion webhooks. This results in a nil pointer dereference on the nodeClassRef when we try to convert that component. We should be able to mitigate this from the webhook side by checking the object first https://github.com/kubernetes-sigs/karpenter/blob/b69e975128ac9a511542f9f1d245a6d4c3f91112/pkg/apis/v1/nodepool_conversion.go#L179 |
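To illustrate the kind of guard described above, here is a minimal Go sketch of a nil check before converting a reference. It is not the actual Karpenter code or the merged fix: `SourceRef`, `TargetRef`, and `convertNodeClassRef` are hypothetical, simplified names chosen only for this example.

```go
package main

import "fmt"

// SourceRef and TargetRef are hypothetical, simplified stand-ins for the
// nodeClassRef types in two API versions; they are not Karpenter's real types.
type SourceRef struct {
	Kind string
	Name string
}

type TargetRef struct {
	Kind string
	Name string
}

// convertNodeClassRef copies the reference between versions. It returns nil
// for a nil input instead of dereferencing it, which is the guard an empty
// spec sent during server-side apply would need.
func convertNodeClassRef(in *SourceRef) *TargetRef {
	if in == nil {
		return nil
	}
	return &TargetRef{Kind: in.Kind, Name: in.Name}
}

func main() {
	// An empty spec carries no nodeClassRef, so the input is nil.
	fmt.Println(convertNodeClassRef(nil) == nil)

	// A populated reference is converted field by field.
	out := convertNodeClassRef(&SourceRef{Kind: "EC2NodeClass", Name: "default"})
	fmt.Println(out.Kind, out.Name)
}
```

Returning nil for a nil input keeps the conversion total, so an empty object passes through the webhook unchanged rather than crashing the handler.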
@engedaam thank you for your reply. At the moment, we paused the upgrade to karpenter v1.0.0 in all our environments. Only a test environment is based on that version now. Can we assume that when v1.1.0 is available and the conversion webhook is removed, we can upgrade successfully in two steps to version v1.1.0? We plan to upgrade our environments that are now based on version v0.37.1 by following these steps:
Does it make sense? |
@Giaco9NN Yeah, that makes sense. A good workaround here is also to use client-side apply. Are you able to use that with Terraform? Flux offers the ability to use client-side apply, which should help mitigate this issue: https://fluxcd.io/flux/faq/#why-are-kubectl-edits-rolled-back-by-flux |
Same behavior with Argo CD: "failed to prune fields: failed add back owned items: failed to convert pruned object at version karpenter.sh/v1: conversion webhook for karpenter.sh/v1beta1, Kind=NodePool failed: Post "https://karpenter.karpenter.svc:8443/conversion/karpenter.sh?timeout=30s": EOF" Has anyone using Argo CD solved it? |
@engedaam I tried to leverage the field_manager block:
    field_manager {
      force_conflicts = true
      name            = "before-first-apply"
    }
This way, I can modify the node pool. I aim to find a way to remove this field_manager because it's unclear to me what exactly it does. |
So we added a fix to the
|
Cool. I presume this will be backported to a 1.0.x? I think 1.1 removed the conversion webhooks. |
Yes, we plan on backporting the fix to all versions that contain the conversion webhooks. |
Closing out this issue as the fix was merged kubernetes-sigs/karpenter#1669 |
Description
Observed Behavior:
When Flux is managing the manifests for nodepools, the following is observed. I am starting from version 0.37.2.
Update the controller to v1.0.1, and then apply the updated IAM policy. Next, update the manifests for ec2nodeclasses and nodepools to the v1 API version in git.
Flux fails to apply the nodepool manifests with the following error
This can be reproduced by doing a dry run of the same manifest with kubectl
The Karpenter controller logs the following error when this occurs
Expected Behavior:
GitOps tooling like Flux should successfully apply the nodepool manifest from git.
Reproduction Steps (Please include YAML):
Versions:
Chart Version: 1.0.1
Kubernetes Version (kubectl version): 1.30 (EKS)
Flux Version: 2.3.0
Manage Karpenter's installation with flux, I am installing to the karpenter namespace.
Update to version 1.0.1 from 0.37.2
Workaround
In order to proceed I had to clear the managed fields on the nodepool objects
Here are the fields prior to wiping them out
I then ran the following command to clear the managed fields
Flux was then able to successfully reconcile the manifest after the removal of the managed fields.
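The exact command used above is not preserved in this thread. As a hedged illustration only, the Kubernetes server-side apply documentation describes clearing managed fields by overwriting them with a patch, for example a JSON Patch that replaces `/metadata/managedFields` with a list holding one empty entry. The Go sketch below just builds that patch body; `patchOp` and `clearManagedFieldsPatch` are hypothetical names, and this may differ from what the author actually ran.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is one JSON Patch (RFC 6902) operation. The type name is
// hypothetical and exists only for this sketch.
type patchOp struct {
	Op    string                   `json:"op"`
	Path  string                   `json:"path"`
	Value []map[string]interface{} `json:"value"`
}

// clearManagedFieldsPatch builds a JSON Patch body that replaces
// metadata.managedFields with a list containing a single empty entry.
func clearManagedFieldsPatch() ([]byte, error) {
	ops := []patchOp{{
		Op:    "replace",
		Path:  "/metadata/managedFields",
		Value: []map[string]interface{}{{}},
	}}
	return json.Marshal(ops)
}

func main() {
	body, err := clearManagedFieldsPatch()
	if err != nil {
		panic(err)
	}
	// Print the patch body so it can be inspected before use.
	fmt.Println(string(body))
}
```

A body like this could be passed to `kubectl patch` with `--type=json` against the affected nodepool; try it on a test cluster first, since it discards field ownership information.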
For comparison, here is what the managed fields look like after the reconciliation