Webhook Errors on Clean Install #2902
Comments
Been playing with this a fair bit. I think this is due to leader election being disabled in the knative reconcilers. However, if we re-enable leader election, things get a lot noisier, with the two replicas fighting over the lease.
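For context, this is what the trade-off looks like in code: a minimal sketch of enabling leader election on a controller manager, using controller-runtime. Karpenter's webhook reconcilers actually come from knative/pkg, which has its own leader-election machinery, so treat this as illustrative only; the lease name and namespace below are assumptions.

```go
// Illustrative sketch: a controller-runtime manager with leader election
// enabled. With this on, exactly one replica holds the lease and runs the
// reconcilers while the others stand by, which avoids two replicas racing
// to update the same webhook -- at the cost of the lease churn mentioned
// above.
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:          true,
		LeaderElectionID:        "karpenter-leader-election", // hypothetical lease name
		LeaderElectionNamespace: "karpenter",                 // assumed namespace
	})
	if err != nil {
		panic(err)
	}
	// Blocks until the process is signalled to stop; reconcilers only run
	// on the replica that currently holds the lease.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```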
Reopening this, as the issue is still here and lies with knative.

This happens on a clean install in AWS EKS, version 0.19.3. How can we fix it? Is it critical, or just noise that settles after leader election? These messages only show up when the pods start; afterwards Karpenter works as expected and nodes are spawned.

Happens when upgrading Karpenter from 0.16.3 to 0.20.0 as well. Is there any fix for the issue?

Happened for me on a clean install of Karpenter v0.20.0 that was deployed using v4.18.1 of https://github.com/aws-ia/terraform-aws-eks-blueprints/releases/tag/v4.18.1. These logs are from two pods. Note that the errors appear only in the logs of one pod; the other pod's logs don't have them. Secondly, these errors were only seen on 2022-12-16 (when I created the fresh cluster with Karpenter); I am not noticing them now. This comment is copied from my message at https://kubernetes.slack.com/archives/C02SFFZSA2K/p1671437472589809
Upgraded from v0.20 to v0.21 and enabled … Restarting both pods seems to fix it.
Did the controller not work after these errors? They should just be transient errors that self-heal, since both controllers are trying to reconcile the same webhook.

I didn't wait long enough. I restarted both pods and they were instantly OK.
kubernetes-sigs/karpenter#142 moves the problem, but results in a new class of error. Still self-healing.
Is this still an issue in 0.23.0? Still getting occasional errors:
Still an issue -- we need to deep-dive this with knative. To be honest, I'd prefer to just do kubernetes-sigs/karpenter#103. These webhooks are a pain.

Sounds like a more straightforward and perhaps less complex approach 🤔. Unless there is anything specific we're wanting the webhooks for?

We haven't tried to migrate them to just CRD built-ins. There are some nontrivial defaults that may be hard. In the short term, can you live with the errors?

Hey @ellistarn. It seems to be provisioning fine; it's just a bit noisy with the errors at the moment. Short term should be fine, and I'm looking forward to this being solved later on. Cheers!
I saw this in one of my clusters too... it's just noise at the moment:

```
2023-02-16T04:10:45.532Z ERROR webhook.DefaultingWebhook Reconcile error {"commit": "5a7faa0-dirty", "knative.dev/traceid": "681c03be-5f2c-4919-8169-82e6f0b5468d", "knative.dev/key": "defaulting.webhook.karpenter.sh", "duration": "81.929004ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
2023-02-16T04:10:45.532Z ERROR webhook.DefaultingWebhook Reconcile error {"commit": "5a7faa0-dirty", "knative.dev/traceid": "ef42549f-439b-4cc4-be33-3bdb81a2ede6", "knative.dev/key": "karpenter/karpenter-cert", "duration": "81.772322ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
```
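The "object has been modified" errors above are ordinary optimistic-concurrency conflicts: two replicas read the same webhook configuration, and the second update is rejected because its resourceVersion is stale. Below is a minimal sketch of the standard client-go pattern that makes such conflicts self-heal; the function is hypothetical, though the webhook name is taken from the logs above.

```go
// Sketch: retry an update on conflict by re-reading the latest object on
// every attempt, so the update always carries a current resourceVersion.
package example

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

func updateWebhookCABundle(ctx context.Context, client kubernetes.Interface, caBundle []byte) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Fetch the latest version on every attempt.
		wh, err := client.AdmissionregistrationV1().MutatingWebhookConfigurations().
			Get(ctx, "defaulting.webhook.karpenter.sh", metav1.GetOptions{})
		if err != nil {
			return err
		}
		for i := range wh.Webhooks {
			wh.Webhooks[i].ClientConfig.CABundle = caBundle
		}
		// If this returns a conflict error, RetryOnConflict re-runs the
		// closure with backoff instead of surfacing the error.
		_, err = client.AdmissionregistrationV1().MutatingWebhookConfigurations().
			Update(ctx, wh, metav1.UpdateOptions{})
		return err
	})
}
```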
Same issue here on Karpenter. Is there a way to validate that these errors are just noise?

It seems that these errors were gone after updating to … but I encountered this issue (leaving it here for anyone who needs it).
For folks concerned about this error, know that it's just noise unless it happens continuously without going away. Ideally, we'd prevent it from happening in the first place, but this requires changes upstream to knative/pkg.
Ran into this as well on a clean install following the installation instructions from the docs. The pods themselves are running OK.

I got this issue on Karpenter v0.27.6 as well.
This is happening to me with version 0.30.0 in EKS: clean installs, multiple clusters having the same problem. I installed via the Helm chart. The only slightly unusual thing I did is that the Helm chart is installed via ArgoCD. The problem's been happening for about a week, and persists after multiple restarts, so whatever's supposed to be self-healing isn't in my case.
Typically this happens if you have webhooks leaked from old karpenter versions. ArgoCD can leak these. Can you print out your webhooks?
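For anyone wanting to check this, here is a minimal client-go sketch (not from the thread) that prints all mutating and validating webhook configurations in the cluster, which is the quickest way to spot stale entries leaked by an old install:

```go
// Sketch: list webhook configurations so leftovers from previous
// Karpenter installs are easy to spot.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the default kubeconfig (~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	mutating, err := client.AdmissionregistrationV1().MutatingWebhookConfigurations().
		List(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, wh := range mutating.Items {
		fmt.Println("mutating:", wh.Name)
	}

	validating, err := client.AdmissionregistrationV1().ValidatingWebhookConfigurations().
		List(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, wh := range validating.Items {
		fmt.Println("validating:", wh.Name)
	}
}
```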
@ellistarn This is a brand new install. Never used Karpenter before. Started on v0.30.0; no upgrades, no old versions to leak from. I redacted the …

Are the errors persistent, or do they eventually go away?

They're not constant, but they're persistent. The last ones were about 10 minutes ago, 15 minutes ago, and an hour ago, approximately. They're not on any regular schedule I can see. Sometimes only one, sometimes a couple of them just a few seconds apart.
This is due to a known bug in the knative certificate reconciliation. We're moving towards deprecating these webhooks in a future release. If it's not blocking your operations, you can safely ignore them for now.

This should be closed and fixed with v0.33.0, since the webhooks will be disabled by default.

We can close this out now since we just released …

Closed by #5159
Hi, I think the problem still exists. In the migration procedure to v1, it is required to enable the conversion webhook. I am updating from 0.37.0 to 0.37.2 according to this procedure, with webhooks enabled.

Using 1.0.2, the same errors are present in the deployment logs, and Terraform also breaks while applying the manifest.
We see the same error, and most probably the reason is that path and caBundle are missing in the webhook config. I have a version working in one cluster: … And another one giving the TLS verification errors @madhavdas reports: … Karpenter version 1.0.2, but also seen with 0.36.5. Installed from the Helm chart using Flux. From the description in https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning/#write-a-conversion-webhook-server I assume that path and caBundle are required, but I was not able to find out how to get them there. Maybe noteworthy: Karpenter is running in the namespace "karpenter" in both clusters. The configs are practically identical except for cluster names. Any idea?
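For reference, this is the shape of the conversion stanza being discussed, built with the apiextensions Go types rather than YAML. The service name, namespace, path, and port are assumptions for illustration; in a healthy install the certificate reconciler is what injects caBundle at runtime, so it is normally not set by hand:

```go
// Sketch of a CRD conversion webhook clientConfig, showing where path and
// caBundle live. All concrete values here are hypothetical.
package example

import (
	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	"k8s.io/utils/ptr"
)

func conversionStanza(caBundle []byte) *apiextensionsv1.CustomResourceConversion {
	return &apiextensionsv1.CustomResourceConversion{
		Strategy: apiextensionsv1.WebhookConverter,
		Webhook: &apiextensionsv1.WebhookConversion{
			ClientConfig: &apiextensionsv1.WebhookClientConfig{
				Service: &apiextensionsv1.ServiceReference{
					Namespace: "karpenter",           // assumed namespace
					Name:      "karpenter",           // assumed service name
					Path:      ptr.To("/conversion"), // hypothetical path
					Port:      ptr.To[int32](8443),   // hypothetical port
				},
				CABundle: caBundle, // normally injected by the cert reconciler
			},
			ConversionReviewVersions: []string{"v1"},
		},
	}
}
```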
Should we reopen this, as we are seeing the same issue when upgrading Karpenter to 0.37.3? @ellistarn, please comment.

The error (and then the solution) in my case finally was that we used images from a local registry and forgot to update the image reference together with the Helm chart version. After we corrected this, the update went through successfully. So double-check that you are really running the correct image versions when updating the Helm charts.
Version
Karpenter Version: v0.19.1
Kubernetes Version: v1.23.13
Expected Behavior
Expect Karpenter to start without error logs on a clean install.
Actual Behavior
Karpenter logs errors, seemingly due to a race condition, with the webhook controller trying to update the CA bundle.
Steps to Reproduce the Problem
Resource Specs and Logs
See above for Actual Behavior