aws-load-balancer-webhook-service error for non alb Ingresses #2071
Comments
@gazal-k The webhook for Ingress only validates changes (for the security features we added with IngressClassParams support), and it simply permits any changes to Ingresses that use other IngressClasses. It's fine to delete the ValidatingWebhookConfiguration for single-cluster use, but ideally you should fix the certificate.
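To illustrate the behavior described above, here is a minimal Python sketch of how a validating webhook might decide whether an Ingress is in scope. It is not the controller's actual code (the controller is written in Go). It assumes an Ingress is ALB-managed when the legacy kubernetes.io/ingress.class annotation is alb, or when spec.ingressClassName names an IngressClass whose spec.controller is ingress.k8s.aws/alb. The function names and the validate_alb_rules callable are hypothetical.

```python
from typing import Callable, Dict, Optional

ALB_CONTROLLER = "ingress.k8s.aws/alb"  # spec.controller value of ALB IngressClasses
LEGACY_ANNOTATION = "kubernetes.io/ingress.class"


def is_alb_ingress(ingress: dict, ingress_classes: Dict[str, dict]) -> bool:
    """Return True if the Ingress appears to be managed by the ALB controller (assumed rules)."""
    annotations = ingress.get("metadata", {}).get("annotations") or {}
    if LEGACY_ANNOTATION in annotations:
        # In this sketch the deprecated annotation wins when present.
        return annotations[LEGACY_ANNOTATION] == "alb"
    class_name: Optional[str] = ingress.get("spec", {}).get("ingressClassName")
    if class_name is None:
        return False  # default-IngressClass handling is ignored here
    ingress_class = ingress_classes.get(class_name, {})
    return ingress_class.get("spec", {}).get("controller") == ALB_CONTROLLER


def validate_ingress(ingress: dict, ingress_classes: Dict[str, dict],
                     validate_alb_rules: Callable[[dict], bool]) -> bool:
    """Admit non-ALB Ingresses unconditionally; run the hypothetical ALB checks otherwise."""
    if not is_alb_ingress(ingress, ingress_classes):
        return True
    return validate_alb_rules(ingress)
```

Note that this class check runs inside the webhook handler. The API server must first complete a TLS handshake with aws-load-balancer-webhook-service, so a certificate problem fails requests for nginx and traefik Ingresses too, as the errors in this thread show.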
I'm not entirely sure what may have happened, because I wasn't the one who installed the chart. @anilkumarpasupuleti, do you know if we reverted the webhookConfiguration/cert Secret during installation? I just checked the installation instructions. It's not a straight helm install, is it? Do the CRDs have to be applied manually first? Could skipping that step cause this issue?
Is this the
We also checked the somewhat related issue #1909 and verified that the caBundle in the webhooks matched the
Is that
This issue does not occur with chart version 1.1.6; it's on k8s v1.18.
@anilkumarpasupuleti, the validating webhook is included in chart v1.2.0 (app v2.2.0) and later.
We got the error with chart version 1.2.0. We tried downgrading and, as expected, with no validating webhook there were no errors. Anil was just mentioning that we're running all of this on EKS 1.18 (not to be confused with the chart version).
@gazal-k Compare the caBundle configured in the webhook with the CA certificate from the secret. Also verify that the webhook certificate is issued by the configured CA.
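Comparing the caBundle configured in the webhook with the CA certificate from the secret can be scripted. Below is a minimal, stdlib-only Python sketch, assuming you have exported both objects with kubectl get ... -o json. The resource names aws-load-balancer-webhook (webhook configuration) and aws-load-balancer-tls (secret, key ca.crt) are assumptions based on the chart's defaults; check the names in your own release. It only compares bytes: verifying that the serving certificate in tls.crt was actually issued by that CA still needs openssl or a crypto library.

```python
import base64
from typing import List


def ca_bundle_mismatches(webhook_config: dict, tls_secret: dict) -> List[str]:
    """Return the names of webhooks whose caBundle differs from the secret's ca.crt.

    webhook_config: parsed JSON of a Validating/MutatingWebhookConfiguration.
    tls_secret: parsed JSON of the webhook TLS Secret (key "ca.crt" is assumed).
    Both values are base64-encoded PEM, so the decoded bytes are compared,
    ignoring leading/trailing whitespace.
    """
    secret_ca = base64.b64decode(tls_secret["data"]["ca.crt"]).strip()
    mismatched = []
    for webhook in webhook_config.get("webhooks", []):
        bundle = webhook.get("clientConfig", {}).get("caBundle", "")
        if base64.b64decode(bundle).strip() != secret_ca:
            mismatched.append(webhook["name"])
    return mismatched
```

For example, load the JSON of kubectl get validatingwebhookconfiguration aws-load-balancer-webhook -o json and kubectl get secret aws-load-balancer-tls -n kube-system -o json with json.load and pass the two dicts in; an empty list means every webhook trusts the CA stored in the secret. The same check applies to the MutatingWebhookConfiguration.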
I was able to verify installing and updating apps that use other ingress controllers using
I have also had this issue when using Polaris (https://artifacthub.io/packages/helm/fairwinds-stable/polaris) version 4.0.4.
We use cert-manager as described in the Polaris documentation and are currently running EKS 1.20.
@darrenwhighamfd Is this issue still happening in your cluster? Have you checked whether the cert stored in the secret matches the CA configured in
I faced the same issue, with two ingress controllers installed on one cluster, aws-load-balancer-controller and nginx-ingress controller, with
I fixed the issue by uninstalling the aws-load-balancer-controller Helm chart and installing it again. Hmm, I now see that the issue still occurs sporadically when deploying some Helm charts with Ingresses. Triggering the upgrade again deploys successfully, so the issue still exists.
@M00nF1sh Yes, we still have this issue; for now we have had to revert to 1.1.6.
I guess that's our experience too. Maybe it wasn't the tooling; the issue was just sporadic. We had verified that the certs and secrets matched: #2071 (comment)
Ha, I found that uninstalling the Helm chart does not remove the secrets.
So I tried again:
helm uninstall -n kube-system aws-load-balancer-controller
k delete secret -n kube-system aws-load-balancer-controller-token-spwhd
k delete secret -n kube-system aws-externaldns-token-wkqpc
and then installed it again. By the way, it created 3 new secrets:
Possibly this will help. After that, deploying our Helm chart with an Ingress worked fine. I don't know whether this fixes the whole problem.
@DmitriyStoyanov, @gazal-k, @darrenwhighamfd If you are able to reproduce the issue, could you please open a ticket with AWS Support and include the cluster ARN?
Same problem here:
So I think I found my issue, and possibly what others here are seeing. @DmitriyStoyanov was close in saying
But maybe the key here is that the Ingress spec uses
The documentation https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/guide/ingress/ingress_class/#deprecated-kubernetesioingressclass-annotation looks to be specific to the annotation for
So the "fix" is to update any Ingress objects that are using the annotation style to set spec.ingressClassName instead, for example:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: default-backend-traefik
  namespace: ingress
spec:
  ingressClassName: traefik
  # rules or a default backend omitted for brevity

and also define your IngressClass object:

apiVersion: networking.k8s.io/v1beta1
kind: IngressClass
metadata:
  name: traefik
spec:
  controller: traefik.io/ingress-controller

If I had to guess, the AWS LB controller is not checking the annotations for ingress class, and is therefore treating your traefik Ingress objects as ALB-class, causing the error.
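Before migrating, it can help to list which Ingresses still rely on the annotation style. A minimal Python sketch, assuming the input is the parsed JSON of kubectl get ingress -A -o json (an object with an items list); the function name is hypothetical.

```python
from typing import Dict, List

LEGACY_ANNOTATION = "kubernetes.io/ingress.class"


def classify_ingresses(ingress_list: dict) -> Dict[str, List[str]]:
    """Group Ingresses by how they declare their class.

    Keys of the returned dict:
      "spec"       - spec.ingressClassName is set (takes precedence here)
      "annotation" - only the deprecated kubernetes.io/ingress.class annotation
      "none"       - neither; may fall back to a default IngressClass
    Each value is a sorted list of "namespace/name" strings.
    """
    groups: Dict[str, List[str]] = {"spec": [], "annotation": [], "none": []}
    for item in ingress_list.get("items", []):
        meta = item.get("metadata", {})
        key = f'{meta.get("namespace", "default")}/{meta.get("name", "")}'
        annotations = meta.get("annotations") or {}
        if item.get("spec", {}).get("ingressClassName"):
            groups["spec"].append(key)
        elif LEGACY_ANNOTATION in annotations:
            groups["annotation"].append(key)
        else:
            groups["none"].append(key)
    for names in groups.values():
        names.sort()
    return groups
```

Everything in the "annotation" bucket is a candidate for the spec.ingressClassName update suggested above.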
Well, I take that back. I think it was in fact sporadic, as others here are reporting. I don't know what the common pattern is, though. For us, we use Spinnaker + I'm at the point where I'm wondering if I need to break my Helm chart up and deploy the AWS LB controller separately (I probably should have done this from the start). Can everyone else confirm that they also only see this when applying a manifest file that includes both the AWS LB controller install AND Ingress objects for nginx/traefik?
We use helmfile and specifically For now we are using the older chart version, 1.1.6. Our plan is to use albc for all our
Interestingly, today I hit this issue while deploying a Helm chart with an Ingress with these annotations:
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:xxxxxxx:xxxxxxxx:certificate/xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx
and in the logs After a retry it was fixed, as many times before. So perhaps the issue is related not only to the two ingress controllers (nginx and ALB) we have in the cluster, but also to the AWS ACM certificate check?
We are using an ALB Ingress with an ACM cert. This error happens periodically with no configuration changes. Log:
Ingress:
EDIT: This is not the case; I am still seeing the error. It is intermittent. Update: I found that when I remove the annotation
I do not have this annotation at all but am still facing the problem.
I use helmfile, and it shows a diff of the certificates on each run, as described in aws/eks-charts#347. There is an undocumented value, enableCertManager, which is false by default, so the certificates get regenerated on each helmfile apply. I suspect this may be the cause: if I run helmfile with the --selector flag and update just one chart, the error does not seem to appear, but when I run helmfile over the whole stack, it appears almost every time. It may be that the controller is not fast enough to pick up the new certificates. Just guessing. And to clarify, I have no non-ALB Ingresses; they are all ALB.
Hey, sorry about the confusion; I am still seeing this error!
Would #2264 resolve this issue?
I'm seeing it too.
Ran into this today. Fixed by pruning all related resources and reinstalling.
I ran into this today after upgrading the Helm deployment from app version 2.0.1 to 2.2.4.
I just ran into this via the CDK HelmChartProvider. The solution was to try it 3 times... As this is done via CloudFormation, the stack just rolled back and I could retry easily, but it feels kind of hacky.
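Several commenters report that simply retrying works. That is a workaround rather than a fix, but it can at least be made deliberate. A minimal Python sketch of such a retry wrapper, assuming your deploy step can be wrapped in a Python callable that raises an exception whose message contains the x509 error text from this issue; the function and constant names are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Substring of the error reported throughout this issue (assumption: matching on it is enough).
WEBHOOK_TLS_ERROR = "x509: certificate signed by unknown authority"


def retry_on_webhook_tls_error(action: Callable[[], T], attempts: int = 3,
                               base_delay: float = 2.0,
                               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call action(), retrying with exponential backoff only on the webhook TLS error.

    Any other exception is re-raised immediately; on the last attempt the
    TLS error itself is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:  # deploy tooling raises many exception types
            if WEBHOOK_TLS_ERROR not in str(exc) or attempt == attempts:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Restricting the retry to this one error keeps genuine configuration mistakes from being retried silently.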
I hit this issue regularly when applying new or updated nginx Ingresses configured to use a load balancer via the AWS Load Balancer Controller. The only solution I have found is to delete all of the AWS LB controller's resources and reinstall (I usually do it through ArgoCD, so reinstallation is pretty quick).
@kishorj I've created a ticket with AWS Support. Case ID 9282832861.
Did you get any reply?
Yep, they said they were working on fixing it but couldn't share an ETA.
Hi there, a quick data point on this one. I find that I get this error when I install both the controller and a custom resource that uses it in the same Helm chart, i.e. my chart has the controller as a subchart and a TargetGroupBinding as a template. Here is my error:
If I install the controller first and then install the TargetGroupBinding after a delay, it works.
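One way to make that delay explicit is to wait until the webhook Service has a ready endpoint before creating the TargetGroupBinding. A minimal Python sketch: the Service name aws-load-balancer-webhook-service comes from the error messages in this thread, while fetch_endpoints is a hypothetical callable you would implement, for example by parsing the output of kubectl get endpoints aws-load-balancer-webhook-service -n kube-system -o json. This only addresses the ordering race; it does not help when the caBundle and the serving certificate are out of sync.

```python
import time
from typing import Callable


def endpoints_ready(endpoints: dict) -> bool:
    """True if a parsed core/v1 Endpoints object lists at least one ready address."""
    # Ready pods appear under "addresses"; unready ones under "notReadyAddresses".
    return any(subset.get("addresses") for subset in endpoints.get("subsets") or [])


def wait_for_webhook(fetch_endpoints: Callable[[], dict], timeout: float = 120.0,
                     interval: float = 5.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll fetch_endpoints() until the webhook Service is ready, or raise TimeoutError."""
    deadline = clock() + timeout
    while not endpoints_ready(fetch_endpoints()):
        if clock() >= deadline:
            raise TimeoutError("webhook service has no ready endpoints")
        sleep(interval)
```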
Reasonably confident this is the same as #2239.
We have set
Can confirm that this solution also fixed it for me.
We are installing it through Helm via ArgoCD. We set
I tried this, but the issue still keeps popping up periodically too.
Today I got the same error on an AWS cluster. I had to remove the namespace and reinstall everything from scratch.
Still happening. The aws-lb-controller version is the latest.
I also ran into this today across multiple clusters. As far as I can tell, the TLS secret has not changed since deployment and was not regenerated. Does anyone have any idea what's wrong? As far as I can tell, the TLS secret created in the cluster by the Helm chart should be valid for 10 years.
Still happening. The controller version is
I've fixed the problem by generating a cert manually and setting it via
I'm also seeing this very regularly, at least 2-5 times a day. Environment:
It has happened 164 times, and I think it started in August 2022 when I updated the Helm chart to 1.4.4. With 1.4.3 or older it happened only once, which means it happened 163 times across 1.4.4, 1.4.5 and 1.4.6. I'm basing this on the last year's worth of Kubernetes events that match the error message in this issue.
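A tally like this can be computed from exported events. A minimal Python sketch, assuming the events are available as JSON in the core/v1 Event shape (an items list with message, count, lastTimestamp or eventTime). Note that the API server only keeps events for a short time by default (--event-ttl defaults to 1h), so a year of history requires an external event store. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict

ERROR_SNIPPET = "x509: certificate signed by unknown authority"


def count_webhook_errors_by_month(event_list: dict, snippet: str = ERROR_SNIPPET) -> Dict[str, int]:
    """Count matching events per month ("YYYY-MM"), honoring each event's count field."""
    counts: Counter = Counter()
    for event in event_list.get("items", []):
        if snippet not in (event.get("message") or ""):
            continue
        # RFC 3339 timestamps start with "YYYY-MM", so the first 7 characters give the month.
        timestamp = event.get("lastTimestamp") or event.get("eventTime") or ""
        month = timestamp[:7] or "unknown"
        counts[month] += event.get("count") or 1
    return dict(sorted(counts.items()))
```

Grouping by month makes it easy to line the counts up against the dates of chart upgrades.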
Still happening on EKS, installed via Helm, version 1.5.3.
I got the same error using the following Pulumi program:

import pulumi
import pulumi_kubernetes as kubernetes

# vpc, stack, eks_provider, namespace and svc_01 are defined elsewhere in the program.

# Install the AWS Load Balancer Controller chart.
kubernetes.helm.v3.Chart(
    "lb-dev",
    kubernetes.helm.v3.ChartOpts(
        chart="aws-load-balancer-controller",
        fetch_opts=kubernetes.helm.v3.FetchOpts(
            repo="https://aws.github.io/eks-charts"
        ),
        namespace="aws-lb-controller-dev",
        values={
            "region": "us-west-2",
            "serviceAccount": {
                "name": "aws-lb-controller-serviceaccount",
                "create": False,
            },
            "vpcId": vpc.vpc_id,
            "clusterName": "cluster-dev",
            "podLabels": {
                "stack": stack,
                "app": "aws-lb-controller",
            },
            "autoDiscoverAwsRegion": "true",
            "autoDiscoverAwsVpcID": "true",
            "keepTLSSecret": True,
        },
    ),
    pulumi.ResourceOptions(
        provider=eks_provider, parent=namespace
    ),
)

# ALB Ingress routing example.com/app1 to svc_01 on port 80.
ingress = kubernetes.networking.v1.Ingress(
    "app-ingress-dev",
    metadata=kubernetes.meta.v1.ObjectMetaArgs(
        name="ingress-dev",
        namespace="aws-lb-controller-dev",
        annotations={
            "kubernetes.io/ingress.class": "alb",
            "alb.ingress.kubernetes.io/target-type": "instance",
            "alb.ingress.kubernetes.io/scheme": "internet-facing",
        },
        labels={"app": "ingress-dev"},
    ),
    spec=kubernetes.networking.v1.IngressSpecArgs(
        rules=[kubernetes.networking.v1.IngressRuleArgs(
            host="example.com",
            http=kubernetes.networking.v1.HTTPIngressRuleValueArgs(
                paths=[
                    kubernetes.networking.v1.HTTPIngressPathArgs(
                        path="/app1",
                        path_type="Prefix",
                        backend=kubernetes.networking.v1.IngressBackendArgs(
                            service=kubernetes.networking.v1.IngressServiceBackendArgs(
                                name=svc_01.metadata.name,
                                port=kubernetes.networking.v1.ServiceBackendPortArgs(
                                    number=80,
                                ),
                            ),
                        ),
                    ),
                ],
            ),
        )],
    ),
    opts=pulumi.ResourceOptions(provider=eks_provider),
)

Output of kubectl describe ingress ingress-dev -n aws-lb-controller-dev:
Name: ingress-name-dev
Labels: app=ingress-name-dev
app.kubernetes.io/managed-by=pulumi
Namespace: aws-lb-controller-dev
Address: k8s-awslbcon-ingressn-63280fface-152845941.us-west-2.elb.amazonaws.com
Ingress Class: <none>
Default backend: <default>
Rules:
Host Path Backends
---- ---- --------
example.com
/app1 eks-service-dev-443996a5:80 (10.0.18.89:80)
Annotations: alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/target-type: instance
kubernetes.io/ingress.class: alb
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedDeployModel 23m (x14 over 24m) ingress Failed deploy model due to Internal error occurred: failed calling webhook "mtargetgroupbinding.elbv2.k8s.aws": failed to call webhook: Post "https://aws-load-balancer-webhook-service.aws-lb-controller-dev.svc:443/mutate-elbv2-k8s-aws-v1beta1-targetgroupbinding?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "aws-load-balancer-controller-ca")
Normal SuccessfullyReconciled 22m (x3 over 3h7m) ingress Successfully reconciled
@omidraha I had the same issue but solved it by using a
I am facing the same issue with
I have also just encountered this in several clusters and used a
I faced the exact same issue, deploying aws-load-balancer-controller via ArgoCD.
One observation for those who use ArgoCD: suppose you have automatic sync for an application but ignore differences for the webhooks:

resource.customizations.ignoreDifferences.admissionregistration.k8s.io_MutatingWebhookConfiguration: |
  jqPathExpressions:
  - .webhooks[]?.clientConfig.caBundle
resource.customizations.ignoreDifferences.admissionregistration.k8s.io_ValidatingWebhookConfiguration: |
  jqPathExpressions:
  - .webhooks[]?.clientConfig.caBundle

At some point, ArgoCD will update the secret with a new TLS cert but skip updating the caBundle in the webhooks, leaving the webhooks with an invalid certificate and causing this issue.
We are trying to migrate from ingress-nginx to aws-load-balancer-controller. We are starting by just installing the controller chart. The plan is to template our applications to use the new ingress.class alb and then migrate them. But after installing aws-load-balancer-controller, we are seeing errors on our existing applications like:

cannot patch "app1-ingress" with kind Ingress: Internal error occurred: failed calling webhook "vingress.elbv2.k8s.aws": Post https://aws-load-balancer-webhook-service.kube-system.svc:443/validate-networking-v1beta1-ingress?timeout=10s: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "aws-load-balancer-controller-ca")

app1-ingress still uses kubernetes.io/ingress.class: nginx. Can we make the webhook skip those Ingresses?