aws-load-balancer-webhook-service error for non alb Ingresses #2071

Closed
gazal-k opened this issue Jun 11, 2021 · 55 comments

Comments

@gazal-k

gazal-k commented Jun 11, 2021

We are trying to migrate from ingress-nginx to aws-load-balancer-controller. We are starting by just installing the controller chart. The plan is to template our applications to use the new ingress.class alb and then migrate them.

But after installing aws-load-balancer-controller, we are seeing errors on our existing applications like:

cannot patch "app1-ingress" with kind Ingress: Internal error occurred: failed calling webhook "vingress.elbv2.k8s.aws": Post https://aws-load-balancer-webhook-service.kube-system.svc:443/validate-networking-v1beta1-ingress?timeout=10s: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "aws-load-balancer-controller-ca"): cannot patch "app1-ingress" with kind Ingress: Internal error occurred: failed calling webhook "vingress.elbv2.k8s.aws": Post https://aws-load-balancer-webhook-service.kube-system.svc:443/validate-networking-v1beta1-ingress?timeout=10s: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "aws-load-balancer-controller-ca")

app1-ingress still uses kubernetes.io/ingress.class: nginx. Can we have the webhook skip those Ingresses?

@M00nF1sh
Collaborator

M00nF1sh commented Jun 11, 2021

@gazal-k
It seems the webhook's CA is configured incorrectly. If you installed from Helm, did you revert the webhookConfiguration or the cert Secret, causing them to drift apart?

The webhook for Ingress merely validates changes (for security features we included with IngressClassParams support), and it simply permits any change to Ingresses that use other IngressClasses.

It's fine to delete the ValidatingWebhookConfiguration if it's a single cluster use, but you should fix the certificate ideally.

@gazal-k
Author

gazal-k commented Jun 11, 2021

I'm not entirely sure what happened, because I didn't personally install the chart. @anilkumarpasupuleti do you know if we reverted the webhookConfiguration/cert Secret during installation?

I just checked the installation instructions now. It's not a straight helm install, is it? Do the CRDs have to be done manually first? Can not doing that cause this issue?

@gazal-k
Author

gazal-k commented Jun 11, 2021

@gazal-k
Author

gazal-k commented Jun 11, 2021

We also checked the somewhat related #1909 and verified that the caBundle in the webhooks matches the Secret aws-load-balancer-tls.

@gazal-k
Author

gazal-k commented Jun 11, 2021

It's fine to delete the ValidatingWebhookConfiguration if it's a single cluster use, but you should fix the certificate ideally.

Is that validatingwebhookconfigurations/aws-load-balancer-webhook? I'm not sure what you mean by "single cluster use". We plan on installing aws-load-balancer-controller on each of our clusters. I wasn't aware a single installation could manage multiple clusters (not that we'd need it anyway).

@anilkumarpasupuleti

This issue does not occur on chart version 1.1.6. The cluster is on k8s v1.18.

@kishorj
Collaborator

kishorj commented Jun 14, 2021

@anilkumarpasupuleti, the validating webhook is included in chart v1.20 (app v2.2.0) and later.

@gazal-k
Author

gazal-k commented Jun 14, 2021

We got the error in chart version 1.20. We tried downgrading and, obviously, with no validating webhook there were no errors.

Anil was just mentioning that we're trying all of this in EKS 1.18. (not to be confused with the chart version)

@kishorj
Collaborator

kishorj commented Jun 16, 2021

@gazal-k
could you compare the caBundle configured in your validating webhook aws-load-balancer-webhook and the secret aws-load-balancer-tls?

caBundle configured in webhook

kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io aws-load-balancer-webhook -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | base64 -d | openssl x509 -noout -text

CA certificate from the secret

kubectl get secret -n kube-system  aws-load-balancer-tls -ojsonpath="{.data.ca\.crt}" |base64 -d   |openssl x509 -noout -text

Also verify the webhook certificate is issued by the configured CA

kubectl get secret -n kube-system aws-load-balancer-tls -ojsonpath="{.data.tls\.crt}" | base64 -d | openssl x509 -noout -text
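
The comparison requested above can also be scripted instead of eyeballing two openssl dumps. The following is only a minimal sketch: it assumes you have already captured the two base64 values (the webhook's caBundle and the Secret's ca.crt, before the `base64 -d` step) and it compares SHA-256 fingerprints of the DER bytes. The helper names `cert_fingerprints` and `ca_matches` are hypothetical and not part of kubectl or the controller.

```python
import base64
import hashlib
import re
from typing import List

# Matches each PEM certificate block and captures its base64 body.
_PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----(.+?)-----END CERTIFICATE-----", re.DOTALL
)


def cert_fingerprints(b64_pem: str) -> List[str]:
    """Return the SHA-256 fingerprint of every certificate in a
    base64-encoded PEM bundle (the form stored in caBundle and Secret data)."""
    pem_text = base64.b64decode(b64_pem).decode("ascii")
    fingerprints = []
    for body in _PEM_BLOCK.findall(pem_text):
        der = base64.b64decode("".join(body.split()))  # drop line breaks
        fingerprints.append(hashlib.sha256(der).hexdigest())
    return fingerprints


def ca_matches(ca_bundle_b64: str, secret_ca_b64: str) -> bool:
    """True when both values contain the same certificates in the same order."""
    bundle = cert_fingerprints(ca_bundle_b64)
    secret = cert_fingerprints(secret_ca_b64)
    return bool(bundle) and bundle == secret
```

Calling `ca_matches(webhook_value, secret_value)` with the raw jsonpath outputs returns False if either value is empty or the certificates differ. It does not check that tls.crt was actually signed by that CA; that step still needs openssl or a similar tool.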

@gazal-k
Author

gazal-k commented Jun 17, 2021

  • The caBundle in the webhook & CA certificate in the secret match.
  • webhook certificate is issued by configured CA.

I was able to verify that installing & updating apps that use other ingress controllers works just fine with helm & helmfile. It seems the error only shows up in a helmfile execution from our CI/CD pipeline. This might be a tooling issue. We'll try updating our pipeline containers with newer versions of helm & helmfile & see if that helps.

Thanks @kishorj & @M00nF1sh

@darrenwhighamfd

darrenwhighamfd commented Jun 17, 2021

I have also had this issue when using Polaris https://artifacthub.io/packages/helm/fairwinds-stable/polaris on version 4.0.4

Error: UPGRADE FAILED: cannot patch "polaris" with kind Ingress: Internal error occurred: failed calling webhook "vingress.elbv2.k8s.aws": Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/validate-networking-v1beta1-ingress?timeout=10s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "aws-load-balancer-controller-ca")

We use cert-manager as detailed in the Polaris documentation and are currently running EKS 1.20.

@M00nF1sh
Collaborator

@darrenwhighamfd Is this issue still happening in your cluster? Have you checked whether the cert stored in the secret matches the CA configured in the aws-load-balancer-webhook ValidatingWebhookConfiguration?

@DmitriyStoyanov

DmitriyStoyanov commented Jun 24, 2021

I faced the same issue with two ingress controllers installed on one cluster: the aws-load-balancer controller and the nginx-ingress controller.
We have automated CI that updates to new versions from helm every week, and after one upgrade, creating Ingresses with the annotation kubernetes.io/ingress.class: "nginx" started failing:
the aws-load-balancer validating webhook began handling such requests and returned a certificate-related error.

Error: UPGRADE FAILED: cannot patch "kube-prometheus-stack-grafana" with kind Ingress: Internal error occurred: failed calling webhook "vingress.elbv2.k8s.aws": Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/validate-networking-v1beta1-ingress?timeout=10s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "aws-load-balancer-controller-ca")

Fixed the issue by uninstalling the aws-load-balancer controller helm chart and installing it again.

Hm, I just noticed that the issue still occurs sporadically when deploying some helm charts that contain an Ingress. Triggering the upgrade again deploys successfully. So the issue still exists.
Looks like web hook validation do not check annotations such as kubernetes.io/ingress.class or ingressClass in Ingress spec

@darrenwhighamfd

@M00nF1sh yes, we still have this issue. For now we have had to revert to 1.1.6.
As with @DmitriyStoyanov, the issue did not happen every time but sporadically, in roughly one out of every three runs. The cert and secret appear to match.

@gazal-k
Author

gazal-k commented Jun 24, 2021

I guess that's our experience too. Maybe it wasn't the tooling; the issue was just sporadic. We had verified that the certs and secrets match: #2071 (comment)

@DmitriyStoyanov

DmitriyStoyanov commented Jun 24, 2021

Ha, I found that uninstalling the helm chart does not remove the secrets.

aws-externaldns-token-wkqpc                      kubernetes.io/service-account-token   3      177d
aws-load-balancer-controller-token-spwhd         kubernetes.io/service-account-token   3      177d

So I've tried again

helm uninstall -n kube-system aws-load-balancer-controller
kubectl delete secret -n kube-system aws-load-balancer-controller-token-spwhd
kubectl delete secret -n kube-system aws-externaldns-token-wkqpc

and then installed it again. This created 3 new secrets:

aws-externaldns-token-49mpg                          kubernetes.io/service-account-token   3      72s
aws-load-balancer-controller-token-rzcvj             kubernetes.io/service-account-token   3      86s
aws-load-balancer-tls                                kubernetes.io/tls                     3      49s

Possibly this will help. After that, deploying our helm chart with an Ingress worked fine. I do not know whether this fixes the whole problem.

@kishorj
Collaborator

kishorj commented Jun 30, 2021

@DmitriyStoyanov, @gazal-k, @darrenwhighamfd if you are able to reproduce the issue, could you please create a support ticket with AWS support with cluster ARN?

@acim

acim commented Jul 19, 2021

Same problem here:

Error: UPGRADE FAILED: cannot patch "merged" with kind Ingress: Internal error occurred: failed calling webhook "vingress.elbv2.k8s.aws": Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/validate-networking-v1beta1-ingress?timeout=10s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "aws-load-balancer-controller-ca")

@clayvan

clayvan commented Jul 22, 2021

So I think I found my issue, and possibly what others here are seeing.

@DmitriyStoyanov was close in saying

Looks like web hook validation do not check annotations such as kubernetes.io/ingress.class or ingressClass in Ingress spec

But maybe the key here is that the Ingress spec uses ingressClassName not ingressClass.

The documentation https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/guide/ingress/ingress_class/#deprecated-kubernetesioingressclass-annotation looks to be specific to the annotation for alb itself still being supported, but if you're using a secondary ingress controller (nginx or traefik) the annotation for those will cause the error reported above.

So the "fix" is to update any Ingress objects that are using the annotation style like kubernetes.io/ingress.class: traefik to using the new IngressClass spec like:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: default-backend-traefik
  namespace: ingress
spec:
  ingressClassName: traefik

and also define your IngressClass object.

apiVersion: networking.k8s.io/v1beta1
kind: IngressClass
metadata:
  name: traefik
spec:
  controller: traefik.io/ingress-controller

If I had to guess, the AWS LB controller is not checking the annotations for ingress class, and therefore defaulting your Ingress objects for traefik to be the ALB class and causing the error.

@clayvan

clayvan commented Jul 23, 2021

Well, I take that back. I think it was in fact sporadic like others here are reporting. I don't know what the common pattern is though.

For me, we utilize spinnaker + helm template to generate a single manifest yaml file with all objects in it. This also includes those Ingress objects that are of the traefik/nginx class. If I remove these objects from the manifest, the kubectl apply succeeds just fine.

I'm at the point where I'm wondering if I need to break my helm chart up and have to separately deploy AWS LB controller (probably should have done this from the start).

Can everyone else confirm that they also only see this when applying a manifest file that includes both the AWS LB controller install AND ingress objects for nginx/traefik?

@gazal-k
Author

gazal-k commented Jul 24, 2021

We use helmfile and specifically helmfile sync command to manage all applications within a cluster (some nesting of helmfiles). So it's like calling helm upgrade --install on all charts in the cluster when there's a change. The issue was sporadic for us.

We are currently just using the older version of the chart; 1.1.6. Our plan is to use albc for all our Ingresses, so we might upgrade once that is done. This might not be a workaround that everyone can use.

@DmitriyStoyanov

DmitriyStoyanov commented Sep 6, 2021

Interestingly, today I hit this issue while deploying a helm chart with an Ingress that has

annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:xxxxxxx:xxxxxxxx:certificate/xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx

and in logs
Error: UPGRADE FAILED: cannot patch "helm-chart-name" with kind Ingress: Internal error occurred: failed calling webhook "vingress.elbv2.k8s.aws": Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/validate-networking-v1beta1-ingress?timeout=10s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "aws-load-balancer-controller-ca")

After a retry it went through, as it has many times before.

So possibly the issue is related not only to having both the nginx and alb ingress controllers in the cluster, but also to the AWS ACM certificate check?

@mahaffey

mahaffey commented Sep 16, 2021

We are using an alb ingress with an acm cert. This error is happening periodically with no configuration changes.

log:

2021/09/16 06:56:31 http: TLS handshake error from 10.0.3.214:37640: remote error: tls: bad certificate
{"level":"error","ts":1631775391.605741,"logger":"controller","msg":"Reconciler error","controller":"service","name":"rabbitmq","namespace":"rabbitmq","error":"Internal error occurred: failed calling webhook \"vtargetgroupbinding.elbv2.k8s.aws\": Post https://aws-load-balancer-webhook-service.kube-system.svc:443/validate-elbv2-k8s-aws-v1beta1-targetgroupbinding?timeout=10s: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"aws-load-balancer-controller-ca\")"}

ingress:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/backend-protocol: HTTP
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-2:123456789:certificate/omit
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: "30"
    alb.ingress.kubernetes.io/healthcheck-port: "15672"
    alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
    alb.ingress.kubernetes.io/healthcheck-timeout-seconds: "20"
    alb.ingress.kubernetes.io/inbound-cidrs: 1.1.1.1/32,omit...
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443},{"HTTP":80}]'
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=300
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-FS-1-2-2019-08
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/tags: omit
    alb.ingress.kubernetes.io/target-type: ip
    external-dns.alpha.kubernetes.io/hostname: rabbitmq.company.com
    kubernetes.io/ingress.class: alb
  labels:
    app.kubernetes.io/instance: rabbitmq
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: rabbitmq
    helm.sh/chart: rabbitmq-8.22.1
  name: rabbitmq
  namespace: rabbitmq
spec:
  rules:
  - host: rabbitmq.company.com
    http:
      paths:
      - backend:
          serviceName: rabbitmq-stats
          servicePort: http-stats
        path: /*

@mahaffey

mahaffey commented Sep 17, 2021

EDIT: this is not the case, I am still seeing the error. It is intermittent.

Update, I found that when I remove the annotation alb.ingress.kubernetes.io/target-type: ip this error ceases to occur

@acim

acim commented Sep 18, 2021

Update, I found that when I remove the annotation alb.ingress.kubernetes.io/target-type: ip this error ceases to occur

I do not have this annotation at all but still facing the problem.

@acim

acim commented Sep 18, 2021

I use helmfile and it shows a diff of the certificates on each run, as described in this issue: aws/eks-charts#347

There is an undocumented value, enableCertManager, which is false by default, so certificates get regenerated on each run of helmfile apply. I suspect this may be the reason, because if I run helmfile with the --selector flag and update just one chart, the error does not seem to appear. But when I run helmfile over the whole stack, it appears almost always. It may be that the controller is not fast enough to reconfigure the certificates. Just guessing. And to be clear, I have no non-ALB Ingresses; all of them are ALB.

@mahaffey

Update, I found that when I remove the annotation alb.ingress.kubernetes.io/target-type: ip this error ceases to occur

I do not have this annotation at all but still facing the problem.

Hey sorry about the confusion, I am still seeing this error!

@gartemiev

gartemiev commented Oct 6, 2021

I'm also having this issue.

@kishorj @M00nF1sh, is there any ETA for a fix?

@gazal-k
Author

gazal-k commented Oct 6, 2021

Would #2264 resolve this issue?

@ruckc

ruckc commented Oct 6, 2021

having it also

@booleanbetrayal

Ran into this today. Fixed by pruning all related resources and re-installing.

@scardena

I ran into this today after upgrading the helm deployment from APP version 2.0.1 to 2.2.4.
I had to remove the deployment and reinstall.

@henrysachs

I just ran into this via the CDK HelmChartProvider. Solution was to try it 3 times... As this is done via Cloudformation the Stack just rolled back and I could do this easily but feels kinda hacky.
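
Several people in this thread report that simply retrying gets past the error. A minimal sketch of that workaround follows, under clear assumptions: the deploy tool writes the error to stderr, and retrying is acceptable in your pipeline. `retry_on_webhook_error` and its parameters are hypothetical names, and the `runner` and `sleep` arguments exist only so the function can be exercised without a cluster. This masks the symptom rather than fixing the certificate mismatch.

```python
import subprocess
import time
from typing import Callable, Sequence

# Substring taken from the error message reported in this thread.
WEBHOOK_ERROR = "x509: certificate signed by unknown authority"


def retry_on_webhook_error(
    command: Sequence[str],
    attempts: int = 3,
    delay_seconds: float = 10.0,
    runner: Callable = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
):
    """Run `command`, retrying only when stderr shows the webhook x509 error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = runner(command, capture_output=True, text=True)
        transient = result.returncode != 0 and WEBHOOK_ERROR in result.stderr
        # Return on success, on unrelated failures, or when out of attempts.
        if not transient or attempt == attempts:
            return result
        sleep(delay_seconds)
```

For example, `retry_on_webhook_error(["helmfile", "sync"])` would run the command up to three times, and the caller still has to check `returncode` on the returned result.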

@tiwarisuraj9

I face this issue on a regular basis while applying new or updated nginx Ingresses configured to use an LB via the AWS load balancer controller. The only solution I found is to delete all resources of the AWS LB controller and reinstall. (I usually do it through argocd, so it's a pretty quick reinstallation.)

@caleb15

caleb15 commented Dec 1, 2021

@kishorj I've created a ticket with AWS support. CaseID 9282832861

@Erlichooo

@kishorj I've created a ticket with AWS support. CaseID 9282832861

Did you have any reply?

@caleb15

caleb15 commented Dec 3, 2021

Yep, they said they were working on fixing it but couldn't share an ETA.

@alex-treebeard

Hi there, quick data point on this one.

I find that I get this error when I install both the controller and a custom resource that uses it in the same helm chart, i.e. my chart has the controller as a subchart and a TargetGroupBinding as a template.

Here is my error:

Error: UPGRADE FAILED: failed to create resource: Internal error occurred: failed calling webhook "mtargetgroupbinding.elbv2.k8s.aws": Post "https://aws-load-balancer-webhook-service.xxx.svc:443/mutate-elbv2-k8s-aws-v1beta1-targetgroupbinding?timeout=10s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "aws-load-balancer-controller-ca")

If I install the controller first, then install the targetgroupbinding after a delay, it works.

@gazal-k
Author

gazal-k commented Dec 21, 2021

Reasonably confident this is the same as #2239.

We have set keepTLSSecret: true. Our GitOps pipeline that uses helmfile sync has executed a dozen times after the change and we have yet to see this issue.

@stijnbrouwers

Reasonably confident this is the same as #2239.

We have set keepTLSSecret: true. Our GitOps pipeline that uses helmfile sync has executed a dozen times after the change and we have yet to see this issue.

Can confirm that this solution also fixed it for me.

@karanrn

karanrn commented Mar 7, 2022

We are installing it through helm via ArgoCD. We set keepTLSSecret: True yet we periodically see the issue

@jakepage

Reasonably confident this is the same as #2239.

We have set keepTLSSecret: true. Our GitOps pipeline that uses helmfile sync has executed a dozen times after the change and we have yet to see this issue.

Can confirm that this solution also fixed it for me.

I tried this, but the issue still keeps periodically popping up too

@vkirilenko

Today I got the same error on an AWS cluster. I had to remove the namespace and reinstall everything from scratch.

@arthurk

arthurk commented May 24, 2022

Still happening. aws-lb-controller version is latest 2.4.1 and keepTLSSecret is true.

@sidewinder12s

I also ran into this today across multiple clusters. As far as I can tell, the TLS secret has not changed since deployment and was not regenerated. Does anyone have any idea what's wrong?

As far as I can tell the TLS secret in the cluster from the helm chart should be valid for 10 years.

@carozuniga

Still happening. The controller version is 2.4.3

@arthurk

arthurk commented Sep 1, 2022

I've fixed the problem by generating a cert manually and setting it via webhookTLS. Using the certManager integration also works; both fix the problem for me. I've seen a few other charts that generate TLS certs for their webhooks, and all of them have similar problems.

@nickjj

nickjj commented Jan 6, 2023

I'm also seeing this on a very regular basis. At least 2-5 times a day.

Environment:

  • EKS 1.23 with ACM managing the SSL certs (not cert-manager)
  • AWS Load Balancer Controller v2.4.5 installed through Helm (1.4.6) with mostly the defaults (initiated through Argo CD)

It's happened 164 times and I think it started in August 2022 when I updated the helm chart to 1.4.4. With 1.4.3 or older it only happened once. That means it happened 163 times with a combo of 1.4.4, 1.4.5 and 1.4.6. I'm basing this on looking at the last year's worth of Kubernetes events that match the same error message as this issue.

@LockedThread

Still happening, EKS, installed via helm version 1.5.3

@omidraha

omidraha commented Jun 29, 2023

I got the same error using kubernetes.helm.sh/v3.Chart in Pulumi.

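import pulumi
import pulumi_kubernetes as kubernetes

# vpc, stack, eks_provider, namespace and svc_01 are assumed to be defined earlier in the program.
kubernetes.helm.v3.Chart(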
    "lb-dev",
    kubernetes.helm.v3.ChartOpts(
        chart="aws-load-balancer-controller",
        fetch_opts=kubernetes.helm.v3.FetchOpts(
            repo="https://aws.github.io/eks-charts"
        ),
        namespace="aws-lb-controller-dev",
        values={
            "region": "us-west-2",
            "serviceAccount": {
                "name": "aws-lb-controller-serviceaccount",
                "create": False,
            },
            "vpcId": vpc.vpc_id,
            "clusterName": "cluster-dev",
            "podLabels": {
                "stack": stack,
                "app": "aws-lb-controller"
            },
            "autoDiscoverAwsRegion": "true",
            "autoDiscoverAwsVpcID": "true",
            "keepTLSSecret": True,
        },
    ), pulumi.ResourceOptions(
        provider=eks_provider, parent=namespace
    )
)
ingress = kubernetes.networking.v1.Ingress(
    "app-ingress-dev",
    metadata=kubernetes.meta.v1.ObjectMetaArgs(
        name="ingress-dev",
        namespace="aws-lb-controller-dev",
        annotations={
            "kubernetes.io/ingress.class": "alb",
            "alb.ingress.kubernetes.io/target-type": "instance",
            "alb.ingress.kubernetes.io/scheme": "internet-facing"

        },
        labels={'app': 'ingress-dev'},
    ),
    spec=kubernetes.networking.v1.IngressSpecArgs(
        rules=[kubernetes.networking.v1.IngressRuleArgs(
            host='example.com',
            http=kubernetes.networking.v1.HTTPIngressRuleValueArgs(
                paths=[
                    kubernetes.networking.v1.HTTPIngressPathArgs(
                        path="/app1",
                        path_type="Prefix",
                        backend=kubernetes.networking.v1.IngressBackendArgs(
                            service=kubernetes.networking.v1.IngressServiceBackendArgs(
                                name=svc_01.metadata.name,
                                port=kubernetes.networking.v1.ServiceBackendPortArgs(
                                    number=80,
                                ),
                            ),
                        ),
                    ),
                ],
            ),
        )],
    ),
    opts=pulumi.ResourceOptions(provider=eks_provider)
)

Info

kubectl describe ingress  ingress-dev -n aws-lb-controller-dev
Name:             ingress-name-dev
Labels:           app=ingress-name-dev
                  app.kubernetes.io/managed-by=pulumi
Namespace:        aws-lb-controller-dev
Address:          k8s-awslbcon-ingressn-63280fface-152845941.us-west-2.elb.amazonaws.com
Ingress Class:    <none>
Default backend:  <default>
Rules:
  Host         Path  Backends
  ----         ----  --------
  example.com  
               /app1   eks-service-dev-443996a5:80 (10.0.18.89:80)
Annotations:   alb.ingress.kubernetes.io/scheme: internet-facing
               alb.ingress.kubernetes.io/target-type: instance
               kubernetes.io/ingress.class: alb
Events:
  Type     Reason                  Age                 From     Message
  ----     ------                  ----                ----     -------
  Warning  FailedDeployModel       23m (x14 over 24m)  ingress  Failed deploy model due to Internal error occurred: failed calling webhook "mtargetgroupbinding.elbv2.k8s.aws": failed to call webhook: Post "https://aws-load-balancer-webhook-service.aws-lb-controller-dev.svc:443/mutate-elbv2-k8s-aws-v1beta1-targetgroupbinding?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "aws-load-balancer-controller-ca")
  Normal   SuccessfullyReconciled  22m (x3 over 3h7m)  ingress  Successfully reconciled

@passcod

passcod commented Sep 27, 2023

@omidraha I had the same issue but solved it by using a k8s.helm.v3.Release instead of a k8s.helm.v3.Chart

@adecchi-2inno

I am facing the same issue with aws-load-balancer-controller:v2.6.2.
It looks like the caBundle certificate is not updated in the MutatingWebhookConfiguration and ValidatingWebhookConfiguration.

@booleanbetrayal

Have also just encountered this in several clusters and utilized a keepTLSSecret: false values strategy + complete chart teardown and re-install to mitigate.

@nadavshaham

I faced the exact same issue, deploying aws-load-balancer-controller via argocd.
I fixed it by deleting the aws-load-balancer-tls secret, letting argo recreate it, and then restarting the deployment.

@RomanBats

One observation for those who use ArgoCD:

If you have automatic sync for an application, but you ignore differences for a webhook:

  resource.customizations.ignoreDifferences.admissionregistration.k8s.io_MutatingWebhookConfiguration: |
    jqPathExpressions:
    - .webhooks[]?.clientConfig.caBundle
  resource.customizations.ignoreDifferences.admissionregistration.k8s.io_ValidatingWebhookConfiguration: |
    jqPathExpressions:
    - .webhooks[]?.clientConfig.caBundle

At some point, ArgoCD will update the secret with a TLS cert and skip updating webhook certificates, creating this issue of invalid certificates for a webhook.
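
To spot this kind of drift, one option is to compare every webhook's caBundle with the ca.crt in the aws-load-balancer-tls Secret. The sketch below assumes you have the JSON output of `kubectl get ... -o json` for a webhook configuration and for the Secret, already parsed into dicts. The function name `drifted_webhooks` is hypothetical; the field paths are the ones used in the commands and jq expressions earlier in this thread.

```python
import base64
from typing import Any, Dict, List


def drifted_webhooks(
    webhook_config: Dict[str, Any], tls_secret: Dict[str, Any]
) -> List[str]:
    """Return the names of webhooks whose caBundle differs from the
    ca.crt stored in the TLS Secret."""
    expected = base64.b64decode(tls_secret["data"]["ca.crt"]).strip()
    drifted = []
    for webhook in webhook_config.get("webhooks", []):
        bundle_b64 = webhook.get("clientConfig", {}).get("caBundle", "")
        # Compare decoded PEM bytes so base64 line wrapping does not matter.
        if base64.b64decode(bundle_b64).strip() != expected:
            drifted.append(webhook["name"])
    return drifted
```

An empty list means every webhook in that configuration trusts the CA currently in the Secret. A non-empty list names the webhooks that would fail with the x509 error above. Run it for both the MutatingWebhookConfiguration and the ValidatingWebhookConfiguration.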
