
LB controller spontaneously loses permission to add tags #3383

Closed
cpetestewart opened this issue Sep 12, 2023 · 23 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@cpetestewart

cpetestewart commented Sep 12, 2023

Describe the bug
Recently, out of the blue, we started getting an error from the load balancer controller in our EKS clusters across several accounts. The specific error was that the controller did not have the elasticloadbalancing:AddTags permission. Nothing had changed on our side: we did not upgrade the controller or change the IAM role.

We traced the error to this clause in the IAM permissions:

            "Effect": "Allow",
            "Action": [
                "elasticloadbalancing:AddTags",
                "elasticloadbalancing:RemoveTags"
            ],
            "Resource": [
                "arn:aws:elasticloadbalancing:*:*:targetgroup/*/*",
                "arn:aws:elasticloadbalancing:*:*:loadbalancer/net/*/*",
                "arn:aws:elasticloadbalancing:*:*:loadbalancer/app/*/*"
            ],
            "Condition": {
                "Null": {
                    "aws:RequestTag/elbv2.k8s.aws/cluster": "true",
                    "aws:ResourceTag/elbv2.k8s.aws/cluster": "false"
                }
            }
        },
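
To spell out the semantics of that Condition block: IAM's Null operator tests for the presence of a key, so "true" requires the tag key to be absent from the request context and "false" requires it to be present on the resource. A minimal Python sketch of that reading (my own illustration, not AWS's actual evaluation code):

```python
# Sketch (not AWS code) of how IAM's "Null" condition operator evaluates the
# clause quoted above. "Null": {key: "true"} matches when the key is absent
# from the request context; "false" matches when the key is present.

TAG_KEY = "elbv2.k8s.aws/cluster"

def condition_matches(request_tags: dict, resource_tags: dict) -> bool:
    """Return True when the quoted Condition block allows the call."""
    request_tag_is_null = TAG_KEY not in request_tags    # must be null (absent)
    resource_tag_is_null = TAG_KEY not in resource_tags  # must NOT be null (present)
    return request_tag_is_null and not resource_tag_is_null

# A standalone AddTags call against an already-tagged target group passes:
print(condition_matches({}, {TAG_KEY: "my-cluster"}))   # True

# Tag-on-create: the controller sends the cluster tag in the request, and the
# not-yet-created resource has no tags -- both halves of the condition fail:
print(condition_matches({TAG_KEY: "my-cluster"}, {}))   # False
```

Under this reading, the statement covers re-tagging existing controller-owned resources, but it cannot cover tags applied at creation time, which matches the failure everyone here is seeing.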

The only thing that fixed this was removing the "Condition" clause. After that, the controller operated normally.

This may not be an issue with the LB controller, but before I go to AWS support with this, does anyone have any clue as to what is causing it? Note that this clause is exactly what is currently in the LB controller repo; we have not changed it since it was first installed.

Steps to reproduce
We're not sure. As indicated above, this started happening out of the blue.

Expected outcome
I expect that if I make zero changes to the cluster, the controller deployment, and the IAM role that everything will continue to function as before.

Environment

  • AWS Load Balancer controller version: 2.4.5
  • Kubernetes version: 1.23
  • Using EKS (yes/no), if so version? yes, 1.23

Additional Context:

@mnort

mnort commented Sep 12, 2023

Pretty sure we ran into this today, still reviewing.

No changes to cluster/perms/etc., but suddenly ingress creation fails on permissions. It last worked about a week ago.

Somewhat assume AWS-side changes.

@KlausVii

We also ran into this yesterday, I assume it is related to this issue #2692

Is that condition block blocking the controller from adding the tags that the condition requires?

@ghost

ghost commented Sep 13, 2023

Same issue here with an (albeit EOL) cluster, v1.22. No changes whatsoever to the infra; it won't work anymore unless we replace the condition block as shown in #2692.

@LCaparelli

LCaparelli commented Sep 13, 2023

We're hitting this on 2.4.4 on EKS 1.21 since September 8th, 19:00:00 UTC-0.

EDIT: we're hitting this in some 1.21 clusters, not all of them.

@elebiodaslingshot

Ran into this using the Terraform module terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks. Upgrading from version 5.2.0 to 5.30.0, which carries the corrected IAM policy, resolved it.

@oliviassss
Collaborator

oliviassss commented Sep 13, 2023

Hi @cpetestewart, @elebiodaslingshot, @LCaparelli, @MichielVanDerWinden-inQdo, @KlausVii, @mnort, thanks for reaching out. This issue is caused by a recent change to the AWS ELB API, not by the AWS Load Balancer Controller: as of 8/30/2023, the 'Create*' API calls fail and return an error if the caller does not have access to elasticloadbalancing:AddTags.

We updated our IAM template to address this in v2.4.7. Can you please check that your IAM policy includes this block: https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/main/docs/install/iam_policy.json#L202.
You can check more info in our release note: https://github.com/kubernetes-sigs/aws-load-balancer-controller/releases/tag/v2.4.7
related issues: #2692
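
For anyone comparing by hand: to the best of my recollection, the statement the updated template adds looks like the block below, quoted from the v2.4.7 iam_policy.json; verify against the linked file, which may have changed since. It is an explicit AddTags grant scoped to tag-on-create via the elasticloadbalancing:CreateAction condition key:

```json
{
    "Effect": "Allow",
    "Action": [
        "elasticloadbalancing:AddTags"
    ],
    "Resource": [
        "arn:aws:elasticloadbalancing:*:*:targetgroup/*/*",
        "arn:aws:elasticloadbalancing:*:*:loadbalancer/net/*/*",
        "arn:aws:elasticloadbalancing:*:*:loadbalancer/app/*/*"
    ],
    "Condition": {
        "StringEquals": {
            "elasticloadbalancing:CreateAction": [
                "CreateTargetGroup",
                "CreateLoadBalancer"
            ]
        },
        "Null": {
            "aws:RequestTag/elbv2.k8s.aws/cluster": "false"
        }
    }
}
```

The elasticloadbalancing:CreateAction condition key limits this grant to tags applied during those Create* calls, so standalone tagging still has to satisfy the stricter statement quoted at the top of the issue.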

@rtripat

rtripat commented Sep 13, 2023

For context: this change was made to the ELB Create* APIs to add an additional layer of security, whereby API callers such as the AWS Load Balancer Controller must have explicit access to add tags in their Identity and Access Management (IAM) policy [1]. Previously, access to attach tags was implicitly granted along with access to the Create* APIs.

[1] https://docs.aws.amazon.com/IAM/latest/UserGuide/access_tags.html

@andresvia

This comment points to the solution, and the doc was updated here. I guess we just need to watch the doc very closely for changes/releases. 😸

I added that statement to the controller policy and that fixed the issue for me.

@blakebarnett

This issue also appears when the AWS resources the controller is trying to reconcile no longer exist. It's a confusing error because it sends you on an IAM policy goose chase in this case.

@imZack

imZack commented Sep 15, 2023

We have zero infra changes and bumped into this issue two days ago. Our services were impacted by this unexpected change.

This issue is related to a recent change in the AWS ELB api call - from 8/30/2023

@oliviassss Do you have a reference link for the above ^^^ that I can use to explain the changes to my team? Thank you.

To add one more data point here: we deployed on 09/06/2023 without any issues, which doesn't make sense to me if the API changed on 8/30/2023.

@oliviassss
Collaborator

oliviassss commented Sep 15, 2023

@imZack, since the AWS Load Balancer Controller calls the CreateLoadBalancer and CreateTargetGroup APIs, we are affected when the API changes even if nothing changed on our side.
I don't think there is a public link regarding this security change, unfortunately, but I have communicated with our ELB team internally; they should be able to help clarify this issue soon.

@imZack

imZack commented Sep 16, 2023

@oliviassss thank you for your response. Please help escalate the concern to the ELB team. You can imagine that services left without their load balancers running leads to lots of problems.

@luisfelipess

Hello @imZack, I am L. Felipe from the Elastic Load Balancing team. As mentioned previously in this thread, this change is expected, and the final part of it occurred during the time period mentioned (September 7 - 12, 2023). This update requires explicit permissions for ELB APIs that include the ability to create tags when creating resources, e.g., tag-on-create APIs. This affects all APIs that can create or manipulate tags: CreateLoadBalancer, CreateTargetGroup, CreateListener and CreateRule.

We made this change in June 2023. As part of the rollout, we identified customers that would potentially be affected by the change and notified them via the AWS Personal Health Dashboard (PHD). These customers were given additional time to update their systems before the change would be applied to their accounts. By September 12, 2023, all calls modifying or creating tags on ELBs were updated to require explicit permissions. Although we did notify customers whom we identified as impacted, we did not include customers who were not using the tag-on-create APIs.

AWS takes any change that could break or impact customer workloads seriously, and we try to minimize impact to customers whenever such a change is required. Security is one area where we will consider such changes; this change increases your security by preventing unauthorized use of tags and brings tag permissions in ELB APIs in line with tag use across all AWS services. We apologize for any confusion or impact this may have had on you or your applications.

@imZack

imZack commented Sep 28, 2023

Thank you @luisfelipess for the further explanation. We do appreciate the effort that you and your team put into the security aspect. I suggest AWS document these changes well somewhere, instead of only notifying potentially affected users via the PHD, since there are tons of guides, blogs, and notes referring to the now-incorrect usage.

@cpetestewart
Author

@oliviassss The new changes work great, thanks.

@luisfelipess Thanks for the info. Please be advised that my company has 6 accounts affected by this change, and not one was notified of it.

@danvaida

Hey @luisfelipess,

Thanks for chiming in on this. However, as @cpetestewart mentioned, I also have 20+ AWS accounts that didn't get any notification whatsoever about the mentioned change. I'm happy to get in touch with the support team and provide some account IDs and usage patterns, as it seems that the approach you're using to identify impacted customers is not entirely accurate.

@kevinchiu-mlse

Hello, I have 3 EKS clusters running k8s 1.26.8, all running the latest AWS LB controller app version 2.6.1 installed via Blueprints addons. One cluster has the add-tags error when creating a new ingress object.

Upon reviewing the IAM policy attached to the role the load balancer controller is assuming, it does not have the statement @andresvia posted a link to a few posts up. Ideally I wouldn't manually add the policy statement, as all our environments are automated and will overwrite drift. Shouldn't AWS Load Balancer Controller app version 2.6.1 include this fix?

@oliviassss
Collaborator

@kevinchiu-mlse, hi, we have had the updated IAM policy template since v2.4.7. However, when you upgrade the AWS LBC version, the IAM policy does not update automatically, since it's not a managed policy; we rely on users to update it. Please see the release note: https://github.com/kubernetes-sigs/aws-load-balancer-controller/releases/tag/v2.4.7
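
For readers doing this update by hand, a sketch of one way to publish the new template, assuming the controller role uses a customer-managed policy; the policy name and account ID below are placeholders to substitute with your own:

```shell
# Fetch the v2.4.7 policy template from the controller repo.
curl -fsSL -o iam_policy.json \
  https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.4.7/docs/install/iam_policy.json

# Placeholder ARN -- use the ARN of the policy attached to your controller role.
POLICY_ARN="arn:aws:iam::<account-id>:policy/AWSLoadBalancerControllerIAMPolicy"

# IAM keeps at most five versions per managed policy; delete an old version
# first (aws iam delete-policy-version ...) if you hit that limit, then
# publish the updated document as the new default version:
aws iam create-policy-version \
  --policy-arn "$POLICY_ARN" \
  --policy-document file://iam_policy.json \
  --set-as-default
```

If the role is managed by Terraform or EKS Blueprints, the equivalent fix is bumping the module to a release that ships the updated policy, as noted earlier in the thread.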

@kevinchiu-mlse

kevinchiu-mlse commented Oct 2, 2023

Thanks @oliviassss. I see that in EKS Blueprints Addons 5.0 the policy is updated and managed in the load balancer controller module; however, on clusters still running 4.32.1 or older, the policy is outdated. Worst case, I can attach a custom policy to the LBC role.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 29, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 28, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 29, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
