Controller can't attach ALB security group to eks cluster security group #1791

Closed
skerava opened this issue Feb 2, 2021 · 24 comments
Labels: kind/feature

Comments

@skerava

skerava commented Feb 2, 2021

Hi,
The ALB controller provides two ingress annotations that work with AWS security groups to restrict traffic:

alb.ingress.kubernetes.io/security-groups

alb.ingress.kubernetes.io/inbound-cidrs 

In my scenario, alb.ingress.kubernetes.io/security-groups is applied.

[screenshot: ingress with the security-groups annotation applied]
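
For illustration, the ingress in question looks roughly like this (a minimal sketch only; the name, host, service, and SG ID below are placeholders, not the actual values from the screenshot):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: my-app
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    # customer-managed SG intended to restrict client access to the ALB
    alb.ingress.kubernetes.io/security-groups: sg-0123456789abcdef0
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - backend:
              serviceName: my-app
              servicePort: 80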

However, when navigating to the ingress URL, the application was not accessible and returned a 504 error. I checked the target group health checks and found that all the backend targets were unhealthy.

[screenshot: target group health status, all targets unhealthy]

I found that the cause is that the EKS cluster security group does not allow traffic from the ALB security group, so the health checks cannot reach the targets.

On the other hand, when alb.ingress.kubernetes.io/inbound-cidrs is applied instead, the ALB works fine, since the controller-created ALB security group is allowed by the EKS cluster security group.

Is there anything on the ALB controller roadmap to address this?

@schollii

schollii commented Feb 2, 2021

I have the same issue. The docs provide an explanation for this:

When this annotation is not present, the controller will automatically create one security group: the security group will be attached to the LoadBalancer and allow access from inbound-cidrs to the listen-ports. Also, the securityGroups for Node/Pod will be modified to allow inbound traffic from this securityGroup.

The annotation supports multiple security groups being on the ALB, so I can see why the controller does not automatically apply them to the nodes' security group. However, the controller should automatically add the specified SG if there is only one in the annotation, AND it should support having some SGs that are added automatically and some that are not. This could be done with a second annotation such as

alb.ingress.kubernetes.io/security-groups-auto-add: [list]

If auto-add is not given, all security groups in the alb.ingress.kubernetes.io/security-groups list would be auto-added.

Then the following annotations on an ingress would cause the ALB to have SGs A, B, and C, but only B to be added to the nodes' security group. Ingress groups still work with this approach.

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: your-ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/group.name: group-name
    alb.ingress.kubernetes.io/security-groups: sg-A, sg-B, sg-C
    alb.ingress.kubernetes.io/security-groups-auto-add: sg-B

For now it seems that the workaround is to manually (or via Terraform or a similar automation tool) add the ALB SG to the cluster SG.
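
For anyone doing this by hand, a sketch of that workaround with the AWS CLI (the SG IDs are placeholders, and you may want to narrow the port range to your actual target and health-check ports):

# sg-ALB  = the SG passed via alb.ingress.kubernetes.io/security-groups
# sg-NODE = the worker node / cluster security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-NODE \
  --protocol tcp \
  --port 0-65535 \
  --source-group sg-ALB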

@M00nF1sh
Collaborator

M00nF1sh commented Feb 3, 2021

@skerava @schollii we currently don't support worker node securityGroup management when a customized ALB securityGroup is specified (via the alb.ingress.kubernetes.io/security-groups annotation).

We'll review and work on features that support it.
/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 3, 2021
@M00nF1sh M00nF1sh added the kind/design Categorizes issue or PR as related to design. label Feb 3, 2021
@schollii

schollii commented Feb 4, 2021

@M00nF1sh Cool, I'd be happy to help

@skerava
Author

skerava commented Feb 4, 2021

@M00nF1sh great, my pleasure to help

@M00nF1sh
Collaborator

M00nF1sh commented Feb 10, 2021

Also, I think we can change the default behavior so that we reuse a single shared LB securityGroup for all ALBs in the cluster and create an additional LB securityGroup per ALB. The shared LB securityGroup would be granted access to the worker nodes on any port in your cluster, while the additional per-ALB securityGroup would encode the inbound client CIDRs allowed to access that ALB (the client CIDRs may differ per ALB). The benefit of this approach is that we'd only need a single worker node securityGroup rule no matter how many ALBs we have. The only downside is that we'd have a securityGroup that is allowed to access the worker nodes on any port, which some customers might treat as a security threat; however, it looks secure to me since we'd only attach this securityGroup to LoadBalancers.
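
Roughly, the resulting rule layout would look like this (an illustrative sketch only, with made-up names, not actual controller output):

# shared LB securityGroup: one per cluster, attached to every controller-managed ALB
shared-lb-sg:
  ingress: []                  # carries no client-facing rules itself
# additional LB securityGroup: one per ALB, encodes that ALB's client CIDRs
per-alb-sg:
  ingress:
    - protocol: tcp
      ports: listen-ports      # e.g. 80, 443
      source: inbound-cidrs    # may differ per ALB
# worker node securityGroup: a single rule regardless of how many ALBs exist
node-sg:
  ingress:
    - protocol: tcp
      ports: 0-65535
      source: shared-lb-sg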

We could use the same approach for NLBs too and share the same shared LB securityGroup for both ALBs and NLBs.

What do you guys think? @schollii @skerava

@schollii

@M00nF1sh I understand the possible need for different CIDRs per ALB; that makes sense. But can you clarify (maybe with an example) the ingress and egress rules on the shared LB securityGroup?

@hieu29791

Any update on this? I tried to manually add the ALB SG specified in alb.ingress.kubernetes.io/security-groups to eks-cluster-sg-CLUSTER-NAME, but after a while that SG rule is automatically removed.

@cmsmith7

Hi, we are experiencing the same issue, as we now need to attach multiple SGs via the security-groups annotation and still have ingress to our cluster through the worker node security group. I would have thought this would be default behaviour; otherwise I do not see the point of annotating SGs that ultimately do not have access/ingress to the cluster. Is there a clear plan to implement this, and if so, do you have any idea when it might make its way out to us? Many thanks.

@M00nF1sh
Collaborator

Hi,
When using alb.ingress.kubernetes.io/security-groups, you should use an SG provisioned by yourself. Our current semantics is that users manage all the SG rules when they bring their own security group.
We'll decide whether to allow a mixed mode after our v2.2.0 release.

@cmsmith7

Hi,
Many thanks for your response. Would you want to allow a mixed mode, or just create a new annotation, e.g. alb.ingress.kubernetes.io/security-groups-managed-ingress? That way you would not have a mixed mode per se, but a specific annotation for the controller to manage ingress, keeping the functions isolated. Regardless of how you do it, I think this is a very much needed requirement: from our perspective we want one single, standard process [aws-load-balancer-controller] that manages ingress to our clusters. Also, if I understand correctly, this will not be included in 2.2.0 but in a subsequent version? It was relayed in my support ticket that it would be contained in 2.2.0.
Many thanks again for all your help.

@M00nF1sh
Collaborator

@cmsmith7
It won't be included in v2.2.0. We have already started working on a design doc to improve security group handling, which should land in v2.3.0 😄

@cmsmith7

Thanks for the confirmation. It's sad that it's not going to be here sooner (from a selfish point of view, lol), but all the same I'm incredibly pleased that it is now on your roadmap and will be incorporated in a later version. Any inkling of an estimated target release date for 2.3.0? 😄

@k8s-triage-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 28, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 27, 2021
@rkennedy-tpl

I think I am bumping into this exact issue. I'm defining "by hand" (via Terraform) some large SGs with many CIDR rules, and I thought I could use the alb.ingress.kubernetes.io/security-groups annotation to apply those SGs to a few ingresses. When I do so, I start getting failed health checks from the ALB to my target groups, and it's very unclear why.

I think the ALB controller is managing one or more additional rules required for health checks to pass. I think it has to do with the SG described as "EKS created security group applied to ENI that is attached to EKS Control Plane master nodes, as well as any managed workloads".

@MadhavJivrajani

/remove-kind design
/kind feature
kind/design is migrated to kind/feature, see kubernetes/community#6144 for more details

@k8s-ci-robot k8s-ci-robot removed the kind/design Categorizes issue or PR as related to design. label Oct 11, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kishorj kishorj removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Dec 9, 2021
@kishorj
Collaborator

kishorj commented Dec 14, 2021

/reopen

@k8s-ci-robot
Contributor

@kishorj: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Dec 14, 2021
@rdeteix

rdeteix commented Jan 4, 2022

Hi
Is there any news on this matter?

@kishorj
Collaborator

kishorj commented Jan 28, 2022

Starting with the v2.3 release, the controller supports the annotation alb.ingress.kubernetes.io/manage-backend-security-group-rules. When you set the annotation to "true", the controller attaches the shared backend security group to the ALB and creates ingress rules on the Node/Pod SG to allow traffic from the shared SG.

Closing the issue.

@kishorj kishorj closed this as completed Jan 28, 2022
@gukwonku

Starting with the v2.3 release, the controller supports the annotation alb.ingress.kubernetes.io/manage-backend-security-group-rules. When you set the annotation to "true", the controller attaches the shared backend security group to the ALB and creates ingress rules on the Node/Pod SG to allow traffic from the shared SG.

Closing the issue.

it works perfectly!

alb.ingress.kubernetes.io/security-groups: sg-xxxx
alb.ingress.kubernetes.io/manage-backend-security-group-rules: "true"
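
For anyone landing here later, a fuller sketch combining both annotations on a single ingress (the name, host, and SG ID are placeholders; assumes an IngressClass named alb exists):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    # bring-your-own ALB security group
    alb.ingress.kubernetes.io/security-groups: sg-0123456789abcdef0
    # let the controller manage the backend (Node/Pod SG) rules via the shared SG
    alb.ingress.kubernetes.io/manage-backend-security-group-rules: "true"
spec:
  ingressClassName: alb
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80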

@rcfja

rcfja commented May 2, 2024

Sheez yeah this solved my headache. I feel like it should be a default behavior.
