[EKS] [IPv6 on instance TGs]: Add IPv6 support to instance-type target groups for EKS support #1653

Open
eshicks4 opened this issue Feb 15, 2022 · 48 comments
Labels
EKS Networking (EKS Networking related issues) · EKS (Amazon Elastic Kubernetes Service) · Proposed (Community submitted issue)

Comments

@eshicks4

eshicks4 commented Feb 15, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
Please add IPv6 support to instance-type target groups so that we can use EKS cluster autoscaling groups with ALBs/NLBs.

Which service(s) is this request for?
EKS, ELBs

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
EKS creates an autoscaling group to which we can attach target groups; however, the new IPv6-based clusters don't bind NodePorts to the EC2 nodes' IPv4 addresses. We have dual-stack ELBs and IPv6-enabled EKS clusters but seem to be missing that connecting piece in between.

Are you currently working around this issue?
We aren't. We're currently stuck with IPv4 clusters.

Additional context
An alternative could be to make EKS clusters dual-stack so we can use the ipFamilies & ipFamilyPolicy features. IPv6-only would be the default to avoid IP exhaustion, but we could selectively bind IPv4 addresses as needed.

Attachments
N/A

@eshicks4 added the Proposed (Community submitted issue) label Feb 15, 2022
@eshicks4
Author

See also case #9628572941

@mikestef9 added the EKS (Amazon Elastic Kubernetes Service) label Feb 15, 2022
@mikestef9
Contributor

Hey @eshicks4, this first needs to be implemented by ALB and NLB. As called out here, ALB and NLB only support IP targeting mode for IPv6. Once they support instance mode, we can add support in the AWS Load Balancer Controller.

Any reason you can't use IP targeting mode?

@eshicks4
Author

My understanding was that IP targeting mode required that you knew the IPs you would be targeting. Since the cluster is autoscaled and my service's target is running on all nodes as a daemonset, couldn't that list change?

@mikestef9
Contributor

That's the job of the AWS LB Controller. It watches Service endpoints in the cluster and automatically updates ALB/NLB target groups with the latest list of pod IP addresses.
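For illustration only, here is a rough Python sketch of that reconciliation idea using boto3 and the Kubernetes client. The target group ARN, namespace, and Service name are placeholders, and the real controller does much more (TargetGroupBindings, readiness gates, draining):

```python
import boto3
from kubernetes import client, config, watch

TG_ARN = "arn:aws:elasticloadbalancing:region:acct:targetgroup/example"  # placeholder

config.load_kube_config()
v1 = client.CoreV1Api()
elbv2 = boto3.client("elbv2")

registered = set()
# Watch the Service's Endpoints and mirror pod IP/port pairs into the target group.
for event in watch.Watch().stream(v1.list_namespaced_endpoints,
                                  namespace="default",
                                  field_selector="metadata.name=my-service"):
    subsets = event["object"].subsets or []
    current = {(a.ip, p.port) for s in subsets
               for a in (s.addresses or []) for p in (s.ports or [])}
    for ip, port in current - registered:  # new pods -> register
        elbv2.register_targets(TargetGroupArn=TG_ARN,
                               Targets=[{"Id": ip, "Port": port}])
    for ip, port in registered - current:  # gone pods -> deregister
        elbv2.deregister_targets(TargetGroupArn=TG_ARN,
                                 Targets=[{"Id": ip, "Port": port}])
    registered = current
```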

@eshicks4
Author

Ok I think I have this design working - at least on my existing IPv4 setup. I'll need to rebuild to try it on an IPv6-only cluster.

One question, though, since this process involved quite a bit more than my original design: what benefits does it provide over a simpler design that just uses an instance-based TG (with IPv6 support) to connect an NLB to the cluster's autoscaling group?

Thanks

@mikestef9
Contributor

mikestef9 commented Feb 16, 2022

An instance mode load balancer can potentially go through an additional instance hop before the traffic gets to the pod, which adds latency compared to the load balancer sending traffic directly to the pods. Direct delivery is possible because the VPC CNI assigns pods VPC IP addresses, so an ALB/NLB can send traffic straight to the pods and skip node ports and kube-proxy.

@eshicks4
Author

eshicks4 commented Feb 16, 2022

That makes sense. This is sounding more like an ELB feature request. Should I switch it over to their queue instead?

@stevehipwell

> An instance mode load balancer can potentially go through an additional instance hop before the traffic gets to the pod, which adds latency compared to the load balancer sending traffic directly to the pods. Direct delivery is possible because the VPC CNI assigns pods VPC IP addresses, so an ALB/NLB can send traffic straight to the pods and skip node ports and kube-proxy.

@mikestef9 don't forget that with instance mode you also need the extra complexity of setting externalTrafficPolicy to Local if you have compliance policies for public traffic.

@eshicks4 assuming you're using the AWS Load Balancer Controller (which you should be, as the in-tree controller is deprecated), either an ALB IP-backed Ingress or an NLB IP-backed ingress controller Service is the simplest solution.

@eshicks4
Author

@stevehipwell while that may be the simplest solution that currently works for IPv6, I'm not sure I'd call it the simplest overall. Envoy runs as a daemonset in Project Contour's design so it's going to route to all available nodes anyway. An NLB configured to route to a static NodePort and auto-updated by the autoscaler doesn't require an ELB controller deployment or any of the IAM role setup that goes along with it. It works perfectly with IPv4 so, once IPv6 support is added to instance-based target groups, the only real benefit the controller has for us is the direct IP routing that bypasses kube-proxy.

@stevehipwell

@eshicks4 I'd suggest that you could switch Contour to use a Deployment for Envoy and the nlb-ip service annotations; this will allow you to use IPv6 and have an HA ingress (see pod readiness gates). I'm sure there are some limited cases where the daemonset and instance mode are better, but I can't think of many where the pros outweigh the cons. Obviously you might have some of these, so this is just a friendly suggestion.

@eshicks4
Author

@stevehipwell Just the reduction in complexity, really (fewer moving parts to break, etc.). There may be other reasons but, in our case, Kubernetes is still pretty new and we just have more people familiar with AWS. That said, I have it all working and documented with the in-cluster ELB controller, and there's no real reason to switch back since there are benefits to using it. That's why I'm thinking it's best to move this over to the ELB team's feature request bucket instead.

@yann-soubeyrand

Hello,

We’re more or less in the same situation as @eshicks4.

We'd like to attach our ingress nodes' autoscaling group to our load balancer target group. The reason is the same: reduction of complexity (no need to deploy the load balancer controller, one less piece that could break, etc.).

Another reason is that, currently, the load balancer controller doesn’t handle the case where two clusters are behind the same target group, which is how we do some blue/green upgrades.

Also, to avoid the additional hop when using instance type target groups and node ports, we deploy our ingress controllers using host ports.

@plaisted

plaisted commented Oct 10, 2022

I've been struggling to get the suggested alternative (IP-based) solution working with IPv6. Are there guidelines anywhere for troubleshooting this? Logs from aws-load-balancer-controller look okay, with no errors. The load balancer and target group are created, but the targets always stay unhealthy. The routing table has IPv6 routes, the NACLs are fully open for IPv4/IPv6, and the security groups are wide open for testing (in addition to the rules created by the controller). I'm on the latest EKS / CNI / controller versions. I've tried both ALB and NLB with the same result, and when I hit the service's IPv6 endpoints directly from the EKS nodes the connection is refused. The service works perfectly with kubectl port-forward.

service description

Name:                     <name>
Namespace:                <namespace>
Labels:                   app.kubernetes.io/instance=<namespace>
Annotations:              service.beta.kubernetes.io/aws-load-balancer-ip-address-type: dualstack
                          service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
                          service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-<id>, subnet-<id>
Selector:                 <selector>
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv6
IP:                       <address>:e8fb::5a07
IPs:                      <address>:e8fb::5a07
LoadBalancer Ingress:     <redacted>.elb.us-east-1.amazonaws.com
Port:                     <unset>  80/TCP
TargetPort:               5000/TCP
NodePort:                 <unset>  30538/TCP
Endpoints:                [<address>:bf0::1a]:5000,[<address>:bf0::1c]:5000
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                  Age                From     Message
  ----    ------                  ----               ----     -------
  Normal  SuccessfullyReconciled  26m (x3 over 72m)  service  Successfully reconciled

edit: Turns out this was an issue with my app and what it was listening on. Traffic routed by kube-proxy or port-forward works fine when the server binds to localhost; traffic arriving directly at the pod IP, as the IP target mode sends it, does not, which in hindsight should have been obvious. I'll leave this here in case anyone else hits the same issue with a server binding to https://localhost:port instead of https://[::]:port.
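To illustrate the fix, a minimal Python sketch (the port number is arbitrary): in an IPv6-only pod, a server bound to localhost is reachable through port-forward but not at the pod IP, so bind the listener to the IPv6 wildcard instead.

```python
import socket
from http.server import HTTPServer, SimpleHTTPRequestHandler

class HTTPServerV6(HTTPServer):
    # The default HTTPServer uses AF_INET (IPv4); switch to IPv6 sockets.
    address_family = socket.AF_INET6

# "::" listens on all IPv6 addresses, so traffic sent directly to the pod IP
# (as IP target mode does) is accepted; binding "localhost" would refuse it.
HTTPServerV6(("::", 5000), SimpleHTTPRequestHandler).serve_forever()
```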

@eshicks4
Author

eshicks4 commented Nov 7, 2022

> I'll leave this here in case anyone else hits the same issue with a server binding to https://localhost:port instead of https://[::]:port.

I ran into this a few times too. The pods run IPv6-only, so there is no 127.0.0.1 or 0.0.0.0 to bind to. Sometimes localhost works (it depends on the container's /etc/hosts file), but it's generally been safer, and often necessary, to override app defaults and force the listeners to bind to [::1] or [::] instead.

@xanather

I really need IPv6 ALBs with IPv6 instance target groups for IPv6-native subnets. The feature being discussed here requires that to be implemented first.

Is there a better place to express interest in this feature outside 'container' roadmap?

@xanather

Any update on this feature? :)

@NeilHanlon

o/ Waving from the void on this :)

@mikestef9
Contributor

This is still dependent on ALB/NLB first adding support for instance IPv6 target groups (which is coming later this year). When that happens, we can add support in the controller.

@sjastis added the EKS Networking (EKS Networking related issues) label Aug 26, 2023
@nakrule

nakrule commented Sep 29, 2023

This feature has been implemented and is now available on AWS.

@matthenry87

matthenry87 commented Oct 1, 2023

> This feature has been implemented and is now available on AWS.

My ASG refuses to add worker nodes to the target group. Did you get this working? Please provide a link to the merged PR that backs your claim; otherwise you are being misleading.

@sjastis

sjastis commented Oct 3, 2023

With the recent launch of support in ELB for registering instances using IPv6 addresses [1], you can use the AWS Load Balancer (LB) Controller to create ALBs/NLBs with instance target type for IPv6. We recommend using AWS LB Controller v2.5.1+ to get started.
[1] https://aws.amazon.com/about-aws/whats-new/2023/10/application-load-balancer-network-load-balancer-registering-instances-ipv6-targets/

@matthenry87

> With the recent launch of support in ELB for registering instances using IPv6 addresses [1], you can use the AWS Load Balancer (LB) Controller to create ALBs/NLBs with instance target type for IPv6. We recommend using AWS LB Controller v2.5.1+ to get started.
> [1] https://aws.amazon.com/about-aws/whats-new/2023/10/application-load-balancer-network-load-balancer-registering-instances-ipv6-targets/

That doesn't really meet my use case. We use a single ingress controller of type NodePort, with one internal and one external NLB.

I just wrote a Lambda to enable the primary IPv6 address as each instance comes up. Ideally the node group would offer that as a feature that can be turned on, since it isn't something that can be specified in a custom launch template due to not having the CIDR yet.
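As a hedged sketch of what such a Lambda might look like (not the exact code described above), assuming it is triggered by an EC2 Auto Scaling launch lifecycle hook delivered through EventBridge, whose event detail carries the fields used below:

```python
import boto3

ec2 = boto3.client("ec2")
asg = boto3.client("autoscaling")

def handler(event, context):
    # EventBridge "EC2 Instance-launch Lifecycle Action" event from the ASG hook.
    detail = event["detail"]
    instance_id = detail["EC2InstanceId"]

    # Find the instance's primary network interface (device index 0).
    instance = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"][0]["Instances"][0]
    primary_eni = next(n for n in instance["NetworkInterfaces"]
                       if n["Attachment"]["DeviceIndex"] == 0)

    # Promote the ENI's auto-assigned IPv6 address to primary so the instance
    # can be registered in an IPv6 instance-type target group.
    ec2.modify_network_interface_attribute(
        NetworkInterfaceId=primary_eni["NetworkInterfaceId"],
        EnablePrimaryIpv6=True,
    )

    # Let the ASG finish launching the node.
    asg.complete_lifecycle_action(
        LifecycleHookName=detail["LifecycleHookName"],
        AutoScalingGroupName=detail["AutoScalingGroupName"],
        LifecycleActionToken=detail["LifecycleActionToken"],
        LifecycleActionResult="CONTINUE",
    )
```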

@matthenry87

> With the recent launch of support in ELB for registering instances using IPv6 addresses [1], you can use the AWS Load Balancer (LB) Controller to create ALBs/NLBs with instance target type for IPv6. We recommend using AWS LB Controller v2.5.1+ to get started.
> [1] https://aws.amazon.com/about-aws/whats-new/2023/10/application-load-balancer-network-load-balancer-registering-instances-ipv6-targets/

@sjastis Otherwise the instances are never added to the target group because they lack a primary IPv6 address.

@yann-soubeyrand

I confirm there's a gap in the solution here: EKS nodes cannot be registered to these new IPv6 instance-type target groups because they're missing a primary IPv6 address. I tried setting PrimaryIpv6 to true in the NetworkInterfaces section of my custom launch template, but it somehow gets lost in translation when EKS creates its own launch template from mine. Is there a plan to fix this? Should I open a new issue to track this @mikestef9?

@matthenry87

> I confirm there's a gap in the solution here: EKS nodes cannot be registered to these new IPv6 instance-type target groups because they're missing a primary IPv6 address. I tried setting PrimaryIpv6 to true in the NetworkInterfaces section of my custom launch template, but it somehow gets lost in translation when EKS creates its own launch template from mine. Is there a plan to fix this? Should I open a new issue to track this @mikestef9?

You can't really set it in the launch template, because that would require the node's IPv6 CIDR to be known in advance, as it uses the first address in the block.

@yann-soubeyrand

I'm not sure I understand: in the launch template I was able to set Ipv6AddressCount to 1 (which, if I understand correctly, enables automatic IPv6 address allocation) and PrimaryIpv6 to true (which, again, if I understand correctly, should make the first allocated address the primary one), but I wasn't able to test because of the limitation I described. Could you elaborate on why you think it couldn't work at all?

@matthenry87

> I'm not sure I understand: in the launch template I was able to set Ipv6AddressCount to 1 (which, if I understand correctly, enables automatic IPv6 address allocation) and PrimaryIpv6 to true (which, again, if I understand correctly, should make the first allocated address the primary one), but I wasn't able to test because of the limitation I described. Could you elaborate on why you think it couldn't work at all?

The CIDR ranges are automatically assigned to your nodes; they are not known in advance. When I tried to manually create a launch template with the primary IPv6 option set to true, it wouldn't let me unless I specified the IPv6 CIDR in advance.

@mikestef9 reopened this Oct 3, 2023
@yann-soubeyrand

yann-soubeyrand commented Oct 3, 2023

@matthenry87 I tried modifying the launch template generated by EKS to set PrimaryIpv6 to true, then modified the autoscaling group generated by EKS to use my new launch template version. I was able to start nodes with a primary IPv6 address, and they correctly attached to the target group I set on the autoscaling group.

EDIT: I also had to set Ipv6AddressCount to 1 in the launch template.
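For reference, the manual workaround above expressed as a boto3 sketch; the template ID and ASG name are placeholders, and editing the EKS-generated template remains the same hack described above:

```python
import boto3

ec2 = boto3.client("ec2")
autoscaling = boto3.client("autoscaling")

# Add a new version of the EKS-generated launch template with both fields set.
resp = ec2.create_launch_template_version(
    LaunchTemplateId="lt-0123456789abcdef0",  # placeholder: EKS-generated template
    SourceVersion="$Latest",
    LaunchTemplateData={
        "NetworkInterfaces": [{
            "DeviceIndex": 0,
            "Ipv6AddressCount": 1,  # auto-assign one IPv6 address
            "PrimaryIpv6": True,    # promote it to the primary IPv6 address
        }]
    },
)
new_version = str(resp["LaunchTemplateVersion"]["VersionNumber"])

# Point the node group's autoscaling group at the new version.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="eks-my-node-group-asg",  # placeholder
    LaunchTemplate={"LaunchTemplateId": "lt-0123456789abcdef0",
                    "Version": new_version},
)
```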

@oliviassss

oliviassss commented Oct 3, 2023

@yann-soubeyrand, thanks for confirming and glad it works for you.
@matthenry87, in order for the ALB/NLB instance target type to work with IPv6, the EC2 instance needs to have a primary IPv6 address, since traffic is routed to instances using the primary IP address specified on the primary network interface of the instance [1][2].
You can assign it either during launch or from the console. Please refer to the docs below [3][4] for assigning a primary IPv6 address to your instance.
Refs:

  1. https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html#target-type
  2. https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-target-groups.html#target-type
  3. https://docs.aws.amazon.com/vpc/latest/userguide/vpc-migrate-ipv6.html#vpc-migrate-assign-ipv6-address
  4. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-instance-addressing.html#ipv6-addressing
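To verify whether an instance's primary ENI already has a primary IPv6 address, a small boto3 sketch (hedged: the IsPrimaryIpv6 response field is my reading of the EC2 DescribeInstances response shape; the instance ID is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2")

def has_primary_ipv6(instance_id: str) -> bool:
    # Look at the instance's primary network interface (device index 0).
    inst = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"][0]["Instances"][0]
    eni = next(n for n in inst["NetworkInterfaces"]
               if n["Attachment"]["DeviceIndex"] == 0)
    # True only if one of its IPv6 addresses is flagged as the primary.
    return any(a.get("IsPrimaryIpv6") for a in eni.get("Ipv6Addresses", []))

print(has_primary_ipv6("i-0123456789abcdef0"))  # placeholder instance ID
```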

@yann-soubeyrand

@oliviassss don't get me wrong, what I did was a hacky test; I think we must never touch the launch template generated by EKS. However, the path to a fully working solution doesn't seem so hard at first sight: either EKS should set PrimaryIpv6 to true in the launch template it generates when the cluster is an IPv6 one, or EKS should keep the PrimaryIpv6 value set by the user in their custom launch template. The latter solution does put some burden on the user, though.

@matthenry87

@yann-soubeyrand @oliviassss What you're proposing can't be done in an automated way when new nodes are brought up by an autoscaling group.

The IPv6 CIDRs are assigned automatically/dynamically, so it's not possible to flip enablePrimaryIpV6 to true in the launch template.

We want the autoscaling group to behave the same way it does with IPv4. That means that when the ASG lifecycle hook notifies the EKS service that a new node has launched, the EKS service should grab the instance ID, get its network interface ID, and then update the network interface to have a primary IPv6 address selected from its already-assigned IPv6 CIDR range.

Why make users manually edit the network interface of every new node after it's been automatically assigned its CIDR range?

@matthenry87

@yann-soubeyrand Ah, I now understand you were able to get it working, but I don't want to use a custom launch template.

@oliviassss

oliviassss commented Oct 9, 2023

@yann-soubeyrand, just for my own understanding, by "launch template" do you mean the EKS template or the CFN template? I think PrimaryIpv6 is a new addition, so it may not yet be supported by all the tools.
I was also checking the CFN template documentation but didn't find such a param: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ec2-networkinterface.html

@yann-soubeyrand

@oliviassss we don’t use CloudFormation, we use Terraform. Here is what I did:

  1. I created an EKS cluster with Terraform.
  2. I created a launch template (which I'll call the source launch template) with Terraform, setting Ipv6AddressCount to 1 and PrimaryIpv6 to true.
  3. I created an EKS managed node group with Terraform using the previously created launch template.
  4. I created an IPv6 target group of type instance with Terraform.
  5. I tried to attach the autoscaling group generated by the EKS managed node group to the target group, which failed. The reason is that the instances created by the autoscaling group don't have a primary IPv6 address.
  6. I manually edited the launch template created by EKS for the managed node group to set Ipv6AddressCount to 1 and PrimaryIpv6 to true. These parameters hadn't been carried over from the source launch template.
  7. I manually edited the autoscaling group to use the new launch template version.
  8. I tried again to attach the autoscaling group to the target group, which worked this time.

From an external point of view, it seems that the best solution would be that EKS sets Ipv6AddressCount to 1 and PrimaryIpv6 to true in the launch template it generates from the source launch template, when the cluster is an IPv6 cluster.

In the meantime, I haven't found a solution that makes everything work using Terraform, and I consider that I must not edit the EKS-generated launch template.

@oliviassss

oliviassss commented Oct 10, 2023

@yann-soubeyrand, thanks for the details. There's an open issue in terraform for this missing field: hashicorp/terraform-provider-aws#33733

@johngmyers

If you want to use a custom launch template for a managed node group and you want to use Terraform to create said launch template, you'll need the feature requested by that open issue.

I agree with @yann-soubeyrand that Managed Node Group's creation of default launch templates for IPv6 clusters needs improvement.

I have not yet looked into Karpenter's support for primary IPv6 addresses.

@yann-soubeyrand

@oliviassss in addition to what @johngmyers said, even with the above Terraform issue fixed, there’s still the issue that EKS doesn’t take the Ipv6AddressCount and PrimaryIpv6 into account when it generates the launch template from the source launch template. In any case, something needs to be done on the EKS side.

@sjastis

sjastis commented Oct 11, 2023

Agreed @johngmyers. Thanks for the feedback @yann-soubeyrand . We are tracking this as an enhancement for Managed Node Group and Karpenter.

@yann-soubeyrand

Hi @sjastis, do you have any public pointer (like a GitHub issue) where we can track the progress?

@yann-soubeyrand

Hi @sjastis, do you have news and/or pointers we can follow?

@svz-ya

svz-ya commented May 13, 2024

@oliviassss We faced the same issue, and currently we have to perform post-launch actions manually on the launch templates for EKS autoscaling groups so that the primary IPv6 flag is set for instances.
Do you have an open issue for EKS to solve this cumbersomeness?

@yann-soubeyrand

Hi, could you please share details about the difficulty of fixing this issue? It would help us be more understanding because, from a user's point of view, it seems to be "just" a matter of taking into account the value of PrimaryIpv6 that users set in the launch templates they pass when creating managed node groups. Hence a certain frustration at seeing this issue take ages to be addressed.

@yann-soubeyrand

Hello, I see that this issue has been moved to Shipped in the roadmap; what does that mean? I tried again setting Ipv6AddressCount and PrimaryIpv6 in the launch template of one of our EKS node groups, but in the launch template EKS creates from it, only Ipv6AddressCount is kept and PrimaryIpv6 is discarded. It would be really appreciated if someone from AWS could communicate on this issue 😉

@mikestef9 moved this from Shipped to Coming Soon in containers-roadmap Oct 10, 2024
@mikestef9
Contributor

Checking with our managed node groups team again on this one. Do you know if the same issue exists in Karpenter?

@yann-soubeyrand

I’ve not tested Karpenter, unfortunately.

@mikestef9
Contributor

We've merged the change to keep PrimaryIpv6 in the launch template used by MNG. This should be rolled out globally within two weeks.

@yann-soubeyrand

I confirm that attaching an IPv6 instance-type target group to the autoscaling group of an EKS managed node group now works as expected. As we say in French, « Mieux vaut tard que jamais ! » (better late than never). Thanks!

@mikestef9
Contributor

Good to hear. We'll leave this issue open while we consider whether to make that the default behavior, so that setting it in the launch template is not necessary.
