[EKS] [IPv6 on instance TGs]: Add IPv6 support to instance-type target groups for EKS support #1653

Open
eshicks4 opened this issue Feb 15, 2022 · 48 comments
Labels
EKS Networking (EKS Networking related issues) · EKS (Amazon Elastic Kubernetes Service) · Proposed (Community submitted issue)

Comments

@eshicks4

eshicks4 commented Feb 15, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
Please add IPv6 support to instance-type target groups so that we can use EKS cluster autoscaling groups with ALBs/NLBs.

Which service(s) is this request for?
EKS, ELBs

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
EKS creates an autoscaling group to which we can attach target groups; however, the new IPv6-based clusters don't bind NodePorts to the EC2 nodes' IPv4 addresses. We have dual-stack ELBs and IPv6-enabled EKS clusters but seem to be missing that connecting piece in between.

Are you currently working around this issue?
We aren't. We're currently stuck with IPv4 clusters.

Additional context
An alternative could be to make EKS clusters dual-stack so we can use the ipFamilies & ipFamilyPolicy features. IPv6-only would be the default to avoid IP exhaustion, but we could selectively bind IPv4 addresses as needed.

Attachments
N/A

@eshicks4 added the Proposed (Community submitted issue) label Feb 15, 2022
@eshicks4
Author

See also case #9628572941

@mikestef9 added the EKS (Amazon Elastic Kubernetes Service) label Feb 15, 2022
@mikestef9
Contributor

Hey @eshicks4, this first needs to be implemented by ALB and NLB. As called out here, ALB and NLB only support IP targeting mode for IPv6. Once they support instance mode, we can add support in the AWS Load Balancer Controller.

Any reason you can't use IP targeting mode?

@eshicks4
Author

My understanding was that IP targeting mode required that you knew the IPs you would be targeting. Since the cluster is autoscaled and my service's target is running on all nodes as a daemonset, couldn't that list change?

@mikestef9
Contributor

That's the job of the AWS LB Controller. It watches Service endpoints in the cluster and automatically updates ALB/NLB target groups with the latest list of pod IP addresses.
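For illustration only, here is a rough Python sketch of that reconciliation idea using boto3 and the Kubernetes client. The target group ARN, namespace, and Service name are placeholders, and the real controller does much more (TargetGroupBindings, readiness gates, draining):

```python
import boto3
from kubernetes import client, config, watch

TG_ARN = "arn:aws:elasticloadbalancing:region:acct:targetgroup/example"  # placeholder

config.load_kube_config()
v1 = client.CoreV1Api()
elbv2 = boto3.client("elbv2")

registered = set()
# Watch the Service's Endpoints and mirror pod IP/port pairs into the target group.
for event in watch.Watch().stream(v1.list_namespaced_endpoints,
                                  namespace="default",
                                  field_selector="metadata.name=my-service"):
    subsets = event["object"].subsets or []
    current = {(a.ip, p.port) for s in subsets
               for a in (s.addresses or []) for p in (s.ports or [])}
    for ip, port in current - registered:  # new pods -> register
        elbv2.register_targets(TargetGroupArn=TG_ARN,
                               Targets=[{"Id": ip, "Port": port}])
    for ip, port in registered - current:  # gone pods -> deregister
        elbv2.deregister_targets(TargetGroupArn=TG_ARN,
                                 Targets=[{"Id": ip, "Port": port}])
    registered = current
```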

@eshicks4
Author

Ok I think I have this design working - at least on my existing IPv4 setup. I'll need to rebuild to try it on an IPv6-only cluster.

One question, though, since this process involved quite a bit more than my original design: what benefits does it provide over a simpler design that just uses an instance-based TG (with IPv6 support) to connect an NLB to the cluster's autoscaling group?

Thanks

@mikestef9
Contributor

mikestef9 commented Feb 16, 2022

An instance mode load balancer can potentially go through an additional instance hop before the traffic gets to the pod, which adds latency compared to the load balancer sending traffic directly to the pods. Direct delivery is possible because the VPC CNI assigns pods VPC IP addresses, so an ALB/NLB can send traffic straight to the pods and skip node ports and kube-proxy.

@eshicks4
Author

eshicks4 commented Feb 16, 2022

That makes sense. This is sounding more like an ELB feature request. Should I switch it over to their queue instead?

@stevehipwell

> An instance mode load balancer can potentially go through an additional instance hop before the traffic gets to the pod, which adds latency compared to the load balancer sending traffic directly to the pods. Direct delivery is possible because the VPC CNI assigns pods VPC IP addresses, so an ALB/NLB can send traffic straight to the pods and skip node ports and kube-proxy.

@mikestef9 don't forget that with instance mode you also need the extra complexity of setting externalTrafficPolicy to Local if you have compliance policies for public traffic.

@eshicks4 assuming you're using the AWS Load Balancer Controller (which you should be, as the in-tree controller is deprecated), either an ALB IP-backed Ingress or an NLB IP-backed ingress controller Service is the simplest solution.

@eshicks4
Author

@stevehipwell while that may be the simplest solution that currently works for IPv6, I'm not sure I'd call it the simplest overall. Envoy runs as a daemonset in Project Contour's design so it's going to route to all available nodes anyway. An NLB configured to route to a static NodePort and auto-updated by the autoscaler doesn't require an ELB controller deployment or any of the IAM role setup that goes along with it. It works perfectly with IPv4 so, once IPv6 support is added to instance-based target groups, the only real benefit the controller has for us is the direct IP routing that bypasses kube-proxy.

@stevehipwell

@eshicks4 I'd suggest that you could switch Contour to use a Deployment for Envoy and the nlb-ip service annotations; this will allow you to use IPv6 and have an HA ingress (see pod readiness gates). I'm sure there are some limited cases where the daemonset and instance mode are better, but I can't think of many where the pros outweigh the cons. Obviously you might have some of these, so this is just a friendly suggestion.

@eshicks4
Author

@stevehipwell Just the reduction in complexity, really (fewer moving parts to break, etc.). There may be other reasons but, in our case, Kubernetes is still pretty new and we just have more people familiar with AWS. That said, I have it all working and documented with the in-cluster ELB controller, and there's no real reason to switch back since there are benefits to using it. That's why I'm thinking it's best to move this over to the ELB team's feature request bucket instead.

@yann-soubeyrand

Hello,

We’re more or less in the same situation as @eshicks4.

We'd like to attach our ingress nodes' autoscaling group to our load balancer target group. The reason is the same: reduction of complexity (no need to deploy the load balancer controller, one less piece that could break, etc.).

Another reason is that, currently, the load balancer controller doesn’t handle the case where two clusters are behind the same target group, which is how we do some blue/green upgrades.

Also, to avoid the additional hop when using instance type target groups and node ports, we deploy our ingress controllers using host ports.

@plaisted

plaisted commented Oct 10, 2022

I've been struggling to get the suggested alternative (IP-based) solution working with IPv6. Are there guidelines anywhere for troubleshooting this? Logs from aws-load-balancer-controller look okay, with no errors. The load balancer and target group are created, but the targets always stay unhealthy. The routing table has IPv6 routes, the NACLs are fully open for IPv4/IPv6, and the security groups are wide open for testing (in addition to the rules created by the controller). I'm on the latest EKS / CNI / controller versions. I've tried both ALB and NLB with the same result, and when I hit the service's IPv6 endpoints directly from the EKS nodes the connection is refused. The service works perfectly with kubectl port-forward.

service description

Name:                     <name>
Namespace:                <namespace>
Labels:                   app.kubernetes.io/instance=<namespace>
Annotations:              service.beta.kubernetes.io/aws-load-balancer-ip-address-type: dualstack
                          service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
                          service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-<id>, subnet-<id>
Selector:                 <selector>
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv6
IP:                       <address>:e8fb::5a07
IPs:                      <address>:e8fb::5a07
LoadBalancer Ingress:     <redacted>.elb.us-east-1.amazonaws.com
Port:                     <unset>  80/TCP
TargetPort:               5000/TCP
NodePort:                 <unset>  30538/TCP
Endpoints:                [<address>:bf0::1a]:5000,[<address>:bf0::1c]:5000
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                  Age                From     Message
  ----    ------                  ----               ----     -------
  Normal  SuccessfullyReconciled  26m (x3 over 72m)  service  Successfully reconciled

edit: Turns out this was an issue with my app and what it was listening on. Traffic routed by kube-proxy or port-forward works fine when the server binds to localhost; traffic arriving directly at the pod IP, as the IP target mode sends it, does not, which in hindsight should have been obvious. I'll leave this here in case anyone else hits the same issue with a server binding to https://localhost:port instead of https://[::]:port.
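To illustrate the fix, a minimal Python sketch (the port number is arbitrary): in an IPv6-only pod, a server bound to localhost is reachable through port-forward but not at the pod IP, so bind the listener to the IPv6 wildcard instead.

```python
import socket
from http.server import HTTPServer, SimpleHTTPRequestHandler

class HTTPServerV6(HTTPServer):
    # The default HTTPServer uses AF_INET (IPv4); switch to IPv6 sockets.
    address_family = socket.AF_INET6

# "::" listens on all IPv6 addresses, so traffic sent directly to the pod IP
# (as IP target mode does) is accepted; binding "localhost" would refuse it.
HTTPServerV6(("::", 5000), SimpleHTTPRequestHandler).serve_forever()
```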

@eshicks4
Author

eshicks4 commented Nov 7, 2022

> I'll leave this here in case anyone else hits the same issue with a server binding to https://localhost:port instead of https://[::]:port.

I ran into this a few times too. The pods run IPv6-only, so there is no 127.0.0.1 or 0.0.0.0 to bind to. Sometimes localhost works (it depends on the container's /etc/hosts file), but it's generally been safer, and often necessary, to override app defaults and force the listeners to bind to [::1] or [::] instead.

@xanather

I really need IPv6 ALBs with IPv6 instance target groups for IPv6-native subnets. The feature being discussed here requires that to be implemented first.

Is there a better place to express interest in this feature outside 'container' roadmap?

@xanather

Any update on this feature? :)

@NeilHanlon

o/ Waving from the void on this :)

@mikestef9
Contributor

This is still dependent on ALB/NLB first adding support for instance IPv6 target groups (which is coming later this year). When that happens, we can add support in the controller.

@sjastis added the EKS Networking (EKS Networking related issues) label Aug 26, 2023
@nakrule

nakrule commented Sep 29, 2023

This feature has been implemented and is now available on AWS.

@matthenry87

matthenry87 commented Oct 1, 2023

> This feature has been implemented and is now available on AWS.

My ASG refuses to add worker nodes to the target group. Did you get this working? Please provide a link to the merged PR that backs your claim; otherwise you are being misleading.

@sjastis

sjastis commented Oct 3, 2023

With the recent launch of support in ELB for registering instances using IPv6 addresses [1], you can use the AWS Load Balancer (LB) Controller to create ALBs/NLBs with instance target type for IPv6. We recommend using AWS LB Controller v2.5.1+ to get started.
[1] https://aws.amazon.com/about-aws/whats-new/2023/10/application-load-balancer-network-load-balancer-registering-instances-ipv6-targets/

@matthenry87

> With the recent launch of support in ELB for registering instances using IPv6 addresses [1], you can use the AWS Load Balancer (LB) Controller to create ALBs/NLBs with instance target type for IPv6. We recommend using AWS LB Controller v2.5.1+ to get started.
> [1] https://aws.amazon.com/about-aws/whats-new/2023/10/application-load-balancer-network-load-balancer-registering-instances-ipv6-targets/

That doesn't really meet my use case. We use a single ingress controller of type NodePort, with one internal and one external NLB.

I just wrote a Lambda to enable the primary IPv6 address as each instance comes up. Ideally the node group would offer that as a feature that can be turned on, since it isn't something that can be specified in a custom launch template due to not having the CIDR yet.
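As a hedged sketch of what such a Lambda might look like (not the exact code described above), assuming it is triggered by an EC2 Auto Scaling launch lifecycle hook delivered through EventBridge, whose event detail carries the fields used below:

```python
import boto3

ec2 = boto3.client("ec2")
asg = boto3.client("autoscaling")

def handler(event, context):
    # EventBridge "EC2 Instance-launch Lifecycle Action" event from the ASG hook.
    detail = event["detail"]
    instance_id = detail["EC2InstanceId"]

    # Find the instance's primary network interface (device index 0).
    instance = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"][0]["Instances"][0]
    primary_eni = next(n for n in instance["NetworkInterfaces"]
                       if n["Attachment"]["DeviceIndex"] == 0)

    # Promote the ENI's auto-assigned IPv6 address to primary so the instance
    # can be registered in an IPv6 instance-type target group.
    ec2.modify_network_interface_attribute(
        NetworkInterfaceId=primary_eni["NetworkInterfaceId"],
        EnablePrimaryIpv6=True,
    )

    # Let the ASG finish launching the node.
    asg.complete_lifecycle_action(
        LifecycleHookName=detail["LifecycleHookName"],
        AutoScalingGroupName=detail["AutoScalingGroupName"],
        LifecycleActionToken=detail["LifecycleActionToken"],
        LifecycleActionResult="CONTINUE",
    )
```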

@matthenry87

> With the recent launch of support in ELB for registering instances using IPv6 addresses [1], you can use the AWS Load Balancer (LB) Controller to create ALBs/NLBs with instance target type for IPv6. We recommend using AWS LB Controller v2.5.1+ to get started.
> [1] https://aws.amazon.com/about-aws/whats-new/2023/10/application-load-balancer-network-load-balancer-registering-instances-ipv6-targets/

@sjastis Otherwise the instances are never added to the target group because they lack a primary IPv6 address.

@yann-soubeyrand

I confirm there's a gap in the solution here: EKS nodes cannot be registered to these new IPv6 instance-type target groups because they're missing a primary IPv6 address. I tried setting PrimaryIpv6 to true in the NetworkInterfaces section of my custom launch template, but it somehow gets lost in translation when EKS creates its own launch template from mine. Is there a plan to fix this? Should I open a new issue to track this @mikestef9?

@matthenry87

> I confirm there's a gap in the solution here: EKS nodes cannot be registered to these new IPv6 instance-type target groups because they're missing a primary IPv6 address. I tried setting PrimaryIpv6 to true in the NetworkInterfaces section of my custom launch template, but it somehow gets lost in translation when EKS creates its own launch template from mine. Is there a plan to fix this? Should I open a new issue to track this @mikestef9?

You can't really set it in the launch template, because that would require the node's IPv6 CIDR to be known in advance, as it uses the first address in the block.

@yann-soubeyrand

I'm not sure I understand: in the launch template I was able to set Ipv6AddressCount to 1 (which, if I understand correctly, enables automatic IPv6 address allocation) and PrimaryIpv6 to true (which, again, if I understand correctly, should make the first allocated address the primary one), but I wasn't able to test because of the limitation I described. Could you elaborate on why you think it couldn't work at all?

@matthenry87

> I'm not sure I understand: in the launch template I was able to set Ipv6AddressCount to 1 (which, if I understand correctly, enables automatic IPv6 address allocation) and PrimaryIpv6 to true (which, again, if I understand correctly, should make the first allocated address the primary one), but I wasn't able to test because of the limitation I described. Could you elaborate on why you think it couldn't work at all?

The CIDR ranges are automatically assigned to your nodes; they are not known in advance. When I tried to manually create a launch template with the primary IPv6 option set to true, it wouldn't let me unless I specified the IPv6 CIDR in advance.

@mikestef9 reopened this Oct 3, 2023
@yann-soubeyrand

yann-soubeyrand commented Oct 3, 2023

@matthenry87 I tried modifying the launch template generated by EKS to set PrimaryIpv6 to true, then modified the autoscaling group generated by EKS to use my new launch template version. I was able to start nodes with a primary IPv6 address, and they correctly attached to the target group I set on the autoscaling group.

EDIT: I also had to set Ipv6AddressCount to 1 in the launch template.
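For reference, the manual workaround above expressed as a boto3 sketch; the template ID and ASG name are placeholders, and editing the EKS-generated template remains the same hack described above:

```python
import boto3

ec2 = boto3.client("ec2")
autoscaling = boto3.client("autoscaling")

# Add a new version of the EKS-generated launch template with both fields set.
resp = ec2.create_launch_template_version(
    LaunchTemplateId="lt-0123456789abcdef0",  # placeholder: EKS-generated template
    SourceVersion="$Latest",
    LaunchTemplateData={
        "NetworkInterfaces": [{
            "DeviceIndex": 0,
            "Ipv6AddressCount": 1,  # auto-assign one IPv6 address
            "PrimaryIpv6": True,    # promote it to the primary IPv6 address
        }]
    },
)
new_version = str(resp["LaunchTemplateVersion"]["VersionNumber"])

# Point the node group's autoscaling group at the new version.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="eks-my-node-group-asg",  # placeholder
    LaunchTemplate={"LaunchTemplateId": "lt-0123456789abcdef0",
                    "Version": new_version},
)
```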

@oliviassss

oliviassss commented Oct 3, 2023

@yann-soubeyrand, thanks for confirming and glad it works for you.
@matthenry87, in order for the ALB/NLB instance target type to work with IPv6, the EC2 instance needs to have a primary IPv6 address, since traffic is routed to instances using the primary IP address specified on the primary network interface of the instance [1][2].
You can assign it either during launch or from the console. Please refer to the docs below [3][4] for assigning a primary IPv6 address to your instance.
Refs:

  1. https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html#target-type
  2. https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-target-groups.html#target-type
  3. https://docs.aws.amazon.com/vpc/latest/userguide/vpc-migrate-ipv6.html#vpc-migrate-assign-ipv6-address
  4. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-instance-addressing.html#ipv6-addressing
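To verify whether an instance's primary ENI already has a primary IPv6 address, a small boto3 sketch (hedged: the IsPrimaryIpv6 response field is my reading of the EC2 DescribeInstances response shape; the instance ID is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2")

def has_primary_ipv6(instance_id: str) -> bool:
    # Look at the instance's primary network interface (device index 0).
    inst = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"][0]["Instances"][0]
    eni = next(n for n in inst["NetworkInterfaces"]
               if n["Attachment"]["DeviceIndex"] == 0)
    # True only if one of its IPv6 addresses is flagged as the primary.
    return any(a.get("IsPrimaryIpv6") for a in eni.get("Ipv6Addresses", []))

print(has_primary_ipv6("i-0123456789abcdef0"))  # placeholder instance ID
```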

@yann-soubeyrand

@oliviassss don't get me wrong, what I did was a hacky test; I think we must never touch the launch template generated by EKS. However, the path to a fully working solution doesn't seem so hard at first sight: either EKS should set PrimaryIpv6 to true in the launch template it generates when the cluster is an IPv6 one, or EKS should keep the PrimaryIpv6 value set by the user in their custom launch template. The latter solution does put some burden on the user, though.

@matthenry87

@yann-soubeyrand @oliviassss What you're proposing can't be done in an automated way when new nodes are brought up by an autoscaling group.

The IPv6 CIDRs are assigned automatically/dynamically, so it's not possible to flip enablePrimaryIpV6 to true in the launch template.

We want the autoscaling group to behave the same way it does with IPv4. That means that when the ASG lifecycle hook notifies the EKS service that a new node has launched, the EKS service should grab the instance ID, get its network interface ID, and then update the network interface to have a primary IPv6 address selected from its already-assigned IPv6 CIDR range.

Why make users manually edit the network interface of every new node after it's been automatically assigned its CIDR range?

@matthenry87

@yann-soubeyrand Ah, I now understand you were able to get it working, but I don't want to use a custom launch template.

@oliviassss

oliviassss commented Oct 9, 2023

@yann-soubeyrand, just for my own understanding, by "launch template" do you mean the EKS template or the CFN template? I think PrimaryIpv6 is a new addition, so it may not yet be supported by all the tools.
I was also checking the CFN template documentation but didn't find such a param: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ec2-networkinterface.html

@yann-soubeyrand

@oliviassss we don’t use CloudFormation, we use Terraform. Here is what I did:

  1. I created an EKS cluster with Terraform.
  2. I created a launch template (which I'll call the source launch template) with Terraform, setting Ipv6AddressCount to 1 and PrimaryIpv6 to true.
  3. I created an EKS managed node group with Terraform using the previously created launch template.
  4. I created an IPv6 target group of type instance with Terraform.
  5. I tried to attach the autoscaling group generated by the EKS managed node group to the target group, which failed. The reason is that the instances created by the autoscaling group don't have a primary IPv6 address.
  6. I manually edited the launch template created by EKS for the managed node group to set Ipv6AddressCount to 1 and PrimaryIpv6 to true. These parameters hadn't been carried over from the source launch template.
  7. I manually edited the autoscaling group to use the new launch template version.
  8. I tried again to attach the autoscaling group to the target group, which worked this time.

From an external point of view, it seems that the best solution would be that EKS sets Ipv6AddressCount to 1 and PrimaryIpv6 to true in the launch template it generates from the source launch template, when the cluster is an IPv6 cluster.

In the meantime, I haven't found a solution that makes everything work using Terraform, and I consider that I must not edit the EKS-generated launch template.

@oliviassss

oliviassss commented Oct 10, 2023

@yann-soubeyrand, thanks for the details. There's an open issue in terraform for this missing field: hashicorp/terraform-provider-aws#33733

@johngmyers

If you want to use a custom launch template for a managed node group and you want to use Terraform to create said launch template, you'll need the feature requested by that open issue.

I agree with @yann-soubeyrand that Managed Node Group's creation of default launch templates for IPv6 clusters needs improvement.

I have not yet looked into Karpenter's support for primary IPv6 addresses.

@yann-soubeyrand

@oliviassss in addition to what @johngmyers said, even with the above Terraform issue fixed, there’s still the issue that EKS doesn’t take the Ipv6AddressCount and PrimaryIpv6 into account when it generates the launch template from the source launch template. In any case, something needs to be done on the EKS side.

@sjastis

sjastis commented Oct 11, 2023

Agreed @johngmyers. Thanks for the feedback @yann-soubeyrand . We are tracking this as an enhancement for Managed Node Group and Karpenter.

@yann-soubeyrand

Hi @sjastis, do you have any public pointer (like a GitHub issue) where we can track the progress?

@yann-soubeyrand

Hi @sjastis, do you have news and/or pointers we can follow?

@svz-ya

svz-ya commented May 13, 2024

@oliviassss We faced the same issue, and currently we have to perform post-launch actions manually on the launch templates for EKS autoscaling groups so that the primary IPv6 flag is set for instances.
Do you have an open issue for EKS to solve this cumbersomeness?

@yann-soubeyrand

Hi, could you please share details about the difficulty of fixing this issue? It would help us be more understanding because, from a user's point of view, it seems to be "just" a matter of taking into account the value of PrimaryIpv6 that users set in the launch templates they pass when creating managed node groups. Hence a certain frustration at seeing this issue take ages to be addressed.

@yann-soubeyrand

Hello, I see that this issue has been moved to Shipped in the roadmap; what does that mean? I tried again setting Ipv6AddressCount and PrimaryIpv6 in the launch template of one of our EKS node groups, but in the launch template EKS creates from it, only Ipv6AddressCount is kept and PrimaryIpv6 is discarded. It would be really appreciated if someone from AWS could communicate on this issue 😉

@mikestef9 moved this from Shipped to Coming Soon in containers-roadmap Oct 10, 2024
@mikestef9
Contributor

Checking with our managed node groups team again on this one. Do you know if the same issue exists in Karpenter?

@yann-soubeyrand

I’ve not tested Karpenter, unfortunately.

@mikestef9
Contributor

We've merged the change to keep PrimaryIpv6 in the launch template used by MNG. This should be rolled out globally within two weeks.

@yann-soubeyrand

I confirm that attaching an IPv6 instance-type target group to the autoscaling group of an EKS managed node group now works as expected. As we say in French, « Mieux vaut tard que jamais ! » (better late than never). Thanks!

@mikestef9
Contributor

Good to hear. We'll leave this issue open while we consider whether to make that the default behavior, so that setting it in the launch template is not necessary.
