

Assign node public IPs from Elastic IP pool #440

Closed
aparamon opened this issue Jan 15, 2019 · 28 comments

Comments

@aparamon

aparamon commented Jan 15, 2019

Why do you want this feature?
Currently, worker node EC2 instances are (by default) created with dynamic/volatile public IPs. This is often sub-optimal:

  1. Worker nodes may have to access private/out-of-AWS resources (most notably: private Docker registries) protected by white-list firewall rules.
    Worker nodes are expected to have specific public IPs.
  2. Accessing NodePort services from outside.
    Having predictable public node IPs eliminates the need for proxies.

What feature/behavior/change do you want?
Provide an option to assign worker nodes public IPs from AWS Elastic IP pool.

One possible implementation outlined in
kubernetes/kops#3182 (comment)

@errordeveloper
Contributor

To me this sounds like something an operator could do very nicely; I'm actually not sure how this would fit into eksctl. Also, have you considered using an NLB? It's already available on Kubernetes via an annotation.

@aparamon
Author

aparamon commented Jan 16, 2019

@errordeveloper Thank you for your prompt reply!

Indeed, it is possible to associate Elastic IPs manually, but that would have to be done after every scaling/node-creation operation. Also, it doesn't seem straightforward:
https://stackoverflow.com/questions/54202575/associate-elastic-ips-with-eks-worker-nodes
Room for some automation!

NLB does solve problem 2 (accessing the services from outside), but not problem 1 (accessing private resources behind white-listing firewalls).

@errordeveloper
Contributor

errordeveloper commented Jan 16, 2019

It sounds like you actually want to use pre-allocated EIPs, is that correct? We can provide an option for using a pre-allocated EIP for the NAT gateway very easily.

Indeed, it is possible to associate Elastic IPs manually, but that would have to be done after every scaling/node creation operation.

I didn't suggest to do this manually, by "operator" I mean a component running inside the cluster that would automatically attach pre-allocated EIPs to nodes (could also allocate new ones and attach those).
But if you want an EIP per node, whether pre-allocated or not, I think this is best suited for a separate component anyhow; cluster autoscaler may be a better place to consider than eksctl itself.

NLB does solve the problem 2. accessing the services from outside, but not 1. accessing private resources behind white-listing firewalls.

Is that for egress? If you use --private-node-networking, you will get an EIP which is there for the NAT gateway. We always have that there, but it only gets used by nodes that are in the private subnets.
I suppose that might work? I understand that it's maybe conceptually somewhat different from what you had in mind. Also, this actually means that you will have one EIP to whitelist for a cluster, not one for each nodegroup... What do you think?

@aparamon
Author

aparamon commented Jan 16, 2019

So you are saying that you'd like an EIP for each node in a given nodegroup?

That's how I do it currently. It works reliably, albeit wastefully on EIPs.

Is that for egress? If you use --private-node-networking, you will get an EIP which is there for the NAT gateway. And we always have that there, but it only gets used by nodes that are in the private subnets.
I suppose that might work? I understand that maybe somewhat suboptimal.

If all worker nodes do share the same public EIP, that should work (at least for outgoing traffic; I'm not sure how NAT will actually resolve incoming traffic from the public EIP to worker node NodePorts, though).
But I'm missing the actual configuration; could you please provide the command that specifies and attaches the EIP?

In any case, if allocating an EIP as part of ASG config is what you want, that can be done easily.

Regarding my suggestion about operator, I mean something that would attach pre-allocated EIPs whenever you create a service which specifies the EIP via an annotation.

I think I'm missing the actual details :-) I.e. what is ASG config and what annotations should be placed on what objects?

@errordeveloper
Contributor

@aparamon I've updated my comment before I noticed your reply, you might want to re-read it. I gathered that there is no actual option for allocating EIPs as part of an ASG (surprisingly).

@aparamon
Author

Exactly: AWS::AutoScaling::LaunchConfiguration is missing that.
Method proposed in kubernetes/kops#3182 (comment) hooks into instance userdata.

@errordeveloper
Contributor

errordeveloper commented Jan 16, 2019

If all worker nodes do share the same public EIP, that should work (at least for outcoming traffic; I'm not sure how NAT will actually resolve incoming traffic from public EIP to worker node NodePorts though).

I think you'd be looking to use two things:

  1. NAT gateway for egress
  2. NLB for ingress, which means either EIP per-service, or EIP for service that handles routing to more services internally (if you don't have too many EIPs to spare)
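For reference, the NLB-for-ingress option from the list above uses the standard Kubernetes annotation on a Service of type LoadBalancer. A minimal sketch (the service name, selector, and ports are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app            # placeholder name
  annotations:
    # Ask the AWS cloud provider for an NLB instead of a classic ELB
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```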

But I'm missing the actual configuration; could you please provide the command that specifies and attaches the EIP?

We don't have an option to pass pre-allocated EIP just yet, but it can be easily added.

With regards to NLB, see the Kubernetes docs. I am not quite sure whether they allow you to attach a pre-allocated IP or not.

Method proposed in kubernetes/kops#3182 (comment) hooks into instance userdata.

We cannot do this, as one of the main design principles is to keep the node bootstrap script as simple as possible, with the least number of input variables. The ideal place to do this would be something like cluster autoscaler or a standalone operator. Hope this makes sense; it'd also mean you could re-use this in any Kubernetes cluster on EC2.

@errordeveloper
Contributor

errordeveloper commented Jan 16, 2019

@aparamon are you on Slack, perhaps better to chat in real time? :)

@aparamon
Author

I've just registered as aparamon

@aparamon
Author

aparamon commented Jan 16, 2019

Indeed, hooking instance userdata is hacky; let's consider another possibility!

Currently, a NAT Gateway for private networks is created and assigned a freshly-acquired Elastic IP unconditionally:
https://github.com/weaveworks/eksctl/blob/ac0bbad34031a7f4292304eb44f653631c62392d/pkg/cfn/builder/vpc.go#L63-L83

An option to supply an existing EIP for the NAT Gateway would solve the current issue.
Opting out of the NAT Gateway altogether seems useful too, as the NAT Gateway incurs additional cost and is not required with default settings (without --private-node-networking). The EKS Getting Started Guide doesn't mention a NAT Gateway.

What about introducing config parameter --nat-gateway=VALUE (default true) with the following options:

  • false: do not create NAT Gateway
  • true: allocate Elastic IP and use it to create NAT Gateway
  • IP address or EIP id: use this allocated Elastic IP to create NAT Gateway

?

Potential extensions include multiple NAT Gateways, see #392

@mumoshu
Contributor

mumoshu commented Jan 17, 2019

Hey! Just my two cents, but --nat-gateway-eip=EIP_ALLOCATION_ID (though we don't want to add more config flags to eksctl), a.k.a.:

natGateway:
  eip: <EIP_ALLOC_ID>

or

natGatewayEIP: <EIP_ALLOC_ID>

seem to make sense to me in eksctl.

Otherwise, I believe you can use a pre-created VPC, subnets, and NAT gateways, providing them to eksctl with eksctl create cluster --vpc-public-subnets <subnet ids separated by commas> --vpc-private-subnets <subnet ids separated by commas>

For more sources of inspiration, I'd suggest looking into how this has been supported in another tool.

A managed subnet with a pre-created EIP:

https://github.com/kubernetes-incubator/kube-aws/blob/c50c2a030b47043f2064054248b0b0347abd283b/builtin/files/cluster.yaml.tmpl#L970-L976

A managed subnet with a pre-created NGW (w/ or w/o EIP; it doesn't matter to the tool):

https://github.com/kubernetes-incubator/kube-aws/blob/c50c2a030b47043f2064054248b0b0347abd283b/builtin/files/cluster.yaml.tmpl#L951-L957

@mumoshu
Contributor

mumoshu commented Jan 17, 2019

I believe I understand the use-case that requires what's originally requested in this issue.

You basically need a reliable way to assign EIPs before kubelet talks to the apiserver, in order to build an ingress/egress gateway limited to have a specific set of EIPs.

It is used so that:

  • Your customer is able to configure their infrastructure so that they can accept ingress traffic only from those EIPs (The counter-part is the egress gateway w/ EIPs running on the eksctl-managed nodepool)
  • Your customer is able to configure their infrastructure so that they can allow egress traffic only to those EIPs (The counter-part is the ingress gateway w/ EIPs running on the eksctl-managed nodepool)

And you don't want to spend so much on additional NAT gateways and NLBs, or don't want to have a nodegroup per EIP, which can result in many small nodegroups.

Implementation-wise, I think I have the same feeling as @errordeveloper:

The ideal place to do this would be in something like cluster autoscaler or a standalone operator

The only workable solution I have found so far is to use userdata or custom systemd unit, so that you can attach an EIP before kubelet starts talking to the apiserver.

kubernetes-retired/kube-aws#219

I haven't tried it myself yet, but as you've said, a k8s operator or a daemonset may be used instead, if and only if you can reliably update the node IP address stored in K8S.

Would assigning EIPs and then restarting every kubelet from the operator/daemonset work, or maybe just updating the node object via the k8s API...? I'm not certain of that yet.

@mumoshu
Contributor

mumoshu commented Jan 17, 2019

Regardless of the above, if you don't need so many EIPs, or are OK with creating a set of nodegroups per EIP, what @errordeveloper summarized above would work best:

NAT gateway for egress
NLB for ingress, which means either EIP per-service, or EIP for service that handles routing to more services internally (if you don't have too many EIPs to spare)

@aparamon
Author

aparamon commented Jan 17, 2019

@mumoshu Thanks for your comments!

Having individual node EIPs assigned before kubelets start communicating with the apiserver is not a requirement, fortunately. It is only required that pods use an IP from a specific pre-defined pool when talking to the outside world, e.g. private Docker repositories.

So "NAT gateway for egress, NLB for ingress" sounds simplest and most natural.

@mumoshu
Contributor

mumoshu commented Jan 17, 2019

@aparamon Thanks for clarifying!

Yes, "NAT gateway for egress, NLB for ingress" would allow you to prepare EIP before your pods start, which does solve your issue and making the extreme case I've summarized irrelevant.

Nice to see you found a simpler solution for your issue!

@aparamon
Author

Upon cluster delete, it's important to make sure the EIP is not released if it was not acquired when creating the NAT Gateway.

@aparamon
Author

Some considerations on

  1. Accessing NodePort services from outside.

One option that is generally working with eksctl out-of-the-box is LoadBalancer services. An AWS Load Balancer is created for every k8s service and is reachable from outside by reported ExternalIP (something like a218ece131a4011e9a0160683d1063c6-1044786145.eu-central-1.elb.amazonaws.com).
However, if services are expected to be allocated at specific IPs/DNS names, it becomes harder to set up. Also, the dynamism of Load Balancers makes it harder to control inbound access rules.

But there is another, apparently simpler alternative: NodePort services!
It is possible to do the following:

  1. Create a node group in public subnets.
    (-N=1 is sensible).
  2. Make the nodes dedicated load balancers:
    kubectl taint node -l alpha.eksctl.io/nodegroup-name=<group-name> dedicated=foo:NoSchedule dedicated=foo:NoExecute
  3. Allow all traffic from public subnets to private subnets (so dedicated load balancer nodes can access other worker nodes).
  4. Allow incoming traffic to public subnets (open ports 30000-32767).
  5. Assign Elastic IPs to dedicated load balancer instances.

Now you can access the services by <EIP>:<NodePort>!
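The manual steps above could be sketched with kubectl and the AWS CLI (a rough sketch; the nodegroup name, instance IDs, and EIP allocation IDs are placeholders, and the security-group changes from steps 3-4 are done separately in the EC2 console or CLI):

```shell
# Step 2: taint the dedicated load-balancer nodes (placeholder nodegroup name)
kubectl taint node -l alpha.eksctl.io/nodegroup-name=lb-group \
  dedicated=lb:NoSchedule dedicated=lb:NoExecute

# Step 5: associate a pre-allocated EIP with each dedicated instance
# (placeholder instance and allocation IDs)
aws ec2 associate-address \
  --instance-id i-0123456789abcdef0 \
  --allocation-id eipalloc-0123456789abcdef0
```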

What do you think of automating it? Maybe something like
eksctl create lb-nodegroup -N=2 --eip=<EIP>,<EIP>
or just
eksctl create lb-nodegroup
to allocate EIPs automatically, consistently with NAT Gateway?

It is possible to further refine the scheme by creating dedicated "load balancer" subnets initially, along with the current private and public subnets.

@aparamon
Author

aparamon commented Jan 18, 2019

Apparently, most of above is covered by #419, #448, and #396.
The only remaining part is assignment of EIPs.

@mumoshu
Contributor

mumoshu commented Jan 21, 2019

The only workable solution I have found so far is to use userdata or custom systemd unit, so that you can attach an EIP before kubelet starts talking to the apiserver.

I'm withdrawing my previous statement. I think restarting kubelet isn't necessary, as kubelet would communicate with the apiserver via whatever public IP addr is available to the node. Either an EIP or an automatically assigned public IP would work.

The apiserver would need kubelet access for things like kubectl logs, but it would use private IPs.

eksctl create lb-nodegroup -N=2 --eip=<EIP>,<EIP>

This is cool!

But now, I believe we can implement it with a simple daemonset external to eksctl given the above.

The daemonset would work like the below:

  • Firstly, we'd need feat: add taints to the node on creation #396 in order to add a label like eksctl.io/eip-from-pool: pool1 and taints like eksctl.io/eip-from-pool: pool1 and eksctl.io/waiting-for-eip: true to every dedicated load balancer node.
    • The label and the former taint are used to make the nodes dedicated to specific pods that require EIPs from the pool.
    • The latter taint is used to postpone pod scheduling until an EIP is associated by the daemonset pod.
  • The daemonset pod would tolerate both taints and run in hostNetwork, so that it is able to rely on the node's IAM role to make AWS API calls for associating an unused EIP from the pool with the node.
  • Once an EIP is associated, the daemonset pod removes only the latter taint (eksctl.io/waiting-for-eip: true), so that your app pods get scheduled to the node.
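A minimal DaemonSet manifest matching that description might look like the following. This is a sketch: the name, image, and taint effects are assumptions, and the associator container itself (which would call the EC2 API and then patch its node's taints) is not shown:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: eip-associator             # hypothetical name
spec:
  selector:
    matchLabels:
      app: eip-associator
  template:
    metadata:
      labels:
        app: eip-associator
    spec:
      hostNetwork: true            # use the node's network identity / IAM role
      nodeSelector:
        eksctl.io/eip-from-pool: pool1
      tolerations:
        # Tolerate both taints so the pod can run before the EIP is attached
        - key: eksctl.io/eip-from-pool
          value: pool1
          effect: NoSchedule
        - key: eksctl.io/waiting-for-eip
          value: "true"
          effect: NoSchedule
      containers:
        - name: associator
          image: example/eip-associator:latest   # hypothetical image
          # Would associate a free EIP via the EC2 API, then remove the
          # eksctl.io/waiting-for-eip taint from its own node.
```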

@aparamon
Author

aparamon commented Jan 21, 2019

@mumoshu The daemonset idea looks appealing!

I'm not sure about the eksctl.io/waiting-for-eip: true taint, though. Removing it in the end will not get app pods scheduled, as the eksctl.io/eip-from-pool: pool taint is still there. And my initial idea was that load-balancer nodes are dedicated, so no app pods run on them.
Am I missing something?

@aparamon
Author

aparamon commented Feb 5, 2019

Related: https://forums.aws.amazon.com/message.jspa?messageID=515725#613460 and the following comment hook into userdata.

@hden

hden commented Apr 25, 2019

NAT with existing EIP looks awesome.

As for assigning EIPs from a pool, we used a Lambda function to routinely scan for new nodes.
The code is available here if anyone is interested.
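The core pairing logic of such a scanner is simple: match nodes that don't yet have an EIP with unassociated allocations from the pool. A hypothetical, simplified sketch (function and field names are illustrative; the real Lambda would fetch these lists via the EC2 API and then call associate-address for each pair):

```python
def pair_nodes_with_eips(instances, allocations):
    """Pair instances that lack an EIP with free allocations from the pool.

    instances:   dicts like {"InstanceId": str, "HasEip": bool}
    allocations: dicts like {"AllocationId": str, "AssociationId": str | None}
    Returns (instance_id, allocation_id) pairs to associate.
    """
    free_eips = [a["AllocationId"] for a in allocations if not a.get("AssociationId")]
    new_nodes = [i["InstanceId"] for i in instances if not i.get("HasEip")]
    # zip stops at the shorter list: leftover nodes wait for the next scan,
    # leftover EIPs stay in the pool.
    return list(zip(new_nodes, free_eips))


pairs = pair_nodes_with_eips(
    [{"InstanceId": "i-1", "HasEip": False}, {"InstanceId": "i-2", "HasEip": True}],
    [{"AllocationId": "eipalloc-a", "AssociationId": None},
     {"AllocationId": "eipalloc-b", "AssociationId": "assoc-1"}],
)
# pairs == [("i-1", "eipalloc-a")]
```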

@Jaykah

Jaykah commented Apr 25, 2019

NAT with existing EIP looks awesome.

As for assigning EIPs from a pool, we used a Lambda function to routinely scan for new nodes.
The code is available here if anyone is interested.

@hden for EKS-NODE-POOL=foo, do we just list all IPs, comma-separated?

The same question goes for EKS-IP-POOL=bar

Also, I see variables INETANCE_TAG_KEY and INETANCE_TAG_VALUE - should it be INSTANCE_TAG_... instead?

Also, how do you trigger the updates through CloudWatch?

Thanks!

@hden

hden commented Apr 25, 2019

@Jaykah It's slightly off-topic for this thread, so maybe we could discuss it here?

@github-actions
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Jan 27, 2021
@github-actions
Contributor

github-actions bot commented Feb 1, 2021

This issue was closed because it has been stalled for 5 days with no activity.

@github-actions github-actions bot closed this as completed Feb 1, 2021
@werebear73

This seems like it is still needed or has something else been done on this?

@aairbag

aairbag commented Mar 4, 2022

I am unable to see that a resolution was reached regarding adding this feature. If that is the case, please note there is still interest in being able to assign existing EIPs to NAT gateways, and re-opening this issue would be much appreciated. Thanks!

8 participants