NAT only provisioned in one AZ, even for multi-AZ node groups #392
Hi @whereisaaron, thanks for reporting this! Somehow I wasn't aware of NAT instances until now; the AWS docs I was reading while working on this feature seemed to mention only NAT gateways. It is entirely doable to provide an option, but if there is no benefit to using NAT gateways (which look like a legacy product), I'm all for switching to NAT instances. Please let me know if you are keen to help here. |
It is worth considering whether even a free-tier t2.micro would be enough as a NAT instance in most cases. |
Thanks @errordeveloper, I don't mind whether NAT Gateways or NAT Instances are deployed, but we absolutely need one per AZ with per-AZ routing tables, otherwise our multi-AZ control plane and multi-AZ node groups are pointless 😄

NAT Instances (https://docs.aws.amazon.com/vpc/latest/userguide/VPC_NAT_Instance.html) are technically the old way, where you run your own NAT, but AWS provides an AMI that is basically zero-config. The trade-off is that it is up to you to update the AMIs and scale the instance types up if necessary. But for most small clusters without hundreds of nodes, NAT Instances are more than adequate and vastly cheaper. This is because the base price is based on the instance type you choose, and there are no additional traffic charges.

NAT Gateways (https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html) are the newer, fully-managed option. AWS maintains them for you and they scale automatically, which is great. The problem is they are still per-AZ, just like NAT instances, and AWS charges for *every* subnet and/or AZ you launch one in. In contrast, Google Cloud Platform has the same service for the same price (suspiciously the same 😄), *but* a single Cloud NAT 'instance' is highly available across all subnets and zones in a region. So there is no overhead for being highly available with Google, whereas AWS NAT Gateway pricing punishes you for that 😢 And both AWS and Google charge you a traffic premium on top of regular traffic charges, which are per-GB costs you don't need to pay with NAT Instances.

So the vital change is: some form of NAT for every AZ. If sticking with NAT Gateways is easiest, that's fine. The nice-to-have change would be the *option* to deploy NAT instances instead of NAT Gateways. I say 'option' because people with unlimited budgets will probably prefer NAT Gateways.

@milkowski yes, a t2.micro or t2.nano works fine. But I think T3 instances have better network performance than T2, plus multi-core, so they are preferable for NAT if available. |
Aaron, thanks for the insight! When I looked at the AWS docs, it wasn't clear to me how a multi-AZ NAT gateway would be configured. There was a reference to it being possible, but they haven't shown how to achieve a multi-AZ setup. Do you know of any docs/examples off hand? If not, should we ask folks at Amazon?

If we use NAT instances, could we run whatever daemon is needed as a daemonset on a nodegroup, instead of having to manage EC2 instances directly? |
For NAT instances, in the CF template you include, for each AZ:
1. An instance using an AWS NAT AMI image in the public subnet, with an attached EIP, and source IP address checking disabled.
2. A routing table that routes to the NAT instance for that AZ.
3. For all the public/private subnets, you associate the routing table for the AZ the subnet is in.

Here is a CF example: https://github.com/rcrelia/aws-mojo/blob/master/cloudformation/vpc-scenario-2-reference/aws-vpc-nat-instances.README.md

The NAT 'daemon' is the Linux kernel, so you couldn't easily do that in-cluster. Maybe it is possible somehow, but it is unlikely to be efficient and low-latency. |
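A minimal CloudFormation sketch of the per-AZ NAT instance resources described above. All resource and parameter names are illustrative; it assumes an existing VPC, public/private subnets, a security group, and a current amzn-ami-vpc-nat AMI ID passed as a parameter, with the pattern repeated for each AZ:

```yaml
NatInstanceA:
  Type: AWS::EC2::Instance
  Properties:
    ImageId: !Ref NatAmiId            # an amzn-ami-vpc-nat AMI for the region (assumed parameter)
    InstanceType: t3.micro
    SubnetId: !Ref PublicSubnetA      # the public subnet in this AZ (assumed)
    SourceDestCheck: false            # NAT must forward traffic not addressed to itself
    SecurityGroupIds:
      - !Ref NatSecurityGroup         # must allow inbound from the private subnets (assumed)
NatEipA:
  Type: AWS::EC2::EIP
  Properties:
    Domain: vpc
    InstanceId: !Ref NatInstanceA     # attach the EIP to the NAT instance
PrivateRouteTableA:
  Type: AWS::EC2::RouteTable
  Properties:
    VpcId: !Ref VPC
PrivateDefaultRouteA:
  Type: AWS::EC2::Route
  Properties:
    RouteTableId: !Ref PrivateRouteTableA
    DestinationCidrBlock: 0.0.0.0/0
    InstanceId: !Ref NatInstanceA     # this AZ's default route goes via this AZ's NAT instance
PrivateSubnetAAssociation:
  Type: AWS::EC2::SubnetRouteTableAssociation
  Properties:
    SubnetId: !Ref PrivateSubnetA
    RouteTableId: !Ref PrivateRouteTableA
```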
So if one wants to have multi-AZ NAT Gateways, they can do that by deploying one in each AZ, but they also need to make sure they have routes set up accordingly.

It would be logical to extend the current functionality to multi-AZ gateways to start with. We can discuss the NAT instance option separately, as it sounds like there are a few trade-offs there that are not very straightforward. Also, I believe users can have their own custom VPC set up the way they need it, and pass subnet IDs to eksctl. It does seem like a pricing optimisation that some folks may prefer, but those users probably know how to do it, and are better placed to decide on e.g. instance size trade-offs themselves, or to use a unikernel instead of Linux, or a commercially supported appliance.
Hi @errordeveloper, as we discussed above, while eksctl can create a VPC for you, it can't create a high-availability private cluster, because the single-AZ NAT is vulnerable to zone failure.

I created the following CloudFormation template (based mostly on someone else's template, https://github.com/stelligent/cloudformation_templates) to create what is required: 3 public subnets, each with a NAT Gateway, and 3 private subnets, each with its own routing table that routes to the NAT Gateway for that AZ.

https://gist.github.com/whereisaaron/7eb907d17d7a3bc4d50b9ab279107492

This is working great with eksctl and the --private-subnets option. This VPC template is not any more special than what eksctl already does; it just makes sure the NAT isn't a single point of failure. Nothing else is required: whenever a new node is launched, it will use the routing table in its AZ and thus also use the NAT Gateway in its AZ. This also reduces latency and cost from cross-AZ traffic, while ensuring a zone failure won't bring the cluster down. |
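The per-AZ NAT Gateway pattern in that gist roughly boils down to the following fragment per AZ (a sketch with illustrative resource names; it assumes an existing VPC and public/private subnets):

```yaml
NatEipA:
  Type: AWS::EC2::EIP
  Properties:
    Domain: vpc
NatGatewayA:
  Type: AWS::EC2::NatGateway
  Properties:
    AllocationId: !GetAtt NatEipA.AllocationId
    SubnetId: !Ref PublicSubnetA        # the public subnet in this AZ (assumed)
PrivateRouteTableA:
  Type: AWS::EC2::RouteTable
  Properties:
    VpcId: !Ref VPC
PrivateDefaultRouteA:
  Type: AWS::EC2::Route
  Properties:
    RouteTableId: !Ref PrivateRouteTableA
    DestinationCidrBlock: 0.0.0.0/0
    NatGatewayId: !Ref NatGatewayA      # each private subnet egresses via its own AZ's gateway
PrivateSubnetAAssociation:
  Type: AWS::EC2::SubnetRouteTableAssociation
  Properties:
    SubnetId: !Ref PrivateSubnetA
    RouteTableId: !Ref PrivateRouteTableA
```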
Thanks a lot, Aaron! I will take a look and see if this could get incorporated in 0.1.21. |
This is a tricky one, due to the 5 elastic IP limit per VPC (default) and also the cost of the NAT Gateways. When running in production, I completely agree that redundancy should be there. The […] I like the […]. Thanks |
@IPyandy, if you don't have multi-AZ NAT then you don't have a high-availability cluster anyway, so you could just deploy the whole cluster to one subnet in one AZ if you only want one NAT. But if people do want the option, it might be easier to offer a […].

You are correct that NAT Gateways are expensive, and small NAT instances are often more than sufficient (see my costing example above). So it would be good to have the option of either. You don't need EIPs for NAT instances; you can just launch them with auto-assigned public IPs. Also, you can just say what you are doing and request a higher EIP limit; after all, NAT reduces IP usage compared to public clusters. Lastly, you only need one (set of) NAT gateways per VPC, and you can run large numbers of […]. |
@whereisaaron I do agree that a single NAT means no redundancy, though in testing and dev, sometimes testing east-west is more important than having inbound redundancy. By this time your prod environment should be configured and have redundancy at the edge built in (NAT gateways, Internet Gateways, and so forth). Testing the viability of node failure across AZs, amongst other use cases, is still a good option with a single NAT Gateway; this is not uncommon to do. If the purpose is to test redundancy at all points, then yes, of course, build in the multiple gateways.

Whether it's an optional switch to add or reduce gateways doesn't really concern me; I mostly just want to make sure it is an option. Though from the perspective of engineers/devs turning clusters on and off, there's the usability aspect as well.

I'm not really a big fan of NAT instances, and it would be unnecessary code, as it's just not something anyone should be dealing with unless you have ops teams taking care of those. No idea how long AWS plans to support them (the native ones) either.

I think we do agree there should be an option, though I'll be curious as to what the maintainers think from a design perspective; I wouldn't mind tackling the code. @errordeveloper Would another issue be productive to discuss the actual design case? |
Yeah, I can understand that people might want it, even if I'm not a fan 😄 In my experience NAT instances are simple things and really trouble-free. The hassle is that you have to update the images regularly, and that means taking them out of service for a restart. It's not too bad: re-route an AZ to another AZ's NAT, update the first AZ's image, change the routing back. There are also more complex schemes involving reassigning ENIs. NAT Gateways are, of course, no effort at all, but the data processing charges seem overpriced to me. It's not even vaguely competitive with doing it yourself with AWS's own NAT images. |
It does sound like we will need a flag that lets you control multiple vs single NAT gateway. I also wonder if we could find a way to easily add NAT gateways on-demand. Perhaps it should also be possible for someone to avoid creating private subnets. We now have a way for someone to modify the cluster stack, so adding some resources to it after it was created should be easier now than it was some time ago. |
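One possible shape for such a setting, sketched as a cluster config file rather than a CLI flag. This is purely illustrative: the field name, values, and cluster name are assumptions, not a committed eksctl API at the time of this discussion:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster            # hypothetical cluster name
  region: ap-southeast-2
vpc:
  nat:
    gateway: HighlyAvailable  # hypothetical values: Disable | Single | HighlyAvailable (one per AZ)
```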
There are much easier ones, and if you don't have a personal need, I wouldn't worry about trying to tackle this as your first issue. |
@whereisaaron true, I'm mostly using Terraform for production VPC environments. But for a quick demo, or to dev or test something quickly (being on the consulting side), eksctl right now is pretty great. |
@errordeveloper: "We now have a way for someone to modify cluster stack" |
We have 'eksctl utils update-cluster-stack'. It is of limited use at the moment, but it is a departure from not having a way to enhance the stack once it was deployed. We currently use it to add a shared SG to clusters that don't have it. It is append-only right now, but can evolve in the future. |
Oh, from the […] |
I would also really appreciate the deployment of multi-AZ NAT Gateways (in my opinion, this should even be the default)... This currently prevents the use of eksctl in our company, and we will stick to kops for now. However, eksctl makes a nice impression and we will keep a close eye on it. |
I second that feature request. Without configuring all components to be HA, this won't be a production-grade solution for many users. |
Started using eksctl recently, only to realise NAT Gateways are not multi-AZ. My workaround would be to create a VPC with multi-AZ NAT using CloudFormation and set up the cluster using eksctl. |
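That workaround looks roughly like the following (the stack, template, cluster name, and subnet IDs are placeholders; the eksctl flags are the ones discussed in this thread):

```sh
# Deploy a VPC with one NAT gateway and route table per AZ first
aws cloudformation deploy \
  --template-file multi-az-nat-vpc.yaml \
  --stack-name eks-vpc

# Then create the cluster into the pre-made private subnets
eksctl create cluster \
  --name my-cluster \
  --node-private-networking \
  --vpc-private-subnets=subnet-aaaa,subnet-bbbb,subnet-cccc
```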
I know it's the opposite of this bug, but maybe someone can tell me how to create a cluster without a NAT gateway? |
I like how aws-cdk handles this when creating a Vpc: if private subnets are created, then natGateways == numAZs, unless natGateways is specified, in which case it deploys that number of gateways. This could be made into optional flags here. |
@errordeveloper I would like to have a go at this if it's up for grabs. :) |
thanks @whereisaaron for the cloudformation gist. I found this diagram as a visual reference as well. I like the idea proposed by @IPyandy
I feel like HA NAT by default would be best in order to try and prevent someone from accidentally creating a non HA production cluster. Accidentally creating HA staging/dev clusters will incur more cost, but at least you don't risk an outage by doing that and should be able to change it later once you've realized where all your money went :) |
What happened?
I deployed a cluster with `eksctl create --node-private-networking`. A multi-AZ k8s control plane was created, and nodes were created in three AZs. However, a NAT Gateway was only created in one AZ, and all subnets were routed through that one AZ. Because of that, the loss of that one AZ compromises the whole cluster - at the very least the cluster can't pull images any more.
What you expected to happen?
I expected that either NAT gateways or NAT instances would be created in a public subnet for each AZ that the node group uses, and for the default route in all public/private subnets to go to the NAT gateway/instance in the same AZ, thus maintaining high availability for the cluster as a whole.
How to reproduce it?
Anything else we need to know?
I believe a workaround already exists, in that I can create my own VPC and subnets with per-AZ NAT Gateways and then use `--vpc-private-subnets` to deploy to those subnets?
It would be nice to have the option of deploying NAT instances rather than NAT gateways. Per-AZ AWS NAT Gateways are expensive, especially given the additional traffic charges (e.g. 3 x AZ and 2TB traffic = $250/month in Sydney), versus per-AZ t3.micro NAT instances (e.g. 3 x t3.micro and 2TB traffic = $15/month in Sydney). So NAT gateways are ~15 times more expensive to run for small amounts of (mostly image-pull) traffic. And it gets worse if you use a lot of traffic.
Per-AZ NAT gateways/instances also save a little money and latency on cross-AZ traffic charges.
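The cost gap follows from the pricing structure; as a sketch with symbolic rates (actual prices vary by region):

$$C_{\text{gateway}} = n_{\text{AZ}} \cdot r_{\text{gw}} \cdot H + T \cdot r_{\text{GB}}, \qquad C_{\text{instance}} = n_{\text{AZ}} \cdot r_{\text{inst}} \cdot H$$

where $n_{\text{AZ}}$ is the number of AZs, $H \approx 730$ is hours per month, $T$ is NAT traffic in GB, $r_{\text{gw}}$ and $r_{\text{inst}}$ are the hourly rates, and $r_{\text{GB}}$ is the per-GB data-processing charge that only the managed gateways incur. Since $r_{\text{gw}} \gg r_{\text{inst}}$ for small instance types, and the $T \cdot r_{\text{GB}}$ term grows with traffic, the gap widens as traffic increases, matching the figures above.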
Versions
Also include your version of `heptio-authenticator-aws` --> N/A
Logs