Support for private subnet instance groups with NAT Gateway #428
cc: @kris-nova |
Hi Guys, I have created a sample kops cluster named "k8s-test-evironment-com" in 3 AWS AZs (eu-west) and output it to Terraform. Then I started managing the routing in an extra file, "isolated_cluster_sample.tf". My working sample Terraform code for isolated nodes is an addition to the subnets generated by kops, and the NAT gateways are allocated in the generated public subnets. For testing, you need the following steps:
Start of file "isolated_cluster_sample.tf" (resource bodies were not preserved):
resource "aws_subnet" "eu-west-1a-k8s-test-evironment-com_private" { ... }
resource "aws_subnet" "eu-west-1b-k8s-test-evironment-com_private" { ... }
resource "aws_subnet" "eu-west-1c-k8s-test-evironment-com_private" { ... }
#-----------------------------------------
#-------------------------------------------------------
resource "aws_eip" "nat-1a" { ... }
resource "aws_eip" "nat-1b" { ... }
resource "aws_eip" "nat-1c" { ... }
resource "aws_nat_gateway" "gw-1a" { ... }
resource "aws_nat_gateway" "gw-1b" { ... }
resource "aws_nat_gateway" "gw-1c" { ... }
#-------------------------------------------------------
resource "aws_route" "0-0-0-0--nat-1a" { ... }
resource "aws_route" "0-0-0-0--nat-1b" { ... }
resource "aws_route" "0-0-0-0--nat-1c" { ... }
resource "aws_route_table" "k8s-test-evironment-com_private_1a" { ... }
resource "aws_route_table" "k8s-test-evironment-com_private_1b" { ... }
resource "aws_route_table" "k8s-test-evironment-com_private_1c" { ... }
resource "aws_route_table_association" "eu-west-1a-k8s-test-evironment-com_private" { ... }
resource "aws_route_table_association" "eu-west-1b-k8s-test-evironment-com_private" { ... }
resource "aws_route_table_association" "eu-west-1c-k8s-test-evironment-com_private" { ... }
#-------------------------------------------------------
|
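The surviving resource names follow a strict per-AZ pattern: each zone gets its own private subnet, EIP, NAT gateway, route table, presumably a 0.0.0.0/0 route to that zone's NAT gateway, and a route table association. As a rough illustration only (a Go sketch, not Terraform and not kops output; the `azPlan` type and `planPrivateZones` function are hypothetical), the plan for the three eu-west-1 zones can be enumerated like this:

```go
package main

import (
	"fmt"
	"strings"
)

// azPlan holds the Terraform resource names the sample file declares for one
// availability zone. Only the naming pattern is modelled, not resource bodies.
type azPlan struct {
	Zone         string
	Subnet       string // aws_subnet: the private subnet
	EIP          string // aws_eip: address for the NAT gateway
	NATGateway   string // aws_nat_gateway: lives in the kops-generated public subnet
	DefaultRoute string // aws_route: 0.0.0.0/0 via this zone's NAT gateway (inferred from the name)
	RouteTable   string // aws_route_table: private table for this zone only
	Association  string // aws_route_table_association: private subnet -> table
}

// planPrivateZones returns one azPlan per zone; "eu-west-1a" yields the
// suffix "1a" used by the EIP, NAT gateway, route and route table names.
func planPrivateZones(cluster string, zones []string) []azPlan {
	plans := make([]azPlan, 0, len(zones))
	for _, z := range zones {
		suffix := z[strings.LastIndex(z, "-")+1:]
		private := z + "-" + cluster + "_private"
		plans = append(plans, azPlan{
			Zone:         z,
			Subnet:       private,
			EIP:          "nat-" + suffix,
			NATGateway:   "gw-" + suffix,
			DefaultRoute: "0-0-0-0--nat-" + suffix,
			RouteTable:   cluster + "_private_" + suffix,
			Association:  private,
		})
	}
	return plans
}

func main() {
	zones := []string{"eu-west-1a", "eu-west-1b", "eu-west-1c"}
	for _, p := range planPrivateZones("k8s-test-evironment-com", zones) {
		fmt.Printf("%s: %s -> %s via %s\n", p.Zone, p.Subnet, p.RouteTable, p.NATGateway)
	}
}
```

The point of the per-zone table is that no private subnet shares a default route with another AZ.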
Hello! Thank you for the work done so far. This is far better than anything I've seen for spinning up K8s clusters on AWS. I would very much like to see this implemented. My company uses a predefined, well-designed network topology that roughly matches this, and AWS also recommends this kind of setup. AWS has published a CloudFormation stack that implements a neat network topology:
To sum it up here:
I would also like to contribute to this project, so I'd be happy to take on a part of the work linked to this issue. |
@tazjin, a few questions for you:
In regards to the route table: because we are not using an overlay network, we rely on that routing to communicate between AZs, which limits us to 50 servers total. We need to think full HA, with 3+ masters and multiple AZs. Only way to roll ;) |
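The arithmetic behind the 50-server limit: without an overlay network, each node's pod CIDR becomes one route in the VPC route table, and AWS's default quota is 50 routes per route table. A tiny Go sketch, assuming one route per node plus a configurable number of other routes; `nodeHeadroom` is a hypothetical helper, and the quota is an input because AWS lets you raise it:

```go
package main

import "fmt"

// defaultRoutesPerTable is AWS's default quota of routes per VPC route table,
// the "50" referred to above. Quotas can be raised, so treat it as an input.
const defaultRoutesPerTable = 50

// nodeHeadroom returns how many more nodes fit in one route table when every
// node needs its own pod-CIDR route and fixedRoutes entries are already used
// for other purposes (for example a default route). Negative means over quota.
func nodeHeadroom(nodes, fixedRoutes, quota int) int {
	return quota - fixedRoutes - nodes
}

func main() {
	for _, n := range []int{10, 49, 60} {
		h := nodeHeadroom(n, 1, defaultRoutesPerTable)
		fmt.Printf("%d nodes: headroom %d, fits=%v\n", n, h, h >= 0)
	}
}
```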
@chrislovecnm Hi!
Yes, I'm hoping to find some time for that this week.
We need something in a public subnet that can NAT the private traffic to the internet (this is assuming people want their clusters to be able to access the internet!)
Not sure about ingress (depends on the ingress type I suppose?) but normal
There are several options, and this needs discussing. For example:
I'm not familiar with what that option will do, so no clue! |
Just to answer this question:
If the machine you want to SSH into is in a private subnet, you can set up an ELB to forward port 22 traffic to that server. |
cc: @ajayamohan |
I propose that, as a first step, kops be made to work in an existing environment (shared VPC) using existing NGWs. As a minimum first set of requirements, I think we need to:
I think having kops spin up an entire environment including public and private subnets with IGWs, NAT Gateways, bastions, proper Security Groups, etc. is asking a lot of kops. Making this work with existing infrastructure as the first step and then adding onto it if necessary is potentially a cleaner path than doing it the other way around. For instance, I already have 3 NGWs and wouldn't want to have to pay for 3 more if kops created them automatically. |
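For the reuse-existing-NGWs approach, the core decision is made per zone: use an existing NAT gateway in the same AZ if there is one, otherwise flag the zone. A hypothetical Go sketch (the `natGateway` type and `assignExisting` function are illustrative, not kops code, and the IDs are made up):

```go
package main

import "fmt"

// natGateway is a pre-existing NAT gateway found in the shared VPC.
type natGateway struct {
	ID   string
	Zone string
}

// assignExisting maps each cluster zone to an existing NAT gateway in the
// same zone and returns the zones that have none, i.e. where a gateway would
// have to be created (or the zone would depend on another AZ's gateway).
func assignExisting(zones []string, existing []natGateway) (map[string]string, []string) {
	byZone := make(map[string]string)
	for _, gw := range existing {
		if _, seen := byZone[gw.Zone]; !seen {
			byZone[gw.Zone] = gw.ID // keep the first gateway seen per zone
		}
	}
	assigned := make(map[string]string)
	var missing []string
	for _, z := range zones {
		if id, ok := byZone[z]; ok {
			assigned[z] = id
		} else {
			missing = append(missing, z)
		}
	}
	return assigned, missing
}

func main() {
	existing := []natGateway{{"nat-aaa", "eu-west-1a"}, {"nat-bbb", "eu-west-1b"}}
	assigned, missing := assignExisting([]string{"eu-west-1a", "eu-west-1b", "eu-west-1c"}, existing)
	fmt.Println(assigned, missing)
}
```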
I'm new to kops and don't know how kops handles infrastructure (not k8s), but I think it would be nice if I could use kops to deploy a k8s cluster onto existing infrastructure (set up with other tools) by supplying all the required information.
For example, if kops had separate steps for infrastructure setup and k8s cluster creation, it would be easier to test this. Also, I think it would be better to start with a single-AZ setup (no cross-AZ cluster / no HA on top of multiple AZs). |
@tazjin one of the interesting things @justinsb just brought up is that this may need to use overlay networking for HA to function; otherwise we would need a bastion connection in each AZ, which seems a tad complicated. Thoughts? @chulkilee I understand that it would be simpler, but non-HA is not usable for us in production. We also probably need to address overlay networking. |
@chrislovecnm kops should support HA eventually; what I'm saying is that, for this new feature, kops could support the simple use case first. I don't know what the most common HA deployment scenario is (e.g. HA over multiple AZs, HA in a single AZ, or leveraging federation) or which options kops supports. |
@chulkilee The advertised goal of kops is to set up a production cluster. IMHO, a cluster cannot be production-ready if it is not HA. On AWS, unless every service is hosted concurrently in at least 2 AZs, you don't have HA. @chrislovecnm @justinsb I'm not sure I understand why overlay networking would be mandatory for HA to function: routing between AZs and subnets within a given VPC is pretty transparent in AWS. |
I drew a picture of what I am currently testing. It may help with the discussion.
Everything seems to be working so far, but there is one HA deficiency: internally sourced outbound traffic is routed through a single NGW.
I tried creating a second route table with the right tag. Unfortunately, it just causes all route table updates to stop. I was hoping it would magically get updated. |
@jkemp101 you mention that public facing nodes are required for public elb. Can you not force an elb to use a public IP, and connect to a node that is in private ip space? |
@chrislovecnm That is correct. AWS will stop you because it detects that the route table for a private subnet does not have an Internet Gateway as its default route. This is the error message in the AWS console. And k8s is also smart enough to know it needs a public subnet to create the ELB, so it will refuse if it can't find one. But @justinsb suggested labeling manually created subnets after the kops run, and that worked fine: I can create services in k8s, and it attaches Internet-facing ELBs to the 3 public subnets (which I created and labelled manually) and internal ELBs to the 3 private subnets (which kops created automatically). |
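The public/private distinction described above comes down to the subnet's route table. A simplified Go check, assuming routes are given as destination/target pairs and that Internet Gateway IDs start with `igw-` (as AWS IDs do); it does not call the AWS API and ignores cases such as IPv6 default routes:

```go
package main

import (
	"fmt"
	"strings"
)

// route is one entry of a VPC route table.
type route struct {
	Destination string // CIDR block, e.g. "0.0.0.0/0"
	Target      string // "local", "igw-...", "nat-...", ...
}

// isPublicSubnet applies the rule described above: a subnet counts as public
// only if its route table sends the default route to an Internet Gateway.
func isPublicSubnet(table []route) bool {
	for _, r := range table {
		if r.Destination == "0.0.0.0/0" && strings.HasPrefix(r.Target, "igw-") {
			return true
		}
	}
	return false
}

func main() {
	public := []route{{"10.0.0.0/16", "local"}, {"0.0.0.0/0", "igw-1234"}}
	private := []route{{"10.0.0.0/16", "local"}, {"0.0.0.0/0", "nat-5678"}}
	fmt.Println(isPublicSubnet(public), isPublicSubnet(private)) // true false
}
```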
@jkemp101 Nice work. 2 questions:
|
@MrTrustor Hope this clarifies. Keep the questions coming.
|
@jkemp101 many thanks for the help, btw; if you are at KubeCon, I owe you a libation. Anyway... Have you been able to do this with kops in its current state, or how are you doing it? You mentioned a problem with HA. Please elaborate. |
@chrislovecnm I am currently running/testing the configuration depicted in my first diagram. Everything is working well so far. The only HA issue at the moment is that the clusters rely on a single NGW for outbound Internet connections, so if the NGW's zone goes down, the clusters can no longer make outbound connections to the Internet. Inbound traffic through the public ELB should still work fine. I've automated the cluster build and delete process so a single command brings up the cluster and applies all modifications. All settings (public subnet IDs, NGW IDs, IGW ID, VPC ID, zones, etc.) are in a custom YAML file. Here are the 11 steps for creating a cluster.
This script brings up a complete cluster as depicted in the diagram in about 8 minutes with a cluster of 3 masters/3 nodes. |
I am working on testing Weave with kops, and once that is done I would like to see how to incorporate this using an external networking provider. With an external networking provider, I don't think K8s will have to manage the three route tables. I'd probably like to set up a hangout with you to determine specifically where the product gaps are. |
@jkemp101 glad to hear about the progress, but shouldn't each cluster have its own NAT, so that clusters are more isolated from each other? |
@jkemp101 do you want to setup a hangout to review this? I have work items scheduled to knock this out, and would like to get the requirements clear. clove at datapipe.com is a good email for me. |
Hi, I was pointed here from kubernetes/kubernetes#34430. Firstly, I want to second #428 (comment): one NAT gateway per AZ, with the subnets in that AZ having their routes (default or otherwise) pointed at it. Secondly, I have a couple of variations on the stock use case. The first is that I really want to be able to reuse my NAT gateways' EIPs; I don't particularly care about the NAT gateway or subnet, though reusing those could be a minor cost saving. The reason for reusing the EIPs is to avoid 10-working-day lead times on some APIs I need which use source-IP ACLs :). The second is that I don't particularly care about private vs. public subnets: as long as I can direct traffic for those APIs out via a NAT gateway with a long-lived IP address, I'm happy :). That may mean my use case should be a separate bug, but I was pointed here :P. @justinsb asked about DHCP and routing. I don't think that's feasible in AWS, since their DHCP servers don't support the options needed: https://ercpe.de/blog/advanced-dhcp-options-pushing-static-routes-to-clients covers the two options, but neither is supported by DHCP Option Set objects in the AWS VPC API. That said, since a NAT GW is as resilient as an AZ, treat the combination of NAT GW + private subnet as a scaling unit: to run in three AZs, run three NAT GWs and three private subnets, and each subnet will have one and only one NAT GW route. |
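The "NAT GW + private subnet as a scaling unit" rule can be checked mechanically: every private subnet's default route should target a NAT gateway in the same AZ, otherwise losing one AZ takes down egress for another (the single-NGW HA gap mentioned earlier in the thread). A Go sketch with hypothetical types and IDs, not tied to any AWS SDK:

```go
package main

import "fmt"

// privateSubnet records a subnet's zone and the target of its 0.0.0.0/0 route.
type privateSubnet struct {
	ID            string
	Zone          string
	DefaultTarget string // NAT gateway ID
}

// crossZoneEgress returns the IDs of subnets whose default route does not
// point at a known NAT gateway in their own zone. With one NAT GW + subnet
// per AZ as the scaling unit, anything returned is a cross-AZ dependency.
func crossZoneEgress(subnets []privateSubnet, natZone map[string]string) []string {
	var bad []string
	for _, s := range subnets {
		if zone, ok := natZone[s.DefaultTarget]; !ok || zone != s.Zone {
			bad = append(bad, s.ID)
		}
	}
	return bad
}

func main() {
	natZone := map[string]string{"nat-a": "eu-west-1a", "nat-b": "eu-west-1b"}
	subnets := []privateSubnet{
		{"subnet-a", "eu-west-1a", "nat-a"},
		{"subnet-b", "eu-west-1b", "nat-b"},
		{"subnet-c", "eu-west-1c", "nat-a"}, // relies on zone a's gateway
	}
	fmt.Println(crossZoneEgress(subnets, natZone)) // [subnet-c]
}
```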
@jkemp101 possible to share what you've done so far in a gist/git repo? Sounds like a custom bit of python wrapped over kops. |
@ajohnstone Yup. I'll share a git repo shortly with the Python script I'm using. This weekend at the latest. |
@ajohnstone Here it is https://github.com/closeio/devops/tree/master/scripts/k8s. Let me know if you have any questions. |
Allowing for additional routes or existing network infrastructure would be great. We're using VPC Peering for environment interconnectivity. This is also how we're currently accessing our kubernetes API via our VPN client. I'm also using a single TCP Load Balancer in front of our HA Kubernetes Backplane to alleviate any DNS stickiness. |
- Publishing documentation to grow with the PR
- Defining command line flags
Code changes for the PR coming soon #694 |
Thanks Kris, I'll test soon also |
@starkers - It's still a WIP - let me hammer on it this weekend a bit more before testing. Was just adding the pointer last night, some people were asking about it. |
Little bird is telling me that we may have our first demo on Friday... no promises, but @kris-nova is kicking some butt!! |
@kris-nova @chrislovecnm How is that demo going? :) This would be super-useful for creating a simple kubernetes cluster without having to modify the kops terraform output to create the private subnets. Also, hoping for user-supplied security-groups on instance groups soon as well! |
I have put together almost exactly the same architecture as listed above. We are trying to get this to a state ready for production use. I generate the VPC in Terraform and then graft the output of kops onto the VPC output. Weave Net doesn't stay stable for very long once I begin populating services into the cluster; it ends with all sorts of networking weirdness I can't diagnose (kubedns becoming blind, certain nodes having 3-second network latency, etc.). Flannel / Calico doesn't work either (out of the box). I'm happy to battle-test the changes. Is there anything I could do to get the egress route tables populating before Friday? |
@hsyed need more details on Weave. Can you provide details in an open CNI issue? Work is still in progress on private networking. |
I'm very excited to see all of the progress on this issue! Thanks for all the hard work! I have been running a cluster in AWS using a setup similar to the "private" topology mentioned in #694, and pretty much a spot-on match to the diagram @jkemp101 created above, where each private subnet has a public "utility" subnet with its own NAT gateway and corresponding route tables that send 0.0.0.0/0 through the AZ's NAT. It all works fine except for one thing: kubernetes stops updating routes because multiple route tables are found (@jkemp101 also mentioned seeing this behavior). I've had to manually add the routes to all the route tables every time my set of nodes changes. It looks as though kubernetes itself does not currently support multiple route tables (https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws_routes.go#L45). I could definitely be missing something (I'm new to Go, so my investigation is slow), but it seems to me that having kubernetes support multiple route tables would be a prerequisite to supporting multiple private subnets with dedicated NAT gateways, right? I tried searching kubernetes for an existing issue about supporting multiple route tables, but can't find one (perhaps I'm not using the correct keywords). |
@jschneiderhan I do not know what approach is being taken in the work being done by @kris-nova. I assumed it was updating multiple route tables. I had a realisation that there is an alternative architecture that could work. We would need a second network interface on each node. This would be 9 subnets per cluster: 3 subnets for kubenet, connected to the route table it manages; 3 additional subnets (NAT routing subnets) for the nodes, where each subnet is connected to a route table which is connected to a shared NAT gateway in its AZ; and finally 3 public subnets for ELBs. The NAT routing subnets would mean dynamically attaching elastic network interfaces, as auto scaling groups do not support these. |
@kris-nova I'd be happy to add an issue over in the kubernetes project, but before I do, I could use another pair of eyes to make sure I'm not just being stupid: I think for all of this (really awesome) work to function in AWS without an overlay network, an improvement needs to be made to kubernetes itself. If a subnet per AZ is being created, we will end up with multiple VPC route tables that need to be updated with the CIDR ranges assigned to each node. When kubernetes goes to create the route, it's going to find multiple tables with the cluster name tag and return an error on this line: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws_routes.go#L45. At least that's what I'm seeing with multiple manually created route tables: it just logs that line multiple times and never updates the route. So I think this PR does everything right, but in order for kubernetes to do its thing properly, it needs to be improved to iterate over all the route tables and create a route in each one. Again, if that makes sense to you, I'm happy to create a kubernetes issue and take a shot at an implementation, but my confidence is pretty low since I'm new to just about every technology involved here :). |
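To make the proposal concrete, here is a Go sketch contrasting the single-table lookup described above with the iterate-over-all-tables alternative. It uses an in-memory `routeTable` type and an assumed `KubernetesCluster` tag key; it models the behaviour reported in the thread and is not the Kubernetes AWS cloud provider's actual code or API:

```go
package main

import "fmt"

// routeTable is a minimal in-memory stand-in for a VPC route table.
type routeTable struct {
	ID     string
	Tags   map[string]string
	Routes map[string]string // destination CIDR -> target instance ID
}

// tagged returns the tables tagged as belonging to the cluster.
func tagged(tables []*routeTable, tagKey, cluster string) []*routeTable {
	var out []*routeTable
	for _, t := range tables {
		if t.Tags[tagKey] == cluster {
			out = append(out, t)
		}
	}
	return out
}

// findSingleTable models the reported behaviour: exactly one tagged table is
// required, so having one table per AZ makes every route update fail.
func findSingleTable(tables []*routeTable, tagKey, cluster string) (*routeTable, error) {
	found := tagged(tables, tagKey, cluster)
	if len(found) != 1 {
		return nil, fmt.Errorf("found %d route tables for cluster %q, expected exactly one", len(found), cluster)
	}
	return found[0], nil
}

// createRouteInAll is the proposed alternative: add the node's pod CIDR route
// to every tagged table and report how many tables were updated.
func createRouteInAll(tables []*routeTable, tagKey, cluster, podCIDR, instanceID string) int {
	found := tagged(tables, tagKey, cluster)
	for _, t := range found {
		if t.Routes == nil {
			t.Routes = make(map[string]string)
		}
		t.Routes[podCIDR] = instanceID
	}
	return len(found)
}

func main() {
	const tagKey = "KubernetesCluster" // assumed cluster tag key
	newTable := func(id, cluster string) *routeTable {
		return &routeTable{ID: id, Tags: map[string]string{tagKey: cluster}, Routes: map[string]string{}}
	}
	tables := []*routeTable{newTable("rtb-1a", "demo"), newTable("rtb-1b", "demo"), newTable("rtb-x", "other")}

	if _, err := findSingleTable(tables, tagKey, "demo"); err != nil {
		fmt.Println("current behaviour:", err)
	}
	n := createRouteInAll(tables, tagKey, "demo", "100.96.1.0/24", "i-0abc")
	fmt.Println("proposed behaviour: updated", n, "tables")
}
```

Whether every per-AZ table should carry every node's route is exactly the design question raised here; the sketch only shows the mechanics.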
@justinsb thoughts about @jschneiderhan's comment? cc: @kubernetes/sig-network ~ can someone give @jschneiderhan any guidance? @jschneiderhan we are only initially going to be running with CNI in private mode, btw. |
If you are interested in testing the current branch you are welcome to run it. More information can be found in the PR #694 |
Topology Support (AKA Private Networking) #428
Closed with #694 |
After some discussions with @chrislovecnm I'm using this issue to summarise what we need to do to support instances in private subnets with NAT gateways.
Problem
Currently all instance groups created by kops are placed in public subnets. This may not be desirable in all use-cases. There are related open issues about this (#232, #266 which should maybe be closed, #220, #196).
As the simplest use-case kops should support launching instance groups into private subnets with Amazon's managed NAT gateways as the default route.
In addition a feature to specify a default route may be desirable for use-cases where NAT is handled differently, as suggested by @ProTip.
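The two egress options above (a managed NAT gateway per zone, or a user-specified default route) could be resolved per private subnet roughly as follows. The `egressConfig` shape, the precedence rule (an explicit default route wins), and the IDs are all assumptions for illustration, not a kops API:

```go
package main

import "fmt"

// egressConfig is a hypothetical shape for the two options above.
type egressConfig struct {
	NATGatewayByZone map[string]string // managed NAT gateway per zone
	DefaultRoute     string            // optional user-supplied target, e.g. a NAT instance
}

// defaultRouteTarget picks the 0.0.0.0/0 target for a private subnet in zone:
// an explicit user-supplied target wins, otherwise that zone's NAT gateway.
func defaultRouteTarget(zone string, cfg egressConfig) (string, error) {
	if cfg.DefaultRoute != "" {
		return cfg.DefaultRoute, nil
	}
	if id, ok := cfg.NATGatewayByZone[zone]; ok {
		return id, nil
	}
	return "", fmt.Errorf("no NAT gateway or default route configured for %s", zone)
}

func main() {
	managed := egressConfig{NATGatewayByZone: map[string]string{"eu-west-1a": "nat-aaa"}}
	custom := egressConfig{DefaultRoute: "eni-bbb"}
	fmt.Println(defaultRouteTarget("eu-west-1a", managed))
	fmt.Println(defaultRouteTarget("eu-west-1a", custom))
	fmt.Println(defaultRouteTarget("eu-west-1b", managed))
}
```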
AWS resources
In order to set this up several resources are required. We need:
Open questions
Implementation
After the open questions are answered (and unless somebody else comes up with any new ones!) I think the implementation steps are roughly these:
awstask for NAT gateway creation.
Dump your thoughts in the comments and I'll update this as we go along. I'm willing to spend time on this but have limited Go experience, so if someone familiar with the code base has time to answer questions that may come up, I'd be grateful.
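The surviving implementation step mentions an awstask for NAT gateway creation. Whatever form that takes in kops, the key property is idempotence: find an existing gateway before creating one, so re-running the tool is safe. A Go sketch against an in-memory fake cloud; `fakeCloud` and `ensureNATGateway` are hypothetical and are not kops's task interface:

```go
package main

import "fmt"

// natGateway is a minimal record of a NAT gateway.
type natGateway struct {
	ID           string
	SubnetID     string // public subnet the gateway lives in
	AllocationID string // Elastic IP allocation attached to it
}

// fakeCloud stands in for the EC2 API so the sketch runs offline.
type fakeCloud struct {
	gateways []natGateway
}

// ensureNATGateway is find-or-create: it reuses a gateway already in the
// subnet and only creates one when none exists. The bool reports creation.
func ensureNATGateway(c *fakeCloud, subnetID, allocationID string) (natGateway, bool) {
	for _, gw := range c.gateways {
		if gw.SubnetID == subnetID {
			return gw, false
		}
	}
	gw := natGateway{
		ID:           fmt.Sprintf("nat-%d", len(c.gateways)+1),
		SubnetID:     subnetID,
		AllocationID: allocationID,
	}
	c.gateways = append(c.gateways, gw)
	return gw, true
}

func main() {
	c := &fakeCloud{}
	first, created1 := ensureNATGateway(c, "subnet-public-1a", "eipalloc-1")
	second, created2 := ensureNATGateway(c, "subnet-public-1a", "eipalloc-1")
	fmt.Println(first.ID, created1, second.ID, created2, len(c.gateways)) // nat-1 true nat-1 false 1
}
```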