Error trying to create a private only EKS cluster #272
Comments
Hi @miaeyg, thanks for filling out the new issue form, the additional information requested in it is often quite helpful. I looked over your description this morning but nothing stood out. Josh reached out to me regarding the issue you are having. I suspect that it may be a private endpoint security group rule related issue given the "instances failed to join the k8s cluster" behavior that you mentioned. I can work with Josh next week to get this issue resolved for you. |
Hi @dhoucgitter. Any idea what is wrong? |
Hi @miaeyg, I touched base with Josh and heard that you are meeting soon to discuss. |
Hi @dhoucgitter, I am not sure how the meeting is related to the reported problem. I would like to understand how this IaC project supports creating a private-only EKS cluster. Since I have neither AWS Direct Connect nor AWS Site-to-Site VPN, I created a small public subnet with an EC2 instance that has public access and, because it is in the VPC, also has private access to the VPC. I ran Terraform from this public EC2 machine with the terraform.tfvars file I included at the beginning. Should this have worked, or is my public subnet perhaps the root cause of the failure? How did you test the creation of the private-only EKS setup? |
Hi @dhoucgitter, after the meeting it was understood that we need to set up a private-only VPC but can use a public K8S API server. So I tested again, this time using my on-prem machine as the deployment machine. The VPC contains no public subnets, no IGW, and no NAT Gateway; it contains only 1 private subnet + 2 private control-plane subnets. I tagged the private subnet as per the doc, manually created a security group that allows inbound "0.0.0.0/0", and assigned it to the three input security group variables, so I am now on BYON scenario #3. Nevertheless, the Terraform script still fails to join the node groups to the EKS cluster, with the same error. I really need help with this... Here is the terraform.tfvars file:
Here is the error I get: module.eks.module.eks_managed_node_group["cas"].aws_eks_node_group.this[0]: Still creating... [24m20s elapsed] |
Root Cause: Fix: Additional Details: After further investigation/testing, we ultimately discovered that, functionally, an S3 VPC endpoint of either Gateway type or Interface type will work in an EKS air-gap scenario. The difference is in implementation and cost: a Gateway endpoint requires a route table association whereas an Interface endpoint does not, and Gateway endpoints are free whereas Interface endpoints have an associated cost. |
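For readers comparing the two endpoint types described above, here is a minimal Terraform sketch using the AWS provider's `aws_vpc_endpoint` resource. It is not taken from viya4-iac-aws: the variable names and resource labels are hypothetical, and DNS settings for the endpoints are deliberately left out, so treat it as an illustration of the route table versus subnet/security group difference rather than a working fix.

```hcl
# Hypothetical inputs; these names are assumptions, not project variables.
variable "vpc_id" { type = string }
variable "region" { type = string }
variable "private_route_table_ids" { type = list(string) }
variable "private_subnet_ids" { type = list(string) }
variable "endpoint_security_group_ids" { type = list(string) }

# Option A: Gateway endpoint. No endpoint charge, but it only serves subnets
# whose route tables are associated with it.
resource "aws_vpc_endpoint" "s3_gateway" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids
}

# Option B: Interface endpoint. Billed, but needs no route table association;
# it is placed in subnets and guarded by security groups instead.
resource "aws_vpc_endpoint" "s3_interface" {
  vpc_id             = var.vpc_id
  service_name       = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type  = "Interface"
  subnet_ids         = var.private_subnet_ids
  security_group_ids = var.endpoint_security_group_ids
}
```

Normally only one of the two resources would be kept. With Option A, a private subnet whose route table is missing from `route_table_ids` has no path to S3, which is one way nodes in a VPC without an IGW or NAT gateway can end up unable to reach S3.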
Hi @miaeyg, I have been testing the change Josh describes above (changing the S3 VPC endpoint type from "gateway" to "interface") with success in a private cluster configuration. I anticipate that it will be part of the upcoming April IAC AWS release, along with help for dark site deployments. |
I didn't intend to close this earlier. I will add a link to the upcoming dark site PR that is expected to close this issue once I've created it. |
Hi @dhoucgitter, good to hear that you will be making this fix public! Here is what I did in the meantime to work around the problem after reading Josh's explanation of the root cause: The changes are located here: main...miaeyg:viya4-iac-aws:fix-vpc-private |
Hi @dhoucgitter, I tested the latest version 8.6.0 with Terraform 1.9.6 in my test environment, which simulates a dark site.
|
Terraform Version Details
Using the latest version of this project and trying to set up a private-only EKS cluster.
Using BYON Scenario 2 to set up a private-only EKS cluster.
Manually created the VPC and 2 control-plane subnets (in two separate AZs) + 1 private subnet + 1 public subnet.
The public subnet has its own route table with an additional route "0.0.0.0/0" to the IGW attached to the VPC. The public subnet is used only for hosting the public-facing "SAS deployment machine". I can SSH to this "SAS deployment machine" from the Internet and run the viya4-iac-aws project on it.
The private subnet and the 2 control-plane subnets were assigned the following tags and do not have a route to the IGW:
Terraform Variable File Details
Steps to Reproduce
Use the supplied "terraform.tfvars" and adjust the following input variables to your env:
location
default_private_access_cidrs
vpc_id
subnet_ids
Expected Behavior
The EKS cluster is set up as private-only without any problem.
Actual Behavior
Failure happens after 25 minutes: NodeCreationFailure: Instances failed to join the kubernetes cluster
See complete error below
I see that the EKS cluster is Active and I also see the EC2 instances running:
Additional Context
No response
References
No response
Code of Conduct