
Docs should clarify that the private endpoint needs to be true when the public endpoint is IP-filtered #1111

Closed
2 of 4 tasks
schollii opened this issue Nov 20, 2020 · 12 comments · Fixed by #1603

Comments


schollii commented Nov 20, 2020

I have issues

I'm submitting a...

  • bug report
  • feature request
  • support request - read the FAQ first!
  • kudos, thank you, warm fuzzy

If this is a bug, how to reproduce? Please include a code sample if relevant.

Use the managed node group example and override cluster_endpoint_public_access_cidrs to ["YOUR_WAN_IP/32"], but do not set cluster_endpoint_private_access.
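
A minimal sketch of what that looks like (assuming the v13.x module inputs; the VPC references, node group values, and the YOUR_WAN_IP placeholder are illustrative, not the exact example code):

```hcl
# Sketch of the misconfiguration, assuming terraform-aws-eks v13.x input names.
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 13.2"

  cluster_name    = "eks-experiment-3"
  cluster_version = "1.18"
  vpc_id          = module.vpc.vpc_id
  subnets         = module.vpc.private_subnets

  # Public endpoint restricted to a single workstation IP...
  cluster_endpoint_public_access_cidrs = ["YOUR_WAN_IP/32"]
  # ...while cluster_endpoint_private_access is left at its default (false),
  # so worker nodes can only reach the API server through the public endpoint,
  # from addresses that are not in the allowed CIDR list.

  node_groups = {
    main = {
      desired_capacity = 2
      max_capacity     = 3
      min_capacity     = 1
      instance_type    = "t3.medium"
    }
  }
}
```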

Run terraform apply: after about 15 minutes to create the EKS cluster, the node group creation waits in a loop for roughly 25 minutes, at which point Terraform gives up with the following error:

Error: error waiting for EKS Node Group (eks-experiment-3:eks-experiment-3-main-wise-gobbler) creation: NodeCreationFailure: Instances failed to join the kubernetes cluster. Resource IDs: [... ... ...]

What's the expected behavior?

I didn't realize until after a lot of painful debugging that the failure was the result of having cluster_endpoint_public_access_cidrs = [MY_WAN_IP] while keeping only a public cluster endpoint: the nodes then talk to the cluster through the public endpoint, but their IPs do not match that list, so they can never join.

So this might be more of an enhancement than a bug report:

  • I should have seen in the docs for cluster_endpoint_public_access_cidrs that, in almost all cases where you override this value, you will need to set cluster_endpoint_private_access to true (see the sketch after this list)
  • cluster_endpoint_private_access should arguably have defaulted to true whenever cluster_endpoint_public_access_cidrs is set to something other than its default; how could it possibly work with anything but private = true?
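
For the first point, a hedged illustration of the fix the docs could point at, assuming the v13.x input names; it is one extra argument:

```hcl
# Sketch of the fix: keep the restricted public endpoint, but also enable the
# private endpoint so worker nodes can join from inside the VPC.
cluster_endpoint_public_access_cidrs = ["YOUR_WAN_IP/32"]
cluster_endpoint_private_access      = true
```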

Are you able to fix this problem and submit a PR? Link here if you have already.

Environment details

  • Affected module version: 13.2.1
  • OS: Ubuntu
  • Terraform version: 0.13.5

Any other relevant info

If you override cluster_endpoint_public_access_cidrs (whose default is ["0.0.0.0/0"]), you are pretty much guaranteed, IMO, to need to activate the private endpoint as well. In fact, enabling nodes to use the private endpoint should probably be the default. What would be the downside? What benefit is there to having node-to-control-plane traffic ever leave the cluster?
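
For reference, the relevant defaults in the module's variables.tf look roughly like this (paraphrased from v13.x; the descriptions are approximate, not a verbatim copy):

```hcl
# Approximate shape of the module's defaults (v13.x).
variable "cluster_endpoint_private_access" {
  description = "Whether the Amazon EKS private API server endpoint is enabled"
  type        = bool
  default     = false
}

variable "cluster_endpoint_public_access_cidrs" {
  description = "List of CIDR blocks which can access the Amazon EKS public API server endpoint"
  type        = list(string)
  default     = ["0.0.0.0/0"]
}
```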


barryib commented Nov 20, 2020

I didn't realize until after a lot of painful debugging that the failure was the result of having cluster_endpoint_public_access_cidrs = [MY_WAN_IP] while keeping only a public cluster endpoint: the nodes then talk to the cluster through the public endpoint, but their IPs do not match that list, so they can never join.

@schollii I don't have the right answer for that. But I don't think the module should copy-paste all of the AWS EKS docs (or even replace them). It is clearly described in the EKS docs https://docs.aws.amazon.com/eks/latest/userguide/cluster-endpoint.html :

Your cluster API server is accessible from the internet. You can, optionally, limit the CIDR blocks that can access the public endpoint. If you limit access to specific CIDR blocks, then it is recommended that you also enable the private endpoint, or ensure that the CIDR blocks that you specify include the addresses that nodes and Fargate pods (if you use them) access the public endpoint from.


If you override cluster_endpoint_public_access_cidrs (whose default is ["0.0.0.0/0"]), you are pretty much guaranteed, IMO, to need to activate the private endpoint as well. In fact, enabling nodes to use the private endpoint should probably be the default. What would be the downside? What benefit is there to having node-to-control-plane traffic ever leave the cluster?

This is the default behavior of AWS EKS, and therefore of the Terraform AWS provider as well. This module is not changing any default here; we are just following the AWS one.


schollii commented Nov 20, 2020

[Sorry this ended up a bit longer than anticipated!]

About the AWS default being the reason: understood, I can't argue with that.

However, I am proposing to add one sentence to the docs, which is astronomically far from "all the AWS EKS docs".

One important objective of modules like this one is to decrease cognitive load and abstract away details.

The settings of a module basically distill what is most important in the eyes of whoever wrote the module. Whoever uses the module does so knowing that they are trading some configurability in favor of convenience.

So although it's fair to assume some familiarity with AWS, it's less fair to assume the person knows the underlying AWS docs inside out, let alone that they know every sentence of the docs that relates to a particular module setting.

If a setting involves a non-obvious side effect in the underlying AWS behavior, its docs should at least be fixed, when an issue is raised, to point to the AWS docs where the caveat is discussed. So no, I'm not saying anyone should analyze the other module settings to determine whether there are more of these side effects; it's OK to let them be discovered.

Do you think a PR for a one-sentence fix to the module docs would be accepted?


barryib commented Nov 20, 2020

Do you think a PR for a one-sentence fix to the module docs would be accepted?

Please open a PR, we'll review it.


stale bot commented Feb 18, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 18, 2021
@schollii

/refresh

@stale stale bot removed the stale label Feb 19, 2021

stale bot commented May 21, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label May 21, 2021
@schollii

/refresh

@stale stale bot removed the stale label May 21, 2021

stale bot commented Aug 19, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 19, 2021

stale bot commented Sep 14, 2021

This issue has been automatically closed because it has not had recent activity since being marked as stale.

@stale stale bot closed this as completed Sep 14, 2021

schollii commented Sep 17, 2021

@barryib I'm creating a separate PR for this

@antonbabenko

Thanks @schollii !

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 18, 2022