This repository was archived by the owner on Sep 30, 2020. It is now read-only.

Best practices in configuring and running production clusters #1050

Closed
mumoshu opened this issue Dec 5, 2017 · 22 comments

Comments

@mumoshu
Contributor

mumoshu commented Dec 5, 2017

This is a quick, incomplete write-up to start a discussion toward documenting best practices that help users configure their production clusters.

As far as I remember, we don't have all of these in a single page as of today, right?
cc @c-knowles

Availability

  • Etcd
    • At least 3 nodes
      • etcd.count
    • Deploy all the etcd nodes into a single AZ when your region has only two
    • Deploy each etcd node in a separate AZ to tolerate AZ failures
  • Controller
    • At least 2 nodes
      • controller.count
    • 2 AZs
  • Worker
    • At least 2 nodes
    • Ensure all your ReplicaSets/Deployments (see the sketch after this list):
      • Have at least 2 replicas
      • Use affinity.podAntiAffinity to prefer or require that pods are not colocated on the same node
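
A minimal cluster.yaml sketch for the availability settings above. The etcd.count, controller.count and worker.nodePools paths come from the options listed in this issue; the subnet names, AZs and CIDRs are purely illustrative and assume a region with at least three AZs:

subnets:
- name: az0
  availabilityZone: us-west-2a
  instanceCIDR: "10.0.0.0/24"
- name: az1
  availabilityZone: us-west-2b
  instanceCIDR: "10.0.1.0/24"
- name: az2
  availabilityZone: us-west-2c
  instanceCIDR: "10.0.2.0/24"

etcd:
  count: 3          # at least 3 members, one per AZ, to tolerate a single-AZ failure

controller:
  count: 2          # at least 2 controllers across 2 AZs

worker:
  nodePools:
  - name: pool1
    count: 2        # at least 2 workers so one node failure doesn't take out a workload

On the Kubernetes side, a hedged example of affinity.podAntiAffinity in a Deployment's pod template; the app label "myapp" is hypothetical, and preferredDuringSchedulingIgnoredDuringExecution can be swapped for requiredDuringSchedulingIgnoredDuringExecution to make the spread mandatory:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: myapp
        topologyKey: kubernetes.io/hostname   # prefer placing replicas on different nodes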

Security

Something like what was recently published on the GCP blog would be nice:

Harden kube-dashboard

kubernetesDashboard:
  adminPrivileges: false
  insecureLogin: false

Enable Calico for Network Policies

useCalico: true

RBAC

Enabled by default since v0.9.9-rc.1

User Authentication

  • Either of the following (see the sketch after this list)
    • heptio-authenticator-aws + experimental.authentication.webhook.*
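
A hedged cluster.yaml sketch for the webhook path above; the sub-keys (enabled, cacheTTL, configBase64) are assumptions about the experimental.authentication.webhook schema, and the kubeconfig pointing the apiserver at heptio-authenticator-aws has to be prepared separately:

experimental:
  authentication:
    webhook:
      enabled: true
      cacheTTL: "5m"            # assumed sub-key: how long authenticated responses are cached
      configBase64: "<base64-encoded webhook kubeconfig for heptio-authenticator-aws>"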

Node Authn/Authz

  • Enable TLS bootstrapping + the Node Authorizer (see the sketch after this list)
    • experimental.tlsBootstrap.enabled
    • experimental.nodeAuthorizer.enabled
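
Expressed as cluster.yaml (paths taken from the two items above):

experimental:
  tlsBootstrap:
    enabled: true       # kubelets obtain and rotate their certificates via TLS bootstrapping
  nodeAuthorizer:
    enabled: true       # Node authorizer limits each kubelet to the API objects it actually needs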

Auditing

  • Enable apiserver audit logging (see the sketch after this list)
    • experimental.auditLog.enabled
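
As a cluster.yaml snippet (path from the item above; tuning keys such as log path or retention are left out here):

experimental:
  auditLog:
    enabled: true       # apiserver writes an audit trail of API requests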

Misc

  • Enable the node-local resolver for faster DNS lookups and a more limited failure domain (see the sketch after this list)
    • kubeDns.nodeLocalResolver.enabled
  • Enable cluster-autoscaler
    • addons.clusterAutoscaler.enabled
    • controller.clusterAutoscalerSupport.enabled
    • worker.nodePools[].autoscaling.clusterAutoscaler.enabled
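
The Misc items above, gathered into one cluster.yaml sketch; the node pool name is illustrative:

kubeDns:
  nodeLocalResolver:
    enabled: true               # per-node caching resolver in front of kube-dns

addons:
  clusterAutoscaler:
    enabled: true               # deploy cluster-autoscaler as an addon

controller:
  clusterAutoscalerSupport:
    enabled: true               # controller-side support required by cluster-autoscaler

worker:
  nodePools:
  - name: pool1                 # illustrative pool name
    autoscaling:
      clusterAutoscaler:
        enabled: true           # allow cluster-autoscaler to scale this pool
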
@mumoshu changed the title from "Document best practices in configuring production clusters" to "Best practices in configuring production clusters" on Dec 5, 2017
@mumoshu
Contributor Author

mumoshu commented Dec 5, 2017

@c-knowles Just noticed the high availability part is covered in your recent PR 😄
Good job!

@cknowles
Contributor

cknowles commented Dec 5, 2017

@mumoshu yeah, if you could merge #1034, then I could take what is here to improve it.

FYI adminPrivileges should be false to prevent the dashboard from having admin privileges. Also, you need to run kubectl delete clusterrolebinding kubernetes-dashboard on old clusters.

@mumoshu
Contributor Author

mumoshu commented Dec 5, 2017

@c-knowles Thanks for the correction! Yes - we really should set it to false.
Also, I'll take a look into your PR today.

@cknowles self-assigned this on Dec 5, 2017
@mumoshu
Contributor Author

mumoshu commented Dec 6, 2017

Availability: nodeDrainer should be enabled
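
A hedged cluster.yaml sketch; the exact path is an assumption (nodeDrainer lived under experimental in kube-aws versions of this era), and drainTimeout is an assumed tuning key:

experimental:
  nodeDrainer:
    enabled: true       # drain pods off a node before it is terminated, e.g. during rolling updates
    drainTimeout: 5     # assumed key: minutes to wait for the drain to complete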

@mumoshu
Contributor Author

mumoshu commented Feb 6, 2018

Cluster access control mentioned in #1122 by @int128 should be addressed, too.

@Vince-Cercury

@mumoshu, would you be able to clarify or provide references for kubeDns.nodeLocalResolver.enabled?
My K8s DNS sometimes fails to resolve the AWS S3 or AWS ASG API endpoints, and I wonder if that could help.

@mumoshu changed the title from "Best practices in configuring production clusters" to "Best practices in configuring and running production clusters" on Mar 8, 2018
@mumoshu
Contributor Author

mumoshu commented Mar 8, 2018

@Vincemd Probably.

For me, kube-dns occasionally failed to resolve AWS-managed DNS names. I suspect it is due to a temporary failure in the Amazon DNS and/or in the communication between your node and the Amazon DNS. kubeDns.nodeLocalResolver would be a solution if it were a communication issue: as long as nodeLocalResolver has a cached DNS entry available for your query, it will "hide" the failure.
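
For reference, it is a one-line change in cluster.yaml (the path listed at the top of this issue), applied with a kube-aws update:

kubeDns:
  nodeLocalResolver:
    enabled: true       # per-node caching resolver that can mask transient upstream DNS failures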

@mumoshu
Contributor Author

mumoshu commented Mar 8, 2018

Also see "best practices on team operation" at #1122.
In a nutshell, I suggest you:

  • Deploy heptio/authenticator for authenticating your devs/services outside of the cluster for k8s API access, and
  • Use a tool like sopsed or vaulted to securely git-commit your cluster/admin credentials for sharing.

@mumoshu
Contributor Author

mumoshu commented Mar 22, 2018

This issue has mostly been about configuring and managing the "source code" of your cluster.
For suggestions and background context on production cluster upgrades, also see #455.

@Vince-Cercury

@mumoshu regarding kubeDns.nodeLocalResolver.enabled: what does it do if I enable it? I mean, what configuration or deployment does it alter? Is it possible to make that change without doing a kube-aws update, by simply editing a k8s resource, or is it more complicated than that?

We are getting a lot of Unable to execute HTTP request: MYBUCKET.s3.amazonaws.com: Temporary failure in name resolution on pods running the AWS S3 CLI on kube-aws 0.9.8. I'm going to retire this cluster in favor of a 0.9.9 one, but wanted to try a quick fix first since the impact is significant.

@mumoshu
Contributor Author

mumoshu commented Jun 11, 2018

@Vincemd I understand that it should be possible to introduce nodeLocalResolver without downtime.

But I'd prefer creating another cluster and migrating to it, to protect your production service with maximum care :) Also, we should all become very good at creating/deleting k8s clusters, so that we don't need to fear any kind of cluster failure too much.

Anyway, the name resolution error looks like what I have seen before due to kube-dns instability under higher load. I just made my apps tolerate transient DNS lookup failures by retrying. If you don't have retries in your apps, I suggest you implement them.

One more thing: I scaled kube-dns by adding more replicas, which greatly reduced such errors. So do that, even if kube-dns doesn't seem to be very overloaded.
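
A hedged example of that scaling step, assuming the deployment is still named kube-dns and lives in the kube-system namespace:

kubectl -n kube-system scale deployment kube-dns --replicas=3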

The move to nodeLocalResolver is the last thing you should try.

@mumoshu
Contributor Author

mumoshu commented Jun 11, 2018

Reserve compute resources for kubelet and system daemons #1356

@Vince-Cercury

@mumoshu thanks. Indeed we have the concept of blue/green, so most of the time we make changes to an inactive cluster. We are getting better at creating clusters. The hard bit used to be the Prometheus Operator, but since kube-aws has opened the ports, it's easier.

Anyway, I tried adding more DNS pods without success.
We do have retries, but there is room for improvement.

nodeLocalResolver is now enabled and no issue so far.

@mumoshu
Contributor Author

mumoshu commented Jun 12, 2018

@Vincemd Awesome! You're a very experienced k8s admin 🎉

I probably have a similar sentiment about Prometheus. I basically wanted to migrate a k8s monitoring system across k8s clusters without downtime, but gave up and went a different route: metricbeat + Kinesis + AWS ES for multi-cluster monitoring. Managing an AWS ES cluster doesn't seem that easy compared to the Prometheus Operator, but I thought migrating stateful services across k8s clusters would be way harder.

I tried adding more dns pods without success.

Good to know. Thanks for sharing your experience!

@cmcconnell1
Contributor

Hello all, I reference this issue from time to time when building out new clusters (with new kube-aws versions) and wanted to add a note since Prometheus was mentioned here. I just came across Thanos today and am planning on testing/evaluating it to help with our Prometheus infrastructure:

Seamless integration with Prometheus and the ability to use only part of the Thanos project makes it a perfect choice if you want to scale your monitoring system without a superhuman effort.

Blog/overview: thanos-prometheus-at-scale
GitHub: thanos

@mdgreenwald

Following this thread. 😁

@cmcconnell1
Contributor

Regarding best practices, it would be great to have concrete, working examples in the kube-aws docs showing exactly how to configure clusters for the desired functionality.

On that note, I understand there is the baseline provisioning testing tool kube-aws/e2e/run in the project, and perhaps this might be a better reference for folks to use as a baseline? I'm not sure this existed when we first started using kube-aws years ago (and if it did, my bad for not finding and using it at that time--the kube-aws slack channel definitely didn't exist ;-) ).

I have the following scrubbed cluster.yaml file which has been upgraded for the latest kube-aws version v0.10.2 that we're using in prod for a small cluster.

I am providing it for review and encouraging feedback with the goal of giving everyone coming to kube-aws a working cluster config ready for deployment which provides:

  • H/A Multi-AZ NodePools
  • Autoscaling
  • Kiam
  • A basic opinionated config using a pre-existing AWS SSH security group for all cluster nodes
  • etc.

This could provide new users (and existing ones, if needed) with configurations that they can start using immediately.

Perhaps this might be something that could go under the kube-aws/tree/master/docs/advanced-topics location?

Another proposal on this subject would be to create template (command-line) options for kube-aws, used during the cluster.yaml/asset initialization process, that would configure (and activate) specified options such as multi-AZ node pools, the autoscaler, kiam, etc.
For example:

kube-aws init \
--cluster-name=my-cluster-name \
--external-dns-name=my-cluster-endpoint \
--hosted-zone-id=hosted-zone-xxxxx \
--region=us-west-1 \
--key-name=key-pair-name \
--kms-key-arn="arn:aws:kms:us-west-1:xxxxxxxxxx:key/xxxxxxxxxxxxxxxxxxx" \
--s3-uri=s3://my-kube-aws-assets-bucket
# possible future options below, something like:
# --subnet-availability-zones=us-west-1a, instanceCIDR: "10.1.8.64/27",us-west-1b:instanceCIDR: "10.1.8.96/27"
# --autoScaler=true \
# --multiAzNodePool=true \
# --kiam=true
# --allowSshToAll=true

I understand this is a complex ask and would be difficult to maintain. But, if possible, it could greatly enhance our ability to manage our kube clusters with kube-aws using a more desirable IaC methodology.

I think something like the above options would help kube-aws gain more adoption moving forward. The criticisms I have personally faced from other kube deployment consultants/companies/engineers can essentially be summarized as the current lack of fully IaC management of our clusters' configuration and deployment code, plus the requisite intermediate steps during initial (i.e. new-version) deployments.

On that note, I see there is a nice existing project (similar to what I've done internally, with the exception that ours is not interactive) which provides scaffolding around kube-aws deployment processes: camilb's kube-aws-secure.

I would appreciate thoughts and feedback on these ideas.
Thank you

@edalford11

One thing I think people would find useful in kube-aws best practices is how to confirm that their serviceCIDR and podCIDR do not clash with any outside resource CIDRs that they need to MASQUERADE to. For example, I ended up needing to connect my pods to RDS instances via VPC peering, and it turned out the VPCs I needed to connect to used the exact same CIDRs as the podCIDR and serviceCIDR. #1407 was very helpful to me for figuring this out.
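
A hedged illustration of the three ranges to check in cluster.yaml before setting up VPC peering; the key names follow the kube-aws defaults, and the actual values must be picked so they don't overlap the peered VPCs:

vpcCIDR: "10.0.0.0/16"        # where the nodes live
podCIDR: "10.2.0.0/16"        # must not overlap any peered VPC you route/MASQUERADE traffic to
serviceCIDR: "10.3.0.0/24"    # likewise; a clash here sends traffic to the wrong destination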

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Apr 25, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on May 25, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
