This repository was archived by the owner on Sep 30, 2020. It is now read-only.

Best practices in configuring and running production clusters #1050

Closed
mumoshu opened this issue Dec 5, 2017 · 22 comments

Comments

@mumoshu
Contributor

mumoshu commented Dec 5, 2017

This is a quick, incomplete write-up to start a discussion toward documenting best practices that help users configure their production clusters.

As far as I remember, we don't have all of these in a single page as of today, right?
cc @c-knowles

Availability

  • Etcd
    • At least 3 nodes
      • etcd.count
    • Deploy all the etcd nodes into a single AZ when your region has only two
    • Deploy each etcd node in a separate AZ to tolerate AZ failures
  • Controller
    • At least 2 nodes
      • controller.count
    • 2 AZs
  • Worker
    • At least 2 nodes
    • Ensure all your ReplicaSets/Deployments (see the sketch after this list):
      • Have at least 2 replicas
      • Use affinity.podAntiAffinity to prefer or require that pods are not colocated on the same node
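
A minimal cluster.yaml sketch for the availability settings above. The etcd.count, controller.count and worker.nodePools paths come from the options listed in this issue; the subnet names, AZs and CIDRs are purely illustrative and assume a region with at least three AZs:

subnets:
- name: az0
  availabilityZone: us-west-2a
  instanceCIDR: "10.0.0.0/24"
- name: az1
  availabilityZone: us-west-2b
  instanceCIDR: "10.0.1.0/24"
- name: az2
  availabilityZone: us-west-2c
  instanceCIDR: "10.0.2.0/24"

etcd:
  count: 3          # at least 3 members, one per AZ, to tolerate a single-AZ failure

controller:
  count: 2          # at least 2 controllers across 2 AZs

worker:
  nodePools:
  - name: pool1
    count: 2        # at least 2 workers so one node failure doesn't take out a workload

On the Kubernetes side, a hedged example of affinity.podAntiAffinity in a Deployment's pod template; the app label "myapp" is hypothetical, and preferredDuringSchedulingIgnoredDuringExecution can be swapped for requiredDuringSchedulingIgnoredDuringExecution to make the spread mandatory:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: myapp
        topologyKey: kubernetes.io/hostname   # prefer placing replicas on different nodes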

Security

Something like what was recently published on the GCP blog would be nice:

Harden kube-dashboard

kubernetesDashboard:
  adminPrivileges: false
  insecureLogin: false

Enable Calico for Network Policies

useCalico: true

RBAC

Enabled by default since v0.9.9-rc.1

User Authentication

  • Either of the following (see the sketch after this list)
    • heptio-authenticator-aws + experimental.authentication.webhook.*
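
A hedged cluster.yaml sketch for the webhook path above; the sub-keys (enabled, cacheTTL, configBase64) are assumptions about the experimental.authentication.webhook schema, and the kubeconfig pointing the apiserver at heptio-authenticator-aws has to be prepared separately:

experimental:
  authentication:
    webhook:
      enabled: true
      cacheTTL: "5m"            # assumed sub-key: how long authenticated responses are cached
      configBase64: "<base64-encoded webhook kubeconfig for heptio-authenticator-aws>"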

Node Authn/Authz

  • Enable TLS bootstrapping + the Node Authorizer (see the sketch after this list)
    • experimental.tlsBootstrap.enabled
    • experimental.nodeAuthorizer.enabled
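
Expressed as cluster.yaml (paths taken from the two items above):

experimental:
  tlsBootstrap:
    enabled: true       # kubelets obtain and rotate their certificates via TLS bootstrapping
  nodeAuthorizer:
    enabled: true       # Node authorizer limits each kubelet to the API objects it actually needs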

Auditing

  • Enable apiserver audit logging (see the sketch after this list)
    • experimental.auditLog.enabled
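
As a cluster.yaml snippet (path from the item above; tuning keys such as log path or retention are left out here):

experimental:
  auditLog:
    enabled: true       # apiserver writes an audit trail of API requests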

Misc

  • Enable the node-local resolver for faster DNS lookups and a more limited failure domain (see the sketch after this list)
    • kubeDns.nodeLocalResolver.enabled
  • Enable cluster-autoscaler
    • addons.clusterAutoscaler.enabled
    • controller.clusterAutoscalerSupport.enabled
    • worker.nodePools[].autoscaling.clusterAutoscaler.enabled
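
The Misc items above, gathered into one cluster.yaml sketch; the node pool name is illustrative:

kubeDns:
  nodeLocalResolver:
    enabled: true               # per-node caching resolver in front of kube-dns

addons:
  clusterAutoscaler:
    enabled: true               # deploy cluster-autoscaler as an addon

controller:
  clusterAutoscalerSupport:
    enabled: true               # controller-side support required by cluster-autoscaler

worker:
  nodePools:
  - name: pool1                 # illustrative pool name
    autoscaling:
      clusterAutoscaler:
        enabled: true           # allow cluster-autoscaler to scale this pool
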
@mumoshu changed the title from "Document best practices in configuring production clusters" to "Best practices in configuring production clusters" on Dec 5, 2017
@mumoshu
Contributor Author

mumoshu commented Dec 5, 2017

@c-knowles Just noticed the high availability part is covered in your recent PR 😄
Good job!

@cknowles
Contributor

cknowles commented Dec 5, 2017

@mumoshu yeah, if you could merge #1034, then I could take what is here to improve it.

FYI adminPrivileges should be false to prevent the dashboard from having admin privileges. Also, you need to run kubectl delete clusterrolebinding kubernetes-dashboard on old clusters.

@mumoshu
Contributor Author

mumoshu commented Dec 5, 2017

@c-knowles Thanks for the correction! Yes - we really should set it to false.
Also, I'll take a look into your PR today.

@cknowles self-assigned this on Dec 5, 2017
@mumoshu
Contributor Author

mumoshu commented Dec 6, 2017

Availability: nodeDrainer should be enabled
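
A hedged cluster.yaml sketch; the exact path is an assumption (nodeDrainer lived under experimental in kube-aws versions of this era), and drainTimeout is an assumed tuning key:

experimental:
  nodeDrainer:
    enabled: true       # drain pods off a node before it is terminated, e.g. during rolling updates
    drainTimeout: 5     # assumed key: minutes to wait for the drain to complete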

@mumoshu
Contributor Author

mumoshu commented Feb 6, 2018

Cluster access control mentioned in #1122 by @int128 should be addressed, too.

@Vince-Cercury

@mumoshu, would you be able to clarify or provide references for kubeDns.nodeLocalResolver.enabled?
My K8s DNS sometimes fails to resolve the AWS S3 or AWS ASG API endpoints, and I wonder if that could help.

@mumoshu changed the title from "Best practices in configuring production clusters" to "Best practices in configuring and running production clusters" on Mar 8, 2018
@mumoshu
Contributor Author

mumoshu commented Mar 8, 2018

@Vincemd Probably.

For me, kube-dns occasionally failed to resolve AWS-managed DNS names. I suspect it is due to a temporary failure in the Amazon DNS and/or in the communication between your node and the Amazon DNS. kubeDns.nodeLocalResolver would be a solution if it were a communication issue: as long as nodeLocalResolver has a cached DNS entry available for your query, it will "hide" the failure.
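
For reference, it is a one-line change in cluster.yaml (the path listed at the top of this issue), applied with a kube-aws update:

kubeDns:
  nodeLocalResolver:
    enabled: true       # per-node caching resolver that can mask transient upstream DNS failures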

@mumoshu
Contributor Author

mumoshu commented Mar 8, 2018

Also see "best practices on team operation" at #1122.
In a nutshell, I suggest you:

  • Deploy heptio/authenticator for authenticating your devs/services outside of the cluster for k8s API access, and
  • Use a tool like sopsed or vaulted to securely git-commit your cluster/admin credentials for sharing.

@mumoshu
Contributor Author

mumoshu commented Mar 22, 2018

This issue has mostly been about configuring and managing the "source code" of your cluster.
For suggestions and background context on production cluster upgrades, also see #455.

@Vince-Cercury

@mumoshu regarding kubeDns.nodeLocalResolver.enabled: what does it do if I enable it? I mean, what configuration or deployment does it alter? Is it possible to make that change without doing a kube-aws update, by simply editing a k8s resource, or is it more complicated than that?

We are getting a lot of Unable to execute HTTP request: MYBUCKET.s3.amazonaws.com: Temporary failure in name resolution on pods running the AWS S3 CLI on kube-aws 0.9.8. I'm going to retire this cluster in favor of a 0.9.9 one, but wanted to try a quick fix first since the impact is significant.

@mumoshu
Contributor Author

mumoshu commented Jun 11, 2018

@Vincemd I understand that it should be possible to introduce nodeLocalResolver without downtime.

But I'd prefer creating another cluster and migrating to it, to protect your production service with maximum care :) Also, we should all become very good at creating/deleting k8s clusters, so that we don't need to fear any kind of cluster failure too much.

Anyway, the name resolution error looks like what I have seen before due to kube-dns instability under higher load. I just made my apps tolerate transient DNS lookup failures by retrying. If you don't have retries in your apps, I suggest you implement them.

One more thing: I scaled kube-dns by adding more replicas, which greatly reduced such errors. So do that, even if kube-dns doesn't seem to be very overloaded.
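
A hedged example of that scaling step, assuming the deployment is still named kube-dns and lives in the kube-system namespace:

kubectl -n kube-system scale deployment kube-dns --replicas=3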

The move to nodeLocalResolver is the last thing you should try.

@mumoshu
Contributor Author

mumoshu commented Jun 11, 2018

Reserve compute resources for kubelet and system daemons #1356

@Vince-Cercury

@mumoshu thanks. Indeed we have the concept of blue/green, so most of the time we make changes to an inactive cluster. We are getting better at creating clusters. The hard bit used to be the Prometheus Operator, but since kube-aws has opened the ports, it's easier.

Anyway, I tried adding more DNS pods without success.
We do have retries, but there is room for improvement.

nodeLocalResolver is now enabled and no issue so far.

@mumoshu
Contributor Author

mumoshu commented Jun 12, 2018

@Vincemd Awesome! You're a very experienced k8s admin 🎉

I probably have a similar sentiment about Prometheus. I basically wanted to migrate a k8s monitoring system across k8s clusters without downtime, but gave up and went a different route: metricbeat + Kinesis + AWS ES for multi-cluster monitoring. Managing an AWS ES cluster doesn't seem that easy compared to the Prometheus Operator, but I thought migrating stateful services across k8s clusters would be way harder.

I tried adding more dns pods without success.

Good to know. Thanks for sharing your experience!

@cmcconnell1
Contributor

Hello all, I reference this issue from time to time when building out new clusters (with new kube-aws versions) and wanted to add a note since Prometheus was mentioned here. I just came across Thanos today and am planning on testing/evaluating it to help with our Prometheus infrastructure:

Seamless integration with Prometheus and the ability to use only part of the Thanos project makes it a perfect choice if you want to scale your monitoring system without a superhuman effort.

Blog/overview: thanos-prometheus-at-scale
GitHub: thanos

@mdgreenwald

Following this thread. 😁

@cmcconnell1
Contributor

Regarding best practices, it would be great to have concrete, working examples in the kube-aws docs showing exactly how to configure clusters for the desired functionality.

On that note, I understand there is the baseline provisioning testing tool kube-aws/e2e/run in the project, and perhaps this might be a better reference for folks to use as a baseline? I'm not sure this existed when we first started using kube-aws years ago (and if it did, my bad for not finding and using it at that time--the kube-aws slack channel definitely didn't exist ;-) ).

I have the following scrubbed cluster.yaml file which has been upgraded for the latest kube-aws version v0.10.2 that we're using in prod for a small cluster.

I am providing it for review and encouraging feedback with the goal of giving everyone coming to kube-aws a working cluster config ready for deployment which provides:

  • H/A Multi-AZ NodePools
  • Autoscaling
  • Kiam
  • A basic opinionated config using a pre-existing AWS SSH security group for all cluster nodes
  • etc.

This could provide new users (and existing ones, if needed) with configurations that they can start using immediately.

Perhaps this might be something that could go under the kube-aws/tree/master/docs/advanced-topics location?

Another proposal on this subject would be to create template (command-line) options for kube-aws, used during the cluster.yaml/asset initialization process, that would configure (and activate) specified options such as multi-AZ node pools, the autoscaler, kiam, etc.
For example:

kube-aws init \
--cluster-name=my-cluster-name \
--external-dns-name=my-cluster-endpoint \
--hosted-zone-id=hosted-zone-xxxxx \
--region=us-west-1 \
--key-name=key-pair-name \
--kms-key-arn="arn:aws:kms:us-west-1:xxxxxxxxxx:key/xxxxxxxxxxxxxxxxxxx" \
--s3-uri=s3://my-kube-aws-assets-bucket
# possible future options below, something like:
# --subnet-availability-zones=us-west-1a, instanceCIDR: "10.1.8.64/27",us-west-1b:instanceCIDR: "10.1.8.96/27"
# --autoScaler=true \
# --multiAzNodePool=true \
# --kiam=true
# --allowSshToAll=true

I understand this is a complex ask and would be difficult to maintain. But, if possible, it could greatly enhance our ability to manage our kube clusters with kube-aws using a more desirable IaC methodology.

I think something like the above options would help kube-aws gain more adoption moving forward. The criticisms I have personally faced from other kube deployment consultants/companies/engineers can essentially be summarized as the current lack of fully IaC management of our clusters' configuration and deployment code, plus the requisite intermediate steps during initial (i.e. new-version) deployments.

On that note, I see there is a nice existing project (similar to what I've done internally, with the exception that ours is not interactive) which provides scaffolding around kube-aws deployment processes: camilb's kube-aws-secure.

I would appreciate thoughts and feedback on these ideas.
Thank you

@edalford11

One thing I think people would find useful in kube-aws best practices is how to confirm that their serviceCIDR and podCIDR do not clash with any outside resource CIDRs that they need to MASQUERADE to. For example, I ended up needing to connect my pods to RDS instances via VPC peering, and it turned out the VPCs I needed to connect to used the exact same CIDRs as the podCIDR and serviceCIDR. #1407 was very helpful to me for figuring this out.
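
A hedged illustration of the three ranges to check in cluster.yaml before setting up VPC peering; the key names follow the kube-aws defaults, and the actual values must be picked so they don't overlap the peered VPCs:

vpcCIDR: "10.0.0.0/16"        # where the nodes live
podCIDR: "10.2.0.0/16"        # must not overlap any peered VPC you route/MASQUERADE traffic to
serviceCIDR: "10.3.0.0/24"    # likewise; a clash here sends traffic to the wrong destination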

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Apr 25, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on May 25, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
