Blog Post: Out of the Clouds onto the Ground: How to Make Kubernetes Production Grade Anywhere #9716
Conversation
This reverts commit eeb1132.
/assign @zacharysarah
Deploy preview for kubernetes-io-master-staging ready! Built with commit 69e85fc https://deploy-preview-9716--kubernetes-io-master-staging.netlify.com
Deploy preview for kubernetes-io-master-staging ready! Built with commit fef53fc https://deploy-preview-9716--kubernetes-io-master-staging.netlify.com
@kbarnard10 @cantbewong @embano1
thanks for the writeup! 👍
i've added some comments and found a couple of typos.
**Authors**: Steven Wong (VMware), Michael Gasch (VMware)
This blog offers some guidelines for running a production-grade Kubernetes cluster in an environment like an on-premise data center or edge location.
consistency for production grade
This article is directed at on-premise Kubernetes deployments on a hypervisor or bare-metal platform, facing finite backing resources compared to the expansibility of the major public clouds. However, some of these recommendations may also be useful in a public cloud if budget constraints limit the resources you choose to consume.
A single node bare-metal Minikube deployment may be cheap and easy, but is not production grade. Conversely, you’re not likely to achieve Google’s Borg experience in a retail store, branch office, or edge location -- nor are you likely to need it.
-- nor
-> , nor
This blog offers some guidance on achieving a production-worthy Kubernetes deployment, even when dealing with some resource constraints.
possibly production-worthy
-> production worthy
![api server](/images/blog/2018-08-03-make-kubernetes-production-grade-anywhere/api-server.png)
Typically the API server, Controller Manager and Scheduler components are co-located within multiple instances of control plane (aka Master) nodes. Master nodes usually include etcd too – although there are high availability and large cluster scenarios that call for running etcd on independent hosts. The components can be run as containers, and optionally be supervised by Kubernetes, i.e., running as static pods. The latter requires the kubelet agent on the control plane nodes.
FYI there were recent discussions to move away from the term master
in k8s, yet we do have this everywhere in the docs and in the code base.
etcd too – although
-> etcd too, although
, i.e., running
-> - i.e. running
The latter requires the kubelet agent on the control plane nodes.
i think "the latter" is not very clear; also, the kubelet runs on every node, not only CP nodes.
i would omit the last sentence.
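For readers unfamiliar with static pods, a minimal sketch of what such a manifest might look like follows; the file path, image tag, and flags are illustrative assumptions, and the kubelet's manifest directory is set by its --pod-manifest-path flag (or the staticPodPath config field):

```yaml
# Illustrative static pod manifest, picked up by the kubelet from its
# manifest directory (commonly /etc/kubernetes/manifests -- an assumption).
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: k8s.gcr.io/kube-apiserver:v1.11.1   # version tag is illustrative
    command:
    - kube-apiserver
    - --etcd-servers=https://127.0.0.1:2379
    # ...remaining flags omitted for brevity
```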
![kubernetes components HA](/images/blog/2018-08-03-make-kubernetes-production-grade-anywhere/kubernetes-components-ha.png)
Risks to these components include hardware failures, software bugs, bad updates, human errors, network outages, and overloaded systems resulting in resource exhaustion. Redundancy can mitigate the impact of many of these hazards. In addition, the resource scheduling and high availability features of a hypervisor platform can be useful to surpass what can be achieved using the Linux operating system, Kubernetes, and container runtime alone.
, and container runtime
-> and a container runtime
## Security
Every Kubernetes cluster has a cluster root Certificate Authority (CA). Master, Kubelet, and Kubectl certs need to be generated and installed. If you use an install tool or a distribution this may be handled for you. A manual process is described [here](https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/04-certificate-authority.md). You should be prepared to reinstall certificates in the event of node replacements or expansions.
Master, Kubelet, and Kubectl
this is more valid and also aligns with Kelsey's guide:
The Controller Manager, API Server, Scheduler, kubelet client, kube-proxy and administrator certificates
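As a point of reference, here is a hedged sketch of manual CA and client certificate creation using openssl; Kelsey's guide uses cfssl instead, and all names and validity periods below are made-up examples:

```shell
# Create a cluster root CA (illustrative; install tools usually do this for you)
openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -subj "/CN=kubernetes-ca" \
  -days 3650 -out ca.crt

# Issue an administrator client certificate signed by that CA
openssl genrsa -out admin.key 2048
openssl req -new -key admin.key -subj "/CN=admin/O=system:masters" -out admin.csr
openssl x509 -req -in admin.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out admin.crt -days 365
```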
* Consider physical security, especially when deploying to edge or remote office locations that may be unattended. Include storage encryption to limit exposure from stolen devices and protection from attachment of malicious devices like USB keys.
* Protect Kubernetes plain-text cloud provider credentials (access keys, tokens, passwords, etc.)
Kubernetes [secret](https://kubernetes.io/docs/concepts/configuration/secret/) objects are appropriate for holding small amounts of sensitive data. These are retained within etcd. These can be readily used to hold credentials for the Kubernetes API but there are times when a workload or an extension of the cluster itself needs a more full-featured solution. The HashiCorp Vault project is is a popular solution if you need more than the built-in secret objects can provide.
is is a popular
has double is
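For illustration, a minimal Secret manifest; the name and values are made up for the example:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials   # hypothetical name
type: Opaque
stringData:
  username: app-user     # example values only
  password: s3cr3t
```

A pod consumes this through an environment variable or a volume mount; either way the data lives in etcd, which is one reason the backup discussion below recommends encrypting snapshot files.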
Backing up an etcd cluster can be accomplished with etcd’s [built-in](https://coreos.com/etcd/docs/latest/op-guide/recovery.html) snapshot mechanism, and copying the resulting file to storage in a different failure domain. The snapshot file contains all of the Kubernetes state and critical information. In order to keep the sensitive Kubernetes data safe, encrypt the snapshot files.
Using disk volume based snapshot recovery of etcd can have issues; see https://github.com/kubernetes/kubernetes/issues/40027. API-based backup solutions (e.g., [Ark](https://github.com/heptio/ark)) can offer more granular recovery than an etcd snapshot, but can also be slower. You could utilize both snapshot and API-based backups, but you should do one form of etcd backup as a minimum.
issues; see kubernetes/kubernetes#40027
to:
issues. See #40027.
the website won't map the link the same way github does.
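To make the built-in snapshot mechanism above concrete, a sketch using the etcd v3 API; the endpoint and certificate paths are assumptions that differ between installs:

```shell
# Take a snapshot (paths below are illustrative)
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Encrypt the snapshot before copying it to a different failure domain
gpg --symmetric --cipher-algo AES256 /backup/etcd-snapshot.db
```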
## Considerations for your production workloads
Anti-affinity specifications can be used to split clustered services across backing hosts, but at this time the settings are used only when the pod is scheduled. This means that Kubernetes can restart a failed node of your clustered application, but does not have a native mechanism to rebalance after a fail back. This is a topic worthy of a separate blog, but supplemental logic might be useful to achieve optimal workload placements after host or worker node recoveries or expansions. The [Pod Priority and Preemption feature](https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/) can be used to specify a preferred triage in the event of resource shortages caused by failures or bursting workloads.
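As a concrete sketch of such an anti-affinity specification (the workload name and image are hypothetical), spreading replicas across distinct hosts:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: clustered-app   # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: clustered-app
  template:
    metadata:
      labels:
        app: clustered-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: clustered-app
            # keep replicas on different hosts
            topologyKey: kubernetes.io/hostname
      containers:
      - name: app
        image: example/app:1.0   # placeholder image
```

As the paragraph notes, this constraint applies only at scheduling time; Kubernetes will not rebalance the pods after a failed host recovers.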
For stateful services, external attached volume mounts are the standard Kubernetes recommendation for a non-clustered service (e.g., a typical SQL database). At this time Kubernetes managed snapshots of these external volumes are in the category of a [roadmap feature request](https://docs.google.com/presentation/d/1dgxfnroRAu0aF67s-_bmeWpkM1h2LCxe6lB1l1oS0EQ/edit#slide=id.g3ca07c98c2_0_47), likely to align with the Container Storage Interface (CSI) integration. Thus performing backups of such a service would involve application specific, in-pod activity that is beyond the scope of this document. While awaiting better Kubernetes support for a snapshot and backup workflow, running your database service in a VM rather than a container, and exposing it it to your Kubernetes workload may be worth consideration.
it it to your
has double it
worth consideration
-> worth considering
?
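To illustrate the external attached volume pattern from the paragraph above, a minimal PersistentVolumeClaim sketch; the claim name, size, and reliance on a default StorageClass are assumptions:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data   # hypothetical claim name
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
```

The database pod mounts this claim; per the caveat above, snapshotting or backing up the underlying volume currently happens outside of Kubernetes.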
Buying a ticket on a commercial airline is convenient and safe. But when you travel to a remote location with a short runway, that commercial Airbus A320 flight isn’t an option. This doesn’t mean that air travel is off the table. It does mean that some compromises are necessary.
The adage in aviation is that on a single engine aircraft, an engine failure means you crash. With twin engines, at the very least, you get more choices of where you crash. Kubernetes on a small number of hosts is sort of like this. And if your business case justifies it, you might scale up to a larger fleet of mixed large and small vehicles (e.g., FedEx, Amazon).
is sort of like this. And if
-> is similar, and if
FYI, i will submit a copy edit commit for this later today, as discussed with @kbarnard10.
@kbarnard10 if you still want me to help with the edits, you need to grant me permission for the branch. i need to leave in a couple of hours, though, and i can do the rest on Sunday or Monday. 👍
@neolit123 thanks for the edits, all your changes LGTM
@neolit123 I made your suggested edits for time's sake. But will add you to future blog posts.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: natekartchner
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@kbarnard10 awesome, thanks.
Adding blog post.