Blog Post: Out of the Clouds onto the Ground: How to Make Kubernetes Production Grade Anywhere #9716

Merged 4 commits into kubernetes:master from kbarnard10:blog on Aug 3, 2018

Conversation

kbarnard10
Contributor

Adding blog post.

@k8s-ci-robot added the "cncf-cla: yes" label (Indicates the PR's author has signed the CNCF CLA.) on Aug 2, 2018
@k8s-ci-robot added the "size/L" label (Denotes a PR that changes 100-499 lines, ignoring generated files.) on Aug 2, 2018
This reverts commit eeb1132.
@kbarnard10
Contributor Author

/assign @zacharysarah
/assign @natekartchner

@k8sio-netlify-preview-bot
Collaborator

Deploy preview for kubernetes-io-master-staging ready!

Built with commit 69e85fc

https://deploy-preview-9716--kubernetes-io-master-staging.netlify.com

@k8sio-netlify-preview-bot
Collaborator

k8sio-netlify-preview-bot commented Aug 2, 2018

Deploy preview for kubernetes-io-master-staging ready!

Built with commit fef53fc

https://deploy-preview-9716--kubernetes-io-master-staging.netlify.com

Member

@neolit123 left a comment

@kbarnard10 @cantbewong @embano1
thanks for the writeup! 👍

i've added some comments and found a couple of typos.


**Authors**: Steven Wong (VMware), Michael Gasch (VMware)

This blog offers some guidelines for running a production-grade Kubernetes cluster in an environment like an on-premise data center or edge location.

consistency for production grade


This article is directed at on-premise Kubernetes deployments on a hypervisor or bare-metal platform, facing finite backing resources compared to the expansibility of the major public clouds. However, some of these recommendations may also be useful in a public cloud if budget constraints limit the resources you choose to consume.

A single node bare-metal Minikube deployment may be cheap and easy, but is not production grade. Conversely, you’re not likely to achieve Google’s Borg experience in a retail store, branch office, or edge location -- nor are you likely to need it.

-- nor -> , nor


This blog offers some guidance on achieving a production-worthy Kubernetes deployment, even when dealing with some resource constraints.

possibly production-worthy -> production worthy


![api server](/images/blog/2018-08-03-make-kubernetes-production-grade-anywhere/api-server.png)

Typically the API server, Controller Manager and Scheduler components are co-located within multiple instances of control plane (aka Master) nodes. Master nodes usually include etcd too – although there are high availability and large cluster scenarios that call for running etcd on independent hosts. The components can be run as containers, and optionally be supervised by Kubernetes, i.e., running as static pods. The latter requires the kubelet agent on the control plane nodes.

FYI there were recent discussion to move away from the term master in k8s, yet we do have this everywhere in the docs and in the code base.

etcd too – although -> etcd too, although

, i.e., running -> - i.e. running

The latter requires the kubelet agent on the control plane nodes.

i think the latter it's not very clear, also the kubelet runs on every node, not only CP nodes.
i would omit the last sentence.


![kubernetes components HA](/images/blog/2018-08-03-make-kubernetes-production-grade-anywhere/kubernetes-components-ha.png)

Risks to these components include hardware failures, software bugs, bad updates, human errors, network outages, and overloaded systems resulting in resource exhaustion. Redundancy can mitigate the impact of many of these hazards. In addition, the resource scheduling and high availability features of a hypervisor platform can be useful to surpass what can be achieved using the Linux operating system, Kubernetes, and container runtime alone.

, and container runtime -> and a container runtime


## Security

Every Kubernetes cluster has a cluster root Certificate Authority (CA). Master, Kubelet, and Kubectl certs need to be generated and installed. If you use an install tool or a distribution this may be handled for you. A manual process is described [here](https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/04-certificate-authority.md). You should be prepared to reinstall certificates in the event of node replacements or expansions.

Master, Kubelet, and Kubectl

this is more valid and also aligns with Kelsey's guide:

The Controller Manager, API Server, Scheduler, kubelet client, kube-proxy and administrator certificates

* Consider physical security, especially when deploying to edge or remote office locations that may be unattended. Include storage encryption to limit exposure from stolen devices and protection from attachment of malicious devices like USB keys.
* Protect Kubernetes plain-text cloud provider credentials (access keys, tokens, passwords, etc.)

Kubernetes [secret](https://kubernetes.io/docs/concepts/configuration/secret/) objects are appropriate for holding small amounts of sensitive data. These are retained within etcd. These can be readily used to hold credentials for the Kubernetes API but there are times when a workload or an extension of the cluster itself needs a more full-featured solution. The HashiCorp Vault project is is a popular solution if you need more than the built-in secret objects can provide.

is is a popular has double is.


Backing up an etcd cluster can be accomplished with etcd’s [built-in](https://coreos.com/etcd/docs/latest/op-guide/recovery.html) snapshot mechanism, and copying the resulting file to storage in a different failure domain. The snapshot file contains all the Kubernetes states and critical information. In order to keep the sensitive Kubernetes data safe, encrypt the snapshot files.
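
As a rough illustration of the snapshot step described in the paragraph above, the following Python sketch only assembles an `etcdctl snapshot save` command line with a timestamped file name. The backup directory, the endpoint, and the helper name `snapshot_command` are assumptions made for this example and are not part of the blog post. TLS flags, encryption of the resulting file, and the copy to another failure domain are deliberately left out.

```python
import posixpath
import shlex
from datetime import datetime, timezone


def snapshot_command(backup_dir, endpoint="https://127.0.0.1:2379", now=None):
    """Return (env, argv) for an `etcdctl snapshot save` invocation.

    Hypothetical helper: backup_dir and endpoint are illustrative values.
    A real cluster usually also needs --cacert, --cert and --key.
    """
    now = now or datetime.now(timezone.utc)
    # A timestamped name keeps successive snapshots from overwriting each other.
    filename = "etcd-snapshot-" + now.strftime("%Y%m%dT%H%M%SZ") + ".db"
    target = posixpath.join(backup_dir, filename)
    env = {"ETCDCTL_API": "3"}  # snapshot save belongs to the v3 API
    argv = ["etcdctl", "--endpoints", endpoint, "snapshot", "save", target]
    return env, argv


# Fixed timestamp so the example is reproducible.
env, argv = snapshot_command(
    "/var/backups/etcd",
    now=datetime(2018, 8, 3, 12, 0, 0, tzinfo=timezone.utc),
)
command_line = " ".join(
    [f"{key}={value}" for key, value in env.items()]
    + [shlex.quote(arg) for arg in argv]
)
print(command_line)
```

The printed line could be run on a control plane node, or `env` and `argv` could be handed to `subprocess.run`. Encrypting the file and moving it off the host would follow as separate steps.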

Using disk volume based snapshot recovery of etcd can have issues; see https://github.com/kubernetes/kubernetes/issues/40027. API-based backup solutions (e.g., [Ark](https://github.com/heptio/ark)) can offer more granular recovery than an etcd snapshot, but also can be slower. You could utilize both snapshot and API-based backups, but you should do one form of etcd backup as a minimum.

issues; see kubernetes/kubernetes#40027

to:

issues. See #40027.

website won't map the link same way as github does.

## Considerations for your production workloads
Anti-affinity specifications can be used to split clustered services across backing hosts, but at this time the settings are used only when the pod is scheduled. This means that Kubernetes can restart a failed node of your clustered application, but does not have a native mechanism to rebalance after a fail back. This is a topic worthy of a separate blog, but supplemental logic might be useful to achieve optimal workload placements after host or worker node recoveries or expansions. The [Pod Priority and Preemption feature](https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/) can be used to specify a preferred triage in the event of resource shortages caused by failures or bursting workloads.
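
To make the anti-affinity point above more concrete, here is a minimal Python sketch that builds a Deployment manifest whose replicas must land on different hosts. The application name, the image, and the helper name `anti_affinity_deployment` are placeholders assumed for this example. The field name `requiredDuringSchedulingIgnoredDuringExecution` reflects the limitation the paragraph describes: the rule is evaluated when a pod is scheduled and is not re-applied afterwards.

```python
import json


def anti_affinity_deployment(name, image, replicas=3):
    """Build a Deployment dict that spreads replicas across hosts.

    Hypothetical helper: name and image are illustrative values.
    """
    labels = {"app": name}
    anti_affinity = {
        "podAntiAffinity": {
            # Enforced at scheduling time only ("IgnoredDuringExecution").
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": labels},
                    # One replica per node, keyed on the node hostname label.
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": anti_affinity,
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


manifest = anti_affinity_deployment("clustered-db", "example.com/clustered-db:1.0")
print(json.dumps(manifest, indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed manifest could be saved to a file and applied as-is. With a hard requirement like this one, replicas that cannot be placed on a distinct host stay pending, which matters on a small cluster.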

For stateful services, external attached volume mounts are the standard Kubernetes recommendation for a non-clustered service (e.g., a typical SQL database). At this time Kubernetes managed snapshots of these external volumes is in the category of a [roadmap feature request](https://docs.google.com/presentation/d/1dgxfnroRAu0aF67s-_bmeWpkM1h2LCxe6lB1l1oS0EQ/edit#slide=id.g3ca07c98c2_0_47), likely to align with the Container Storage Interface (CSI) integration. Thus performing backups of such a service would involve application specific, in-pod activity that is beyond the scope of this document. While awaiting better Kubernetes support for a snapshot and backup workflow, running your database service in a VM rather than a container, and exposing it it to your Kubernetes workload may be worth consideration.

it it to your has double it

worth consideration -> worth considering ?


Buying a ticket on a commercial airline is convenient and safe. But when you travel to a remote location with a short runway, that commercial Airbus A320 flight isn’t an option. This doesn’t mean that air travel is off the table. It does mean that some compromises are necessary.

The adage in aviation is that on a single engine aircraft, an engine failure means you crash. With twin engines, at the very least, you get more choices of where you crash. Kubernetes on a small number of hosts is sort of like this. And if your business case justifies it, you might scale up to a larger fleet of mixed large and small vehicles (e.g., FedEx, Amazon).

is sort of like this. And if -> is similar, and if

@neolit123
Member

neolit123 commented Aug 3, 2018

FYI, i will submit a copy edit commit for this later today, as discussed with @kbarnard10.

@neolit123
Member

@kbarnard10
Github tells me that i don't have push access to the kbarnard10:blog branch.

if you still want me to help with the edits, you need to grant me permission for the branch:
https://help.github.com/articles/enabling-branch-restrictions/

though, i need to leave in a couple of hours and i can do the rest on Sunday or Monday. 👍

@cantbewong
Contributor

@neolit123 thanks for the edits, all your changes LGTM

@kbarnard10
Contributor Author

@neolit123 I made your suggested edits for time's sake. But will add you to future blog posts.

@natekartchner

/lgtm
/approve

@k8s-ci-robot added the "lgtm" label ("Looks good to me", indicates that a PR is ready to be merged.) on Aug 3, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: natekartchner

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the "approved" label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Aug 3, 2018
@k8s-ci-robot merged commit 4af1c1c into kubernetes:master on Aug 3, 2018
@neolit123
Member

@kbarnard10 awesome, thanks.
