Clean up the HA docs #9387
Conversation
docs/operations/high_availability.md
Outdated
--networking weave \
--cloud-labels "Team=Dev,Owner=John Doe" \
--image 293135079892/k8s-1.4-debian-jessie-amd64-hvm-ebs-2016-11-16 \
--networking cilium \
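For context, a sketch of how the minimal command reads with each of the providers discussed in this thread; the state store, zones, and cluster names below are placeholders, not values from the docs:

```sh
# Sketch only: the same minimal cluster with each provider under discussion.
export KOPS_STATE_STORE=s3://example-state-store
kops create cluster --zones us-east-1a --networking calico calico.example.k8s.local
kops create cluster --zones us-east-1a --networking weave weave.example.k8s.local   # experimental per the kops networking page
kops create cluster --zones us-east-1a --networking cilium cilium.example.k8s.local
```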
I wouldn't mention Cilium here. If you want, maybe Calico or Weave, which seem simpler to digest for beginners (fewer options).
Weave is listed as experimental on our networking page, which is why I am replacing Weave with something else where I can. But I am happy with using Calico.
Should be better, as I think it is the default in other places, like Docker Desktop and Enterprise. Thanks!
Not sure how experimental Weave is. I don't use it, but I've kept it up to date and it seems pretty stable in tests.
Weave is experimental per https://kops.sigs.k8s.io/networking/ :) I think I actually had Weave as stable initially, but someone said it probably wasn't and I decided to be conservative.
After getting a ten-hour 3AM call about Weave, I'm reluctant to call it stable. I'm given to understand it has at its core an algorithm with polynomial complexity.
OK 😄
Do we all agree that Calico is stable and the easiest one to get started with?
I don't :)
After some sleep, I am wondering if our examples should just use a placeholder instead of a specific provider. Or we need to have an "official" opinion on what users should go with. Most of the docs try to be neutral here.
If we do go for e.g. Calico as the recommended provider, we should probably also use that one as the default instead of kubenet, since you typically don't want to use kubenet.
This is probably worth an issue on its own though.
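For what it is worth, a quick way to see the current behavior (a sketch; the cluster name and zone are placeholders). As far as I know, kops falls back to kubenet when `--networking` is not given:

```sh
# Omitting --networking currently falls back to kubenet
kops create cluster --zones us-east-1a example.k8s.local
# Inspect the generated spec; expect a `networking: kubenet: {}` stanza
kops get cluster example.k8s.local -o yaml | grep -A 1 networking
```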
Neutral is good enough for me :)
docs/operations/high_availability.md
Outdated
Kubernetes has two strategies for high availability:
For testing purposes, kubernetes works just fine with a single master. However, when the master becomes unavailable, for example due to upgrade or instance failure, the kubernetes API will be unavailable. Pods and services that are running on the continues to operate as long as they do not depend on interacting with the API, but operations such as adding nodes, scaling pods, replacing terminated pods will not work. Running kubectl will also not work.
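To make the failure mode described in this hunk concrete, this is roughly what you would see while a lone master is down; the error text is approximate and `my-app` is a hypothetical deployment:

```sh
# Existing pods keep serving traffic, but every API-dependent operation fails:
kubectl get nodes
# Unable to connect to the server: dial tcp ...: connect: connection refused
kubectl scale deployment my-app --replicas=5
# fails the same way: the API server is unreachable
```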
Missing word (maybe "nodes"?): "running on the continues to operate"
docs/operations/high_availability.md
Outdated
* Run multiple independent clusters and combine them behind one management plane: [federation](https://kubernetes.io/docs/user-guide/federation/)
* Run a single cluster in multiple cloud zones, with redundant components
kops runs each master in a dedicated autoscaling groups (ASG) and stores data on ESB volumes. That way, if a master node is terminated the ASG will launch a new master instance with the master's volume. Because of the dedicated ESB volumes, each master is bound to a fixed Availability Zone (AZ). If the AZ becomes unavailable, the master instance in that AZ will also become unavailable.
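One way to see the one-ASG-per-master layout this paragraph describes is `kops get instancegroups`; the output below is illustrative for a hypothetical three-master cluster, not taken from the docs:

```sh
kops get instancegroups --name example.k8s.local
# NAME                ROLE    MACHINETYPE  MIN  MAX  ZONES
# master-us-west-2a   Master  m3.medium    1    1    us-west-2a
# master-us-west-2b   Master  m3.medium    1    1    us-west-2b
# master-us-west-2c   Master  m3.medium    1    1    us-west-2c
# nodes               Node    t2.medium    3    3    us-west-2a,us-west-2b,us-west-2c
```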
Nit: s/groups/group (or remove "a" from "a dedicated autoscaling group")
Nit: s/ESB/EBS
We should probably try to call them "control plane nodes" instead of "master" also.
Nits aside, love this paragraph - it explains a tricky subject well!
Thanks for the nits.
I am all for changing "master" to "control plane nodes" (something a bit less verbose would be nice, though) if that is what k/k is doing as well. But we should have a plan for changing this everywhere. It would probably be a good idea to do this as part of #9178
docs/operations/high_availability.md
Outdated
kops has good support for a cluster than runs
with redundant components. kops is able to create multiple kubernetes masters, so in the event of
a master instance failure, the kubernetes API will continue to operate.
For production use, you therefor want to run kubernetes in a HA setup with multiple masters. With multiple master nodes, you will be able both to do graceful, zero-down time upgrades, and you will be able to survive AZ failures.
Nit: therefor -> therefore
docs/operations/high_availability.md
Outdated
When you first call `kops create cluster`, you specify the `--master-zones` flag listing the zones you want your masters
to run in, for example:
The simplest way to get started with a HA cluster is to run `kops create cluster` as shown below. The `--master-zones` flag listing the zones you want your masters
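That is, something along these lines (a sketch; the zones, instance sizes, and cluster name are placeholders):

```sh
kops create cluster \
    --node-count 3 \
    --zones us-west-2a,us-west-2b,us-west-2c \
    --master-zones us-west-2a,us-west-2b,us-west-2c \
    --node-size t2.medium \
    --master-size t2.medium \
    example.k8s.local
```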
s/listing/lists
A few nits, and we should probably prefer the term "control plane" over "master", but I guess that is confusing because of the flags also.
I would also add somewhere to be mindful of how many AZs are used for HA. Transferring data between AZs can be expensive, which is why it may make sense to limit worker nodes to 2 AZs.
Do we really want to recommend running workers in two AZs? That would at least need a "you'd better know what you are doing" disclaimer, as apps that require quorum will have downtime should the wrong AZ fail. I am considering a warning about running e.g. 5 AZs though. It gives you higher fault tolerance, but in most cases it is a bit too much.
/retest
Apps with quorum are a totally different story. To get to those you actually have to get past beginner status. You also have to understand how pod scheduling works, because you may only think it works while all your quorum pods are actually in the same AZ. I am not saying we should recommend 2 AZs. I just mean to phrase it in a way that explains that inter-AZ traffic costs money in most cases. You need 2+ AZs for HA, but how many depends on the use case.
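On the scheduling point, a quick sanity check that quorum pods really landed in distinct zones; the `app=my-quorum-app` selector is hypothetical:

```sh
# Show each node's zone, then check which node every quorum pod landed on
kubectl get nodes -L topology.kubernetes.io/zone
kubectl get pods -l app=my-quorum-app -o wide
```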
Something like this?
Yes, sounds pretty good.
/lgtm
/retest
/assign @zetaab
Co-authored-by: Ciprian Hacman <[email protected]>
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: olemarkus, rifelpet
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Fixes #8769