-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature blog for KEP-3017: Unhealthy Pod Eviction Policy for PDBs #37627
Feature blog for KEP-3017: Unhealthy Pod Eviction Policy for PDBs #37627
Conversation
✅ Pull request preview available for checkingBuilt without sensitive environment variables
To edit notification comments on pull requests, go to your Netlify site settings. |
Hello @atiratree , v1.26 Comms lead here. This feature blog is on a feature tracked for release, the deadline for submitting a draft is the 29th of November; this should be considered the hard limit since we will need to review/edit/discuss the draft, so if at all possible it's better to submit it earlier to avoid any problems. Any doubts, we're here to help! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @atiratree, v1.26 release comms shadow here, as the deadline is already over and the feature blog PR is not up yet, should be consider it cancelled for this release.
You can still blog about the new feature, just not as part of the post-release blog series for v1.26. Please do consider tweaking the article for publication early next year. |
3d47da9
to
2903c7f
Compare
content/en/blog/_posts/2022-12-08-unhealthy-pod-eviction-policy-for-pdb.md
Outdated
Show resolved
Hide resolved
@Debanitrkl @fsmunoz @sftim sorry for the constant delays. I have pushed the blog proposal and will try to find people to review this. Not sure if it can still make it to the post-release blog series, but I will be happy to see this published even on a later date. |
Thank you @atiratree , we're looking at it! |
2903c7f
to
432fb0d
Compare
@atiratree publication date for this feature blog is January 6, 2023, you can update the date in the header. We have the series in good shape so I don't see a problem in adding this one, regardless of the missed deadlines. The publication date is still relatively far away, but we should be able to review and merge it before, assuming that comments and suggestions are quickly reflected - your help in that is greatly appreciated. |
content/en/blog/_posts/2022-12-08-unhealthy-pod-eviction-policy-for-pdb.md
Outdated
Show resolved
Hide resolved
432fb0d
to
47888f6
Compare
Thank you. I have updated the date |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi. Here's some more feedback.
content/en/blog/_posts/2023-01-06-unhealthy-pod-eviction-policy-for-pdb.md
Outdated
Show resolved
Hide resolved
content/en/blog/_posts/2023-01-06-unhealthy-pod-eviction-policy-for-pdb.md
Outdated
Show resolved
Hide resolved
--- | ||
|
||
**Authors:** Filip Křepinský (Red Hat), Morten Torkildsen (Google), Ravi Gudimetla (Apple) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a few words before the first heading. We avoid starting blog articles with a 2nd level heading. You could omit “What problems does this solve?”; I think it'd be better to add 50ish words to summarize what you're saying in the article.###
Please also mention that this is describing a feature released last month, ie December 2022.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atiratree maybe something like:
Ensuring the disruptions to your applications do not affect its availability isn't a simple task. With
the introduction of Unhealthy Pod Eviction Policy for Pod Disruption Budgets this is going to change.
In this article, we will dive deeper into what modifications were introduced in PDBs to give application
owners greater flexibility in managing disruptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensuring the disruptions to your applications do not affect its availability isn't a simple | |
task. With the introduction of _unhealthy pod eviction policy_ for PodDisruptionBudgets, | |
the way you can ensure this has changed again. | |
In this article, we will dive deeper into what modifications were introduced for PDBs to | |
give application owners greater flexibility in managing disruptions. | |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@soltysh thanks for the suggestion. I have added it there, with the link to PDBs.
With the introduction of unhealthy pod eviction policy for PodDisruptionBudgets,
the way you can ensure this has changed again.
@sftim not sure what his is trying to say. Can we keep Maciej's version?
Please also mention that this is describing a feature released last month, ie December 2022.
Do we need to mention the time, when we specify the version in the title?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean by time? We don't need to be precise to the hour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I meant the date and not the time. I have rewritten the going to change
part. Please let me know if you have something better in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and we also mention the k8s version in the What problems does this solve?
section
content/en/blog/_posts/2023-01-06-unhealthy-pod-eviction-policy-for-pdb.md
Outdated
Show resolved
Hide resolved
you are choosing the best effort availability for your application. | ||
This will make it easier to maintain and upgrade your clusters. | ||
|
||
Eviction API will consider these policies when eviction of a pod that is guarded by a PDB is requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What could the outcome be in the default IfHealthyBudget
case? I mean: it's not always a successful create for the subresource.
I think we're assuming that the reader already knows this, and that we should instead make it more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if I follow.
we are describing the behaviour of the default IfHealthyBudget in the section above with.
Eviction of pods respects PodDisruptionBudgets (PDBs). This means that the eviction of a pod
should not disrupt a guarded application and `.status.currentHealthy` of a PDB should not fall
below `.status.desiredHealthy`. Running pods that are [Unhealthy](/docs/tasks/run-application/configure-pdb/#healthiness-of-a-pod)
do not count towards the PDB status, but eviction of these is only possible in case the application
is not disrupted. This helps disrupted or not yet started application to achieve availability
as soon as possible without additional downtime that would be caused by evictions.
or are you suggesting we should describe the edge cases of Eviction API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If:
- you have a workload with n Pods
- you protect the workload with a PodDisruptionBudget
- you choose, or default to,
IfHealthyBudget
- several of those Pods are in CrashLoopBackoff
and you then
- (attempt to) drain a node where the CrashLoopBackoffs are happening
what actually happens? It'd be good to summarise this mainly in terms of kubectl drain
, but readers might also want to know at API level. For example: you try to create an eviction, and that request fails with error 418, I'm a teapot
(OK, probably not that specific error code).
My ask here is to explain the outcome that you don't see if you enable the feature gate and set a custom unhealthyPodEvictionPolicy
, so that readers can appreciate the benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I have tried to put more details to describe the behavior.
Unfortunately, this poses a problem for cluster administrators that would like to drain nodes
without any manual interventions. Misbehaving applications with pods in `CrashLoopBackOff`
state (due to a bug or misconfiguration) or pods that are simply failing to become ready
make this task much harder. Any eviction request will fail due to violation of a PDB,
when all pods of an application are unhealthy. Draining of a node cannot make any progress
in that case.
Do you think it make sense to describe the exact message and http code here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, you should describe something. I haven't tried the feature out so I don't know what you need to say. I don't think “Eviction API will consider these policies when eviction of a pod that is guarded by a PDB is requested.” feels like enough.
That said, if we had to we could merge this and live with the confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something like:
Eviction is the process that triggers graceful pod termination. The process can be initiated either by a kubectl drain
command, or due to specific conditions in the cluster (see node-pressure eviction or api-eviction). During this process every pod removal is consulted with appropriate PDBs, to ensure that a requested number of pods is always running in the cluster. The aforementioned policies allow PDB authors to have a greater control how the process counts not healthy pods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx for the suggestion, I used part of it and reorganized the section: https://github.com/kubernetes/website/compare/4629fb1dcda1ab70deaed7a30f89228d0d1fbb6d..e27c6086386bcbdc67b52cf7c9909d3c5caf788d.
I think we should not mention the node-pressure eviction here, since it could distract from the main topic. The node-pressure eviction does not abide by the PDBs. I have tried to highlight that we are dealing with the API initiated eviction.
content/en/blog/_posts/2023-01-06-unhealthy-pod-eviction-policy-for-pdb.md
Outdated
Show resolved
Hide resolved
content/en/blog/_posts/2023-01-06-unhealthy-pod-eviction-policy-for-pdb.md
Outdated
Show resolved
Hide resolved
@@ -0,0 +1,76 @@ | |||
--- | |||
layout: blog | |||
title: "Eviction policy for unhealthy pods guarded by PodDisruptionBudgets" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
title: "Eviction policy for unhealthy pods guarded by PodDisruptionBudgets" | |
title: "Kubernetes 1.26: Configure How PodDisruptionBudgets Affect Unhealthy Pods" |
How about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have put the Kubernetes 1.26
just for now. I like the rest Configure How PodDisruptionBudgets Affect Unhealthy Pods
, but I am wondering if we could/should mention the eviction part as well?
content/en/blog/_posts/2023-01-06-unhealthy-pod-eviction-policy-for-pdb.md
Outdated
Show resolved
Hide resolved
content/en/blog/_posts/2023-01-06-unhealthy-pod-eviction-policy-for-pdb.md
Outdated
Show resolved
Hide resolved
--- | ||
|
||
**Authors:** Filip Křepinský (Red Hat), Morten Torkildsen (Google), Ravi Gudimetla (Apple) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atiratree maybe something like:
Ensuring the disruptions to your applications do not affect its availability isn't a simple task. With
the introduction of Unhealthy Pod Eviction Policy for Pod Disruption Budgets this is going to change.
In this article, we will dive deeper into what modifications were introduced in PDBs to give application
owners greater flexibility in managing disruptions.
content/en/blog/_posts/2023-01-06-unhealthy-pod-eviction-policy-for-pdb.md
Outdated
Show resolved
Hide resolved
47888f6
to
4fa924b
Compare
feebd52
to
4629fb1
Compare
content/en/blog/_posts/2023-01-06-unhealthy-pod-eviction-policy-for-pdb.md
Outdated
Show resolved
Hide resolved
We can merge this if needed; I'd prefer some more polish first. |
@atiratree if you could take a look at the added review comments, it would be great. Publication date is 4 days ahead, so there's still time. Otherwise, we can default to merge as-is, but this would be a last option. |
@sftim I am trying to address the missing pieces. can you please take a look again? |
@atiratree Please go ahead and revise the text. |
@tengqm do you have anything in particular in mind? I believe I have replied to all of the remarks so far. |
content/en/blog/_posts/2023-01-06-unhealthy-pod-eviction-policy-for-pdb.md
Outdated
Show resolved
Hide resolved
|
||
## What problems does this solve? | ||
|
||
Eviction of pods respects PodDisruptionBudgets (PDBs). This means that the eviction of a pod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit / optional change)
Eviction of pods respects PodDisruptionBudgets (PDBs). This means that the eviction of a pod | |
Eviction of pods respects PodDisruptionBudgets (PDBs). This means that a | |
[voluntary disruption](https://kubernetes.io/docs/concepts/scheduling-eviction/#pod-disruption) | |
to a Pod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we could link that. What do you think about still having the eviction mentioned there?
This means that a requested [voluntary disruption](https://kubernetes.io/docs/concepts/scheduling-eviction/#pod-disruption)
via an eviction to a Pod, should not disrupt a guarded application and `.status.currentHealthy` of a PDB should not fall
below `.status.desiredHealthy`
4629fb1
to
e27c608
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two more nits, but otherwise this looks great to me
|
||
API-initiated eviction is the process that triggers graceful pod termination. | ||
The process can be initiated either by calling the API directly, | ||
by using a kubectl drain command, or other actors in the cluster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
by using a kubectl drain command, or other actors in the cluster. | |
by using a `kubectl drain` command, or other actors in the cluster. |
|
||
There are two policies `IfHealthyBudget` and `AlwaysAllow` to choose from. | ||
|
||
The former, `IfHealthyBudget`, follows the existing behavior to achieve the best availability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though this does match the current behavior, I'd add a one sentence how it works. Something like:
Unhealthy pods can be disrupted only if the applications has a minimum available .status.desiredHealthy
number of pods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge it. @atiratree even after merge, you can open a PR to make minor edits - eg, see #37627 (review)
/lgtm
/approve
LGTM label has been added. Git tree hash: 72d92fd88ac87e9803c0b39f157886feddf08158
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: sftim, soltysh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
enhancement: kubernetes/enhancements#3017
API changes + feature gate: kubernetes/kubernetes#113375