Reasonable defaults with eviction and PodDisruptionBudget #35318
I'm on the fence on this one. Would be interested to hear other folks' opinions. Here are my thoughts:

It seems an alternative to your proposal, which would have the same effect, would be to just add an admission controller that adds a PDB to every collection that you think should have the rule you want (e.g. max 1 eviction at a time). The downside of this would be a proliferation of PDBs.

If we implement your proposal, clients that want to get the behavior they get today in the absence of a PDB, instead of the new behavior you're introducing, could just use regular delete. (Maybe their logic could be: try the eviction subresource first; if it is rejected, then just do a regular delete. That way they wouldn't need to look up whether the pod is covered by a PDB first.) So it wouldn't be a big inconvenience.

The one case I am worried about with your proposal is pods that are managed by a controller with replicas=1. Such a pod would block removing the node, even in the absence of a PDB. Maybe that is the right thing to do, but it would be better if we knew the user's intent, and the only way to know that would be to require them to put a PDB on it if they really don't want to allow such a node to be removed.

I guess I am leaning slightly towards endorsing your suggestion, but there may be other issues I didn't think of.

cc/ @mml
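For concreteness, a minimal client-go sketch of the fallback flow described above (try the eviction subresource first, fall back to a plain delete if it is rejected). The policy/v1 types and the 429 check reflect today's eviction API rather than anything specified in this thread, and `evictOrDelete` is a made-up name:

```go
package main

import (
	"context"

	policyv1 "k8s.io/api/policy/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// evictOrDelete tries the eviction subresource first; if the eviction is
// rejected (the API returns 429 when a PDB disallows the disruption), it
// falls back to a regular delete, which does not consult PDBs.
func evictOrDelete(ctx context.Context, cs kubernetes.Interface, ns, pod string) error {
	eviction := &policyv1.Eviction{
		ObjectMeta: metav1.ObjectMeta{Name: pod, Namespace: ns},
	}
	err := cs.PolicyV1().Evictions(ns).Evict(ctx, eviction)
	if err == nil || !apierrors.IsTooManyRequests(err) {
		return err // accepted, or failed for an unrelated reason
	}
	return cs.CoreV1().Pods(ns).Delete(ctx, pod, metav1.DeleteOptions{})
}
```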
What class of things run as rc=1, ps=1? Important ones, like the B of an A/B test, and not-important ones. In large clusters the latter are going to dwarf the former (many OpenShift clusters today are 95% the latter). If this change added the PR above, we'd probably revert it, because it would break all of those very large, dense clusters (eviction would be impossible). On small clusters it's much more likely you have the former, in which case this might be a reasonable default. But if you have a database or leader service and it doesn't have a PDB, you're just not using the system the way you should. The only downside today is that eviction can be surprising (if you are relying on emptyDir), but you can't prevent that even with a PDB, so I don't think it's unreasonable to ask users to use a PDB to avoid disruption.
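For reference, the kind of PDB that comment asks database/leader-service owners to create could be built like this (a sketch; the `app: my-database` label and the function name are illustrative). With replicas=1 and minAvailable: 1, the eviction API will never allow a voluntary disruption of the pod:

```go
package main

import (
	"context"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
)

// createSingletonPDB creates a PDB that blocks all voluntary evictions of
// a singleton database pod ("app: my-database" is a made-up label).
func createSingletonPDB(ctx context.Context, cs kubernetes.Interface, ns string) error {
	minAvailable := intstr.FromInt(1)
	pdb := &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: "my-database", Namespace: ns},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MinAvailable: &minAvailable,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "my-database"},
			},
		},
	}
	_, err := cs.PolicyV1().PodDisruptionBudgets(ns).Create(ctx, pdb, metav1.CreateOptions{})
	return err
}
```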
A global config flag to opt in to this makes sense, but I don't think that rs=1 or ps=1 implies a disruption budget.
Sorry, what I said here was wrong. If we implement your proposal, clients that want to always delete in the absence of a PDB but respect the PDB if it exists (which is the behavior they get today) would have to look up whether the pod is covered by a PDB first, and use that to decide whether to use /eviction or a regular delete.
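A sketch of that lookup, assuming the client simply lists PDBs in the pod's namespace and tests each selector against the pod's labels (the helper name is invented):

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/kubernetes"
)

// coveredByPDB reports whether any PodDisruptionBudget in the pod's
// namespace selects the pod. A client following the comment above would
// use /eviction when this is true and a regular delete otherwise.
func coveredByPDB(ctx context.Context, cs kubernetes.Interface, pod *corev1.Pod) (bool, error) {
	pdbs, err := cs.PolicyV1().PodDisruptionBudgets(pod.Namespace).List(ctx, metav1.ListOptions{})
	if err != nil {
		return false, err
	}
	for _, pdb := range pdbs.Items {
		sel, err := metav1.LabelSelectorAsSelector(pdb.Spec.Selector)
		if err != nil {
			continue // ignore malformed selectors
		}
		if sel.Matches(labels.Set(pod.Labels)) {
			return true, nil
		}
	}
	return false, nil
}
```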
I guess a config flag to choose the behavior would be one approach, though I always worry about the support implications of making core behaviors configurable...
There should be a default behavior for disruption budgets. The default should optimize for these things:
Some options:
The actual implementation could be to have a controller that makes PDBs for things without them and then cleans them up when they are not needed anymore, or for the eviction logic to just compute these sets but not expose them.
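A rough sketch of the controller approach just described: materialize a default maxUnavailable: 1 PDB for any StatefulSet that lacks one. The naming scheme is invented, the owner reference stands in for the cleanup-when-no-longer-needed part (via garbage collection), and a real controller would also have to check whether some existing PDB already selects the same pods:

```go
package main

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	policyv1 "k8s.io/api/policy/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
)

// ensureDefaultPDB creates a "max 1 pod unavailable" PDB for a StatefulSet
// that has none. The controller owner reference means the PDB is garbage
// collected when the StatefulSet is deleted.
func ensureDefaultPDB(ctx context.Context, cs kubernetes.Interface, ss *appsv1.StatefulSet) error {
	maxUnavailable := intstr.FromInt(1)
	pdb := &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{
			Name:      ss.Name + "-default-pdb", // hypothetical naming scheme
			Namespace: ss.Namespace,
			OwnerReferences: []metav1.OwnerReference{
				*metav1.NewControllerRef(ss, appsv1.SchemeGroupVersion.WithKind("StatefulSet")),
			},
		},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MaxUnavailable: &maxUnavailable,
			Selector:       ss.Spec.Selector,
		},
	}
	_, err := cs.PolicyV1().PodDisruptionBudgets(ss.Namespace).Create(ctx, pdb, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		return nil // a default or user-supplied PDB with this name already exists
	}
	return err
}
```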
I would add one more criterion: the intended behavior of the system should be as simple to understand as possible. In this vein, I think saying "absence of a PDB means unlimited evictions" is less surprising than "absence of a PDB behaves like an implicit default PDB". Thus we should make the behavior explicit by requiring a PDB whenever you want to limit evictions, at least from the standpoint of this criterion, which I think is an important one for debuggability and for administrators to understand what's going on.

As for the options, it seems like (3) and (4) are clearly better than (1) and (2), or maybe I am missing something?

I think maybe the right way to think about this feature in general is that one of the characteristics of a controller should be that it has a default policy for disruptions. Most controllers probably won't do anything; the default is "eviction is always allowed." Some, like PetSet, might want a policy (e.g. at most one down from the set at a time), and they can use an admission controller to apply it.

The main issue I can see with this idea of requiring the default policies to be materialized is that if the user wants their own policy, they would presumably have to delete the PDB that was created by the admission controller before adding their own, which is annoying. The benefit of building a default policy into the PDB controller itself is that the user-supplied PDB can just override the default one that's built in, and the user doesn't have to delete anything to apply their own. But if you build the default policy into the PDB controller, you can't easily customize it on a per-kind basis (the creator of a new kind would have to modify the PDB controller when they build a custom controller... ugh).

So both options (absence of a PDB means unlimited evictions, and absence of a PDB means some default policy is applied) seem to have downsides. But because I think the default policy should probably depend on the type of controller, I think (3) is better than (4).
A couple of other reasons why (3) seems better than (4)
I think (3) provides a sane, easily comprehensible, default behavior
I can think of cases where this default would be a pain in the neck, so I agree with @smarterclayton that you should be able to toggle it with a flag. I think opt-out (i.e. default to on) is fine. Long run, we had previously discussed a way of generating template PDBs that some controller would use to generate real PDBs for matching controllers. The template PDBs could be thought of as specific service levels; e.g., for batch workloads an admin might want a particular maxUnavailable. Is it worth it to implement this system instead of these hardcoded defaults and a flag?
You mean toggle on a per-cluster basis? Isn't that going to cause a support nightmare?
I admit I don't remember this discussion. Is the idea basically the same as (3), where you can set a per-kind default PDB rule?
If each kind has an admission controller that creates the default PDB, you could set the default policy there, right?
@mml What behavior would you expect on opt-out? That eviction with no PDB is equivalent to deletion?
Sure - my default position is do not make this change, but I can understand why someone would want implicit PDB. But a better argument would be a simple controller that creates a PDB for everything, rather than having custom core behavior.
Is (3) disruptive to anyone with a large cluster today?
Agree (3) sounds better than (4).
Yeah, (3) addresses the large dense clusters case pretty well. I think (3) is probably good enough that I don't need the default, but I defer to others on that.
It doesn't look like we are going to be able to decide on a default behavior in time for 1.5, and talking about this more, I'm less sure that there is a single default that makes sense for everyone. Still, I'd like it if we can leave open the possibility of a default behavior in the future.
Sgtm
We will recommend that users in production scenarios create their own PDB as part of the petset config bundle. We will recommend that template authors provide a PDB too. Assuming #34776 is implemented, this can be independent of the scale for scalable petsets.
Personally I don't think there is a reasonable default PDB behavior that applies across all controllers. If PDBs are to be auto-generated, I think we need to figure out which controller auto-generates them. (I do like @erictune's idea to have controllers generate them better than my idea to have an admission controller do it.) For StatefulSet it seems pretty straightforward, and the user could presumably provide a hint in the Spec if necessary. For stateless, I'm not sure whether the ReplicaSet, Deployment, or Service should be generating the PDB.
Do we have an easy create command / explanation for PDB yet? Making that work well goes a long way.
I can see how specifying a PDB specifies behaviour, but looking through the docs (and this issue) I can't work out what the default behaviour is in the absence of a PDB. By default, is there no limitation to disruption? I spent some time looking for it, but hopefully I just missed it - where could I have found this info in the docs?
There are no default PDBs, so full disruption is allowed. We use this to provide sensible defaults for our users: https://github.com/mikkeloscar/pdb-controller
I'm not a k8s pro, but my feeling is that without a PDB no disruption is allowed. I.e., I've noticed the autoscaler not scale down otherwise-empty nodes when the pods were missing PDBs. But then I might also be entirely wrong...
The autoscaler might handle this in a special way by itself. For instance, in the case where a pod doesn't have a "parent" like a ReplicaSet or StatefulSet etc., it will not terminate the pod, even though the eviction API would allow it. But this is special logic in the autoscaler.
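A sketch of the "has no parent" check that comment describes, using the standard owner-reference helper (the function name is illustrative; the real autoscaler logic is more involved):

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// hasController reports whether some controller (ReplicaSet, StatefulSet,
// etc.) owns the pod. A pod with no controller owner won't be recreated
// after eviction, which is why the autoscaler refuses to terminate it
// even when the eviction API would allow it.
func hasController(pod *corev1.Pod) bool {
	return metav1.GetControllerOf(pod) != nil
}
```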
No, when you haven't assigned a PDB to a workload, any disruption is allowed during a node upgrade.
We are moving to have kubectl drain use the eviction subresource (#34277) and honor PodDisruptionBudget. From a comment by @erictune, which better captures this requirement:
There should be a default behavior for disruption budgets.
The default should optimize for these things:
Some options:
The actual implementation could be to have a controller that makes PDBs for things without them and then cleans them up when not needed anymore, or for the eviction logic to just compute these sets, but not expose them.
cc @erictune @davidopp @kow3ns @janetkuo @caesarxuchao @ymqytw @kubernetes/sig-apps