
support overprovisioning without pending pods #5377

Closed
grosser opened this issue Dec 20, 2022 · 29 comments
Labels
area/cluster-autoscaler, area/core-autoscaler, kind/feature, lifecycle/rotten

Comments

@grosser
Contributor

grosser commented Dec 20, 2022

Which component are you using?:

cluster-autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

We currently use pending pods for overprovisioning, but that results in there not being "real" gaps,
so when an AZ-preferred workload wants to get scheduled, it always goes to the open AZ instead of kicking out an overprovisioned pod.

Describe the solution you'd like.:

Make CA pretend there are pending pods when doing its scale-up calculations, and scale based on that.

Describe any alternative solutions you've considered.:

the currently recommended approach (the pending-pod overprovisioning described above)

Additional context.:

#4384 describes the same issue but lacks context

/cc @mattyev87 @MaciekPytel

@grosser added the kind/feature label Dec 20, 2022
@vadasambar
Member

@grosser would the following not work for you?

  1. Set --expendable-pods-priority-cutoff (default is -10)
  2. Use a PriorityClass with a value lower than --expendable-pods-priority-cutoff for the over-provisioning/dummy pods (see the sketch below)
  3. cluster-autoscaler will then kick out the over-provisioning pods to make space for the workload you want to schedule
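A minimal sketch of that setup, assuming the default cutoff of -10; the class name, value, replica count, image, and resource requests below are illustrative placeholders, not something from this issue:

```yaml
# Illustrative sketch only: names, value, replica count, and requests are placeholders.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning            # hypothetical name
value: -100                         # below the default -10 cutoff, so CA treats these pods as expendable
globalDefault: false
description: "Placeholder pods that higher-priority workloads can preempt."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 4                       # how much headroom to reserve
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9    # does nothing, only reserves the requested resources
        resources:
          requests:
            cpu: "1"
            memory: 1Gi
```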

@grosser
Contributor Author

grosser commented Mar 14, 2023

That does not work.

  • descheduler does not deschedule pods when there is no room
  • scheduler does not kick out ScheduleAnyway pods when there is room on other nodes

@vadasambar
Member

descheduler does not deschedule pods when there is no room

I am not sure about the behavior of descheduler with cluster-autoscaler so I will let someone else answer this.

scheduler does not kick out ScheduleAnyway pods when there is room on other nodes

This should work. I wonder if the pods you want to schedule have a higher PriorityClass than the overprovisioning pods (the default preemptionPolicy is PreemptLowerPriority). If they don't, you might see an issue like this. If they do and you are still seeing this issue, it might be a problem with the scheduler (you may know this already, but please make sure your cluster-autoscaler version matches your Kubernetes version).
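For reference, a minimal sketch of such a workload PriorityClass; the name and value are placeholders, and PreemptLowerPriority is already the default, spelled out here only for clarity:

```yaml
# Hypothetical workload PriorityClass; name and value are placeholders.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: node-critical
value: 100000                            # must be higher than the overprovisioning pods' priority
preemptionPolicy: PreemptLowerPriority   # the default; allows preempting lower-priority pods
description: "Workloads that may evict overprovisioning pods."
```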

@grosser
Contributor Author

grosser commented Mar 17, 2023

Re scheduler: it does not preempt

setup:

  • 3 nodes with 1 being full
  • schedule 4 pods with a node-critical PriorityClass (higher than the workload on the full node)

```console
kubectl get pods -l role=scheduleanyway -L topology.kubernetes.io/zone
NAME                              READY   STATUS    RESTARTS   AGE   ZONE
scheduleanyway-6fc7b888c7-5wwxp   1/1     Running   0          7s    us-west-2a
scheduleanyway-6fc7b888c7-cppls   1/1     Running   0          7s    us-west-2a
scheduleanyway-6fc7b888c7-f6nkv   1/1     Running   0          7s    us-west-2c
scheduleanyway-6fc7b888c7-lspxr   1/1     Running   0          7s    us-west-2c
```

workloads on the full node were not preempted

@grosser
Contributor Author

grosser commented Mar 17, 2023

so that's why "imaginary pods" would allow the scheduler to use the right az for ScheduleAnyway

@vadasambar
Member

vadasambar commented Mar 20, 2023

@grosser when you say AZ-preferred workload, are you using node affinity with preferredDuringSchedulingIgnoredDuringExecution? If yes, note that CA doesn't respect soft constraints:

However, CA does not consider "soft" constraints like preferredDuringSchedulingIgnoredDuringExecution when selecting node groups. That means that if CA has two or more node groups available for expansion, it will not use soft constraints to pick one node group over another.

Ref: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#does-ca-respect-node-affinity-when-selecting-node-groups-to-scale-up

You might want to use requiredDuringSchedulingIgnoredDuringExecution or nodeSelector
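For example, a hard zone constraint would look roughly like this (the zone value is a placeholder):

```yaml
# Illustrative hard constraint; the zone value is a placeholder.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - us-west-2a
```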

@grosser
Contributor Author

grosser commented Mar 20, 2023

I'm using

```yaml
topologySpreadConstraints:
- labelSelector:
    matchLabels:
      project: test
  maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway
```

and since the scheduler does not preempt other workloads, it does not matter whether we have a buffer;
it would only work if there was free space.
I know I can use DoNotSchedule, but that has its own issues.

@vadasambar
Member

@grosser
Contributor Author

grosser commented Mar 21, 2023 via email

@vadasambar
Member

Interesting.

if you select whenUnsatisfiable: ScheduleAnyway, the scheduler gives higher precedence to topologies that would help reduce the skew.

I am not sure if cluster-autoscaler should support this since it seems like a soft constraint (related). I wonder what problems DoNotSchedule creates.

@grosser
Contributor Author

grosser commented Mar 22, 2023

problems that DoNotSchedule creates:

  • capacity has to wait for new nodes to spin up (whereas ScheduleAnyway pods can start immediately and be descheduled later)
  • it can end up creating lots of pending pods when one AZ is in trouble (for example 2/9 running and 7/9 pending when one AZ is unavailable)

@vadasambar
Member

I would expect DoNotSchedule to make CA (cluster-autoscaler) preempt the dummy/imaginary pods used for overprovisioning (provided they have a lower priority class) and schedule them somewhere else (nodes in a different AZ). If that is not happening, it might be a bug.

@grosser
Contributor Author

grosser commented Mar 23, 2023

That does work, the scheduler will kick out the overprovisioned pods.

@grosser
Contributor Author

grosser commented Mar 24, 2023

here is a PR to support this #5611

@grosser
Contributor Author

grosser commented Mar 24, 2023

thx, but they don't really help me ... I want to make ScheduleAnyway behave nicer to reduce our skew.
... I'll test-run the PR and see if that helps, at least a simple POC worked

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jun 22, 2023
@vadasambar
Member

@grosser
Contributor Author

grosser commented Jul 3, 2023

thx, looks interesting

@Shubham82
Contributor

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Nov 28, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Feb 26, 2024
@towca added the area/core-autoscaler label Mar 21, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Apr 20, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot closed this as not planned May 20, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@grosser
Contributor Author

grosser commented Sep 20, 2024

/reopen

I tried ProvisionRequest, but it's not what I want:

  • it is only usable by 1 namespace
  • it expires after 10m

... what I want is a "ProvisionRequest" that is global + never expires
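For context, a rough sketch of the ProvisioningRequest API being referred to; the names, namespace, and count are placeholders, and the exact API version depends on the cluster-autoscaler release:

```yaml
# Rough sketch of a ProvisioningRequest (autoscaling.x-k8s.io); names, namespace,
# and count are placeholders, and the API version may differ between releases.
apiVersion: autoscaling.x-k8s.io/v1beta1
kind: ProvisioningRequest
metadata:
  name: capacity-buffer
  namespace: team-a                      # namespaced: only usable from this namespace
spec:
  provisioningClassName: check-capacity.autoscaling.x-k8s.io
  podSets:
  - count: 4
    podTemplateRef:
      name: capacity-buffer-template     # a PodTemplate in the same namespace
```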

@k8s-ci-robot
Contributor

@grosser: Reopened this issue.

In response to this:

/reopen

I tried ProvisionRequest, but it's not what I want:

  • it is only usable by 1 namespace
  • it expires after 10m

... what I want is a "ProvisionRequest" that is global + never expires

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot reopened this Sep 20, 2024
@grosser
Contributor Author

grosser commented Sep 23, 2024

any thoughts on reading a CRD that we then convert to an in-memory ProvisionRequest, so CA makes room but the actual scheduler ignores it?
(or maybe a sub-type of ProvisionRequest)
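Purely as a hypothetical illustration of that idea (no such CRD exists in cluster-autoscaler today; the group, kind, and fields below are invented):

```yaml
# Hypothetical sketch only: group, kind, and fields are invented to illustrate the
# proposal of a cluster-wide, never-expiring reservation that CA would translate
# into an in-memory ProvisionRequest which the real scheduler ignores.
apiVersion: example.autoscaling.x-k8s.io/v1alpha1
kind: ClusterCapacityReservation
metadata:
  name: zone-buffer
spec:
  perZone: true          # keep the headroom in every availability zone
  podSets:
  - count: 4
    resources:
      requests:
        cpu: "1"
        memory: 1Gi
```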

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot closed this as not planned Oct 23, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
