
support overprovisioning without pending pods #5377

Closed
grosser opened this issue Dec 20, 2022 · 29 comments
Labels
area/cluster-autoscaler, area/core-autoscaler, kind/feature, lifecycle/rotten

Comments

@grosser
Contributor

grosser commented Dec 20, 2022

Which component are you using?:

cluster-autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

We currently use pending pods for overprovisioning, but that results in there not being "real" gaps,
so when an AZ-preferred workload wants to get scheduled, it always goes to the open AZ instead of kicking out an overprovisioned pod.

Describe the solution you'd like.:

Make CA pretend there are pending pods when doing its scale-up calculations, and scale based on that.

Describe any alternative solutions you've considered.:

the currently recommended approach (the pending-pod overprovisioning described above)

Additional context.:

#4384 describes the same issue but lacks context

/cc @mattyev87 @MaciekPytel

@grosser added the kind/feature label Dec 20, 2022
@vadasambar
Member

@grosser would the following not work for you?

  1. Set --expendable-pods-priority-cutoff (default is -10)
  2. Use a PriorityClass with a value lower than --expendable-pods-priority-cutoff for the over-provisioning/dummy pods (see the sketch below)
  3. cluster-autoscaler will then kick out the over-provisioning pods to make space for the workload you want to schedule
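A minimal sketch of that setup, assuming the default cutoff of -10; the class name, value, replica count, image, and resource requests below are illustrative placeholders, not something from this issue:

```yaml
# Illustrative sketch only: names, value, replica count, and requests are placeholders.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning            # hypothetical name
value: -100                         # below the default -10 cutoff, so CA treats these pods as expendable
globalDefault: false
description: "Placeholder pods that higher-priority workloads can preempt."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 4                       # how much headroom to reserve
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9    # does nothing, only reserves the requested resources
        resources:
          requests:
            cpu: "1"
            memory: 1Gi
```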

@grosser
Contributor Author

grosser commented Mar 14, 2023

That does not work.

  • descheduler does not deschedule pods when there is no room
  • scheduler does not kick out ScheduleAnyway pods when there is room on other nodes

@vadasambar
Member

descheduler does not deschedule pods when there is no room

I am not sure about the behavior of descheduler with cluster-autoscaler so I will let someone else answer this.

scheduler does not kick out ScheduleAnyway pods when there is room on other nodes

This should work. I wonder if the pods you want to schedule have a higher PriorityClass than the overprovisioning pods (the default preemptionPolicy is PreemptLowerPriority). If they don't, you might see an issue like this. If they do and you are still seeing this issue, it might be a problem with the scheduler (you may know this already, but please make sure your cluster-autoscaler version matches your Kubernetes version).
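For reference, a minimal sketch of such a workload PriorityClass; the name and value are placeholders, and PreemptLowerPriority is already the default, spelled out here only for clarity:

```yaml
# Hypothetical workload PriorityClass; name and value are placeholders.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: node-critical
value: 100000                            # must be higher than the overprovisioning pods' priority
preemptionPolicy: PreemptLowerPriority   # the default; allows preempting lower-priority pods
description: "Workloads that may evict overprovisioning pods."
```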

@grosser
Contributor Author

grosser commented Mar 17, 2023

Re scheduler: it does not preempt

setup:

  • 3 nodes with 1 being full
  • schedule 4 pods with a node-critical PriorityClass (higher than the workload on the full node)

```console
kubectl get pods -l role=scheduleanyway -L topology.kubernetes.io/zone
NAME                              READY   STATUS    RESTARTS   AGE   ZONE
scheduleanyway-6fc7b888c7-5wwxp   1/1     Running   0          7s    us-west-2a
scheduleanyway-6fc7b888c7-cppls   1/1     Running   0          7s    us-west-2a
scheduleanyway-6fc7b888c7-f6nkv   1/1     Running   0          7s    us-west-2c
scheduleanyway-6fc7b888c7-lspxr   1/1     Running   0          7s    us-west-2c
```

workloads on the full node were not preempted

@grosser
Contributor Author

grosser commented Mar 17, 2023

so that's why "imaginary pods" would allow the scheduler to use the right az for ScheduleAnyway

@vadasambar
Member

vadasambar commented Mar 20, 2023

@grosser when you say AZ-preferred workload, are you using node affinity with preferredDuringSchedulingIgnoredDuringExecution? If yes, note that CA doesn't respect soft constraints:

However, CA does not consider "soft" constraints like preferredDuringSchedulingIgnoredDuringExecution when selecting node groups. That means that if CA has two or more node groups available for expansion, it will not use soft constraints to pick one node group over another.

Ref: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#does-ca-respect-node-affinity-when-selecting-node-groups-to-scale-up

You might want to use requiredDuringSchedulingIgnoredDuringExecution or nodeSelector
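For example, a hard zone constraint would look roughly like this (the zone value is a placeholder):

```yaml
# Illustrative hard constraint; the zone value is a placeholder.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - us-west-2a
```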

@grosser
Contributor Author

grosser commented Mar 20, 2023

I'm using

```yaml
topologySpreadConstraints:
- labelSelector:
    matchLabels:
      project: test
  maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway
```

and since the scheduler does not preempt other workloads, it does not matter whether we have a buffer;
it would only work if there was free space.
I know I can use DoNotSchedule, but that has its own issues.

@vadasambar
Member

@grosser
Contributor Author

grosser commented Mar 21, 2023 via email

@vadasambar
Member

Interesting.

if you select whenUnsatisfiable: ScheduleAnyway, the scheduler gives higher precedence to topologies that would help reduce the skew.

I am not sure if cluster-autoscaler should support this since it seems like a soft constraint (related). I wonder what problems DoNotSchedule creates.

@grosser
Contributor Author

grosser commented Mar 22, 2023

problems that DoNotSchedule creates:

  • capacity has to wait for new nodes to spin up (whereas ScheduleAnyway pods can start immediately and be descheduled later)
  • it can end up creating lots of pending pods when one AZ is in trouble (for example 2/9 running and 7/9 pending when one AZ is unavailable)

@vadasambar
Member

I would expect DoNotSchedule to make CA (cluster-autoscaler) preempt the dummy/imaginary pods used for overprovisioning (provided they have a lower priority class) and schedule them somewhere else (nodes in a different AZ). If that is not happening, it might be a bug.

@grosser
Contributor Author

grosser commented Mar 23, 2023

That does work, the scheduler will kick out the overprovisioned pods.

@grosser
Contributor Author

grosser commented Mar 24, 2023

here is a PR to support this #5611

@grosser
Contributor Author

grosser commented Mar 24, 2023

thx, but they don't really help me ... I want to make ScheduleAnyway behave nicer to reduce our skew.
... I'll test-run the PR and see if that helps, at least a simple POC worked

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jun 22, 2023
@vadasambar
Member

@grosser
Contributor Author

grosser commented Jul 3, 2023

thx, looks interesting

@Shubham82
Contributor

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Nov 28, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Feb 26, 2024
@towca added the area/core-autoscaler label Mar 21, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Apr 20, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot closed this as not planned May 20, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@grosser
Contributor Author

grosser commented Sep 20, 2024

/reopen

I tried ProvisionRequest, but it's not what I want:

  • it is only usable by 1 namespace
  • it expires after 10m

... what I want is a "ProvisionRequest" that is global + never expires
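For context, a rough sketch of the ProvisioningRequest API being referred to; the names, namespace, and count are placeholders, and the exact API version depends on the cluster-autoscaler release:

```yaml
# Rough sketch of a ProvisioningRequest (autoscaling.x-k8s.io); names, namespace,
# and count are placeholders, and the API version may differ between releases.
apiVersion: autoscaling.x-k8s.io/v1beta1
kind: ProvisioningRequest
metadata:
  name: capacity-buffer
  namespace: team-a                      # namespaced: only usable from this namespace
spec:
  provisioningClassName: check-capacity.autoscaling.x-k8s.io
  podSets:
  - count: 4
    podTemplateRef:
      name: capacity-buffer-template     # a PodTemplate in the same namespace
```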

@k8s-ci-robot
Contributor

@grosser: Reopened this issue.

In response to this:

/reopen

I tried ProvisionRequest, but it's not what I want:

  • it is only usable by 1 namespace
  • it expires after 10m

... what I want is a "ProvisionRequest" that is global + never expires

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot reopened this Sep 20, 2024
@grosser
Contributor Author

grosser commented Sep 23, 2024

any thoughts on reading a CRD that we then convert to an in-memory ProvisionRequest, so CA makes room but the actual scheduler ignores it?
(or maybe a sub-type of ProvisionRequest)
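Purely as a hypothetical illustration of that idea (no such CRD exists in cluster-autoscaler today; the group, kind, and fields below are invented):

```yaml
# Hypothetical sketch only: group, kind, and fields are invented to illustrate the
# proposal of a cluster-wide, never-expiring reservation that CA would translate
# into an in-memory ProvisionRequest which the real scheduler ignores.
apiVersion: example.autoscaling.x-k8s.io/v1alpha1
kind: ClusterCapacityReservation
metadata:
  name: zone-buffer
spec:
  perZone: true          # keep the headroom in every availability zone
  podSets:
  - count: 4
    resources:
      requests:
        cpu: "1"
        memory: 1Gi
```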

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot closed this as not planned Oct 23, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
