Reclaim consumed resources once quit from cohort #1149

Open
kerthcet opened this issue Sep 22, 2023 · 5 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@kerthcet
Contributor

What happened:

When a ClusterQueue quits its cohort, the resources its workloads have already consumed are not reclaimed.

  1. rayjob-high and rayjob-low are both submitted to cluster-queue-1.
  2. cluster-queue-1 and cluster-queue-2 initially belong to the same cohort, cohort-pool.
  3. resources(rayjob-high) + resources(rayjob-low) > capacity(cluster-queue-1)
  4. The cohort is then removed from cluster-queue-1.
  5. rayjob-high and rayjob-low both keep running.
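
To make the arithmetic in step 3 concrete, here is a toy model in Python. It is only a sketch and not Kueue code: the CPU figures, the priorities, and the `over_commit` helper are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    priority: int
    cpu: int  # requested CPU; hypothetical numbers, not taken from the issue


def over_commit(nominal_cpu: int, admitted: list[Workload]) -> int:
    """Return how far admitted usage exceeds the queue's own quota (0 if it fits)."""
    used = sum(w.cpu for w in admitted)
    return max(0, used - nominal_cpu)


# Assumed setup: cluster-queue-1 has a nominal quota of 8 CPUs and admitted
# both jobs only because it could borrow from cohort-pool.
admitted = [
    Workload("rayjob-high", priority=100, cpu=6),
    Workload("rayjob-low", priority=10, cpu=6),
]

# After the cohort is removed, nothing can be borrowed any more, yet the
# admitted usage (12) still exceeds the queue's own quota (8) by 4.
print(over_commit(8, admitted))
```

With these assumed numbers the queue stays over-committed by 4 CPUs after it leaves the cohort, which is the state the steps above end in.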

What you expected to happen:

The high-priority RayJob should preempt the lower-priority one.
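
One possible reading of this expectation, sketched in Python: once the queue can no longer borrow, evict admitted workloads in ascending priority order until the remaining usage fits the queue's own quota. This is an illustrative assumption about the desired behavior, not how Kueue selects preemption candidates; `pick_victims` and all numbers are hypothetical.

```python
def pick_victims(nominal_cpu: int, admitted: list[tuple[str, int, int]]) -> list[str]:
    """Pick workloads to evict, lowest priority first, until usage fits the quota.

    Each admitted entry is (name, priority, cpu). All values are hypothetical.
    """
    used = sum(cpu for _, _, cpu in admitted)
    victims = []
    for name, _priority, cpu in sorted(admitted, key=lambda w: w[1]):
        if used <= nominal_cpu:
            break  # remaining workloads fit within the queue's own quota
        victims.append(name)
        used -= cpu
    return victims


# Assumed numbers: quota of 8 CPUs, two admitted jobs of 6 CPUs each.
jobs = [("rayjob-high", 100, 6), ("rayjob-low", 10, 6)]
print(pick_victims(8, jobs))
```

Under these assumptions only rayjob-low is selected, leaving rayjob-high running within the queue's own quota.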

How to reproduce it (as minimally and precisely as possible):

The process described above.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Kueue version (use git describe --tags --dirty --always): main
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@kerthcet kerthcet added the kind/bug Categorizes issue or PR as related to a bug. label Sep 22, 2023
@alculquicondor
Contributor

This is somewhat analogous to scheduler vs descheduler. We don't have a process that checks that the decisions taken before still hold.
So far, the effect is that no new workloads would be admitted in this scenario.
I would label this a feature request instead.

@kerthcet
Contributor Author

Agree, more like a feature.
/remove-kind bug
/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Dec 21, 2023
@tenzen-y
Member

I think that this feature request isn't only for RayJob.

/retitle Reclaim consumed resources once quit from cohort

@k8s-ci-robot k8s-ci-robot changed the title [RayJob] Quit from cohort will not reclaim consumed resources Reclaim consumed resources once quit from cohort Dec 28, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 27, 2024
@kerthcet
Contributor Author

/lifecycle frozen
Waiting for more feedback.

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 18, 2024