Manage number of pods in ClusterQueues #485
Comments
One scenario I can think of is scheduling tiny jobs for best effort, like we do for pods, but as you said we may have no flavors for them unless we have a default one injected. Following YAGNI, I agree to start with a strict restriction. Instead of marking the job inadmissible, we could reject management by Kueue directly and tell users the reason. I found that a Job's resources are immutable after creation; if so, marking the job inadmissible seems useless, since we would have to recreate it.
What is the behavior now for those jobs?
A good practice there would be to give them at least some CPU and memory, like 50m cores and 1Mi, for example.
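For illustration, a minimal sketch of a Job carrying such tiny placeholder requests (the job name and the queue label value are made up for the example, and assume a LocalQueue named `user-queue` exists):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: tiny-job                              # hypothetical name, for illustration only
  labels:
    kueue.x-k8s.io/queue-name: user-queue     # assumes a LocalQueue named "user-queue"
spec:
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sleep", "10"]
        resources:
          requests:
            cpu: 50m       # tiny placeholder request, as suggested above
            memory: 1Mi
      restartPolicy: Never
```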
Not sure what you are suggesting. I would have liked a webhook, but as I said, it might not work well with users' own webhooks. Or maybe it's OK if it can be disabled (but enabled by default).
They are admitted with no flavors assigned, as there are no resources to assign the flavors to.
I mean that a Job's resources are immutable now, so making it inadmissible doesn't make sense; we would have to recreate the Job to add requests. https://github.com/kubernetes/kubernetes/blob/d2504c94a0116d4432a8a73fc17a0ec8d003d37b/pkg/apis/batch/validation/validation.go#LL409
Right, I understand that. My question was whether you have an alternative solution.
If a job without requests is OK from the K8s validation point of view, then I guess Kueue, if nothing extra is set, should allow it. Otherwise there will be a quite surprising situation where a job with a 1 millicore and 1 MB request is fine, and another that doesn't put in these placeholder settings is not.
LimitRange works on Pods only (and not pod templates), right? Maybe we should be paying attention to it in Kueue admission.
That might be an interesting option to explore, together with Kueue paying attention to VPA recommendations as well.
/close
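For reference, a minimal LimitRange sketch that defaults container requests at Pod admission time (the name, namespace, and values are illustrative); note that, as discussed above, it acts on Pods rather than on the Job's pod template:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-requests      # illustrative name
  namespace: team-a           # hypothetical namespace
spec:
  limits:
  - type: Container
    defaultRequest:           # applied to containers that omit requests
      cpu: 100m
      memory: 64Mi
    default:                  # applied to containers that omit limits
      cpu: 500m
      memory: 256Mi
```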
@alculquicondor: Closing this issue.
/reopen @mwielgus #688 is flaky because a Workload can be admitted before the LimitRange is available in the cache. There is really no way to control whether a cache is "up-to-date". However, if there were a feature "do not admit workloads without requests", we could use it in the test to avoid the flakiness. Maybe we could implement this as a configuration option. WDYT?
@alculquicondor: Reopened this issue.
/assign
/unassign @mcariatm This is not decided yet.
Discussed offline: to prevent admitting an unbounded number of Jobs, we can allow ClusterQueues to define a quota for the number of pods.
/retitle Control number of pods in ClusterQueues
However, this seems tricky; what's the criterion for defining this number?
In the job/workload, it's just the number of replicas. The ClusterQueue can define a resource
For this I'm thinking of adding a special resource type, say one dedicated to the pod count. Doing so, we only need to change the flavor assignment; no API or other major changes should be necessary.
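As a rough sketch of what such a quota could look like on a ClusterQueue (assuming the v1beta1 API and that the special resource is exposed under the name `pods`, as in current Kueue releases; queue and flavor names are made up):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq                   # hypothetical name
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory", "pods"]
    flavors:
    - name: default-flavor          # assumes a ResourceFlavor with this name exists
      resources:
      - name: "cpu"
        nominalQuota: 100
      - name: "memory"
        nominalQuota: 200Gi
      - name: "pods"                # caps the total number of admitted pods
        nominalQuota: 50
```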
/assign
#732 has the implementation for my proposal (some cleanup and maybe an integration test are needed).
/retitle Manage number of pods in ClusterQueues
What would you like to be added:
Mark jobs that don't have requests as inadmissible.
Why is this needed:
Kueue is a system that manages resource usage. It doesn't really make sense to have jobs with no requests.
Also, it can lead to user errors, such as having a resource flavor with nodeSelectors that are never applied to jobs. This is working as intended: the job has no requests, so it gets no flavors and no injected node selectors. But it's confusing. Simply requiring that jobs have requests would prevent that.
I thought about doing this via webhooks. However, this has the problem that, in some clusters, requests might be added via webhooks. Since it's not possible to reliably establish the order in which webhooks run, we might break these use cases.
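To make the confusion concrete, here is a sketch of a ResourceFlavor carrying node labels (assuming the v1beta1 API; the flavor name and label are made up). A Job with no requests would be admitted without being assigned this flavor, so the corresponding node selector would never be injected into its pods:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: spot-flavor                      # hypothetical name
spec:
  nodeLabels:
    cloud.example.com/capacity: spot     # made-up label, intended to be injected as a node selector
```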
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.