[KEP] Support new ProvisioningRequest's conditions #2042

Merged
merged 4 commits into from
May 8, 2024
28 changes: 24 additions & 4 deletions keps/1136-provisioning-request-support/README.md
@@ -10,6 +10,7 @@
- [Story 1](#story-1)
- [Story 2](#story-2)
- [Risks and Mitigations](#risks-and-mitigations)
- [BookingExpired condition](#bookingexpired-condition)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
- [Prerequisite testing updates](#prerequisite-testing-updates)
@@ -77,6 +78,21 @@ There doesn't seem to be much risks or mitigations.
[Two phase admission process](https://github.com/kubernetes-sigs/kueue/tree/main/keps/993-two-phase-admission)
was added specifically for use cases like this.

#### BookingExpired condition

Kueue's support for the `BookingExpired` condition in ProvisioningRequest poses a risk. The Cluster Autoscaler may set `BookingExpired=true`
and stop guaranteeing the capacity before all pods are scheduled. This can occur in two scenarios:

- **Other AdmissionChecks**: If other AdmissionChecks are used, they might delay pod creation, causing the Cluster Autoscaler to expire the booking.
- **Massive jobs**: When a very large job is created, the controller responsible for pod creation might not be able to
keep pace, again leading to the booking expiring before all pods are scheduled.

This could result in only a subset of pods being scheduled. To mitigate the first scenario, users can use the
[`WaitForPodsReady`](https://github.com/kubernetes-sigs/kueue/tree/main/keps/349-all-or-nothing) field. This ensures
a Workload is evicted if not all of its pods are scheduled after a specified timeout. For the second scenario, cluster
administrators should ensure their control plane is provisioned with sufficient resources (larger VMs and/or
higher QPS for the pod-creating controller) to handle large jobs efficiently.

## Design Details

The new ProvisioningRequest controller will:
@@ -95,11 +111,15 @@ The `ProvisioningRequest` should have the owner reference set to the workload.
To determine which details to put into the `ProvisioningRequest`, the controller
will also need to watch `ProvisioningRequestConfigs`.

* Watch all changes CA makes to `ProvisioningRequests`. If the `Provisioned`
or `CapacityAvailable` condition is set to `True` then finish the `AdmissionCheck`
with success (and propagate the information about `ProvisioningRequest` name to
* Watch all changes CA makes to `ProvisioningRequests`. Depending on the `ProvisioningRequest`'s conditions:
- `Provisioned=false`: the controller should surface information about the ProvisioningRequest's ETA. It should emit an event with the ETA, and another for every ETA change.
- `Provisioned=true`: the controller should mark the AdmissionCheck as `Ready` and propagate the `ProvisioningRequest` name to
workload pods - [KEP #1145](https://github.com/kubernetes-sigs/kueue/blob/main/keps/1145-additional-labels/kep.yaml) under `"cluster-autoscaler.kubernetes.io/consume-provisioning-request"`.
If the `ProvisioningRequest` fails, fail the `AdmissionCheck`.
- `Failed=true`: the controller should retry the AdmissionCheck according to the `RetryConfig` configuration, or mark the AdmissionCheck as `Rejected`.
- `BookingExpired=true`: if the Workload is not `Admitted`, the controller should act the same as for `Failed=true`.
- `CapacityRevoked=true`: if the Workload is not `Finished`, the controller should mark it as `Inactive`, which will evict it.
Additionally, an event should be emitted to signal this. This can happen only if a user uses `batch.v1/Job` and
sets `.spec.backoffLimit > 0`.
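
The condition handling described above can be sketched as a small decision function. This is a hedged illustration, not Kueue's controller code: the `Action` labels, the `WorkloadState` type, and the `decide` function are hypothetical, and the precedence between conditions (for example, `CapacityRevoked` checked before `Failed`) is an assumption the KEP does not specify.

```go
package main

import "fmt"

// Action is a hypothetical label for what the controller does next.
type Action string

const (
	ActionSurfaceETA    Action = "SurfaceETA"    // keep waiting, emit ETA events
	ActionMarkReady     Action = "MarkReady"     // AdmissionCheck becomes Ready
	ActionRetryOrReject Action = "RetryOrReject" // follow RetryConfig, else Rejected
	ActionDeactivate    Action = "Deactivate"    // mark Workload Inactive, emit event
	ActionNone          Action = "None"          // nothing to do
)

// WorkloadState holds the two Workload facts the rules depend on.
type WorkloadState struct {
	Admitted bool
	Finished bool
}

// decide maps ProvisioningRequest conditions (type -> status) to an action.
// A missing condition is treated as false; the ordering is an assumption.
func decide(conds map[string]bool, wl WorkloadState) Action {
	switch {
	case conds["CapacityRevoked"]:
		if !wl.Finished {
			return ActionDeactivate
		}
		return ActionNone
	case conds["Failed"]:
		return ActionRetryOrReject
	case conds["BookingExpired"]:
		if !wl.Admitted {
			return ActionRetryOrReject
		}
		return ActionNone
	case conds["Provisioned"]:
		return ActionMarkReady
	default:
		return ActionSurfaceETA
	}
}

func main() {
	fmt.Println(decide(map[string]bool{}, WorkloadState{}))                    // SurfaceETA
	fmt.Println(decide(map[string]bool{"Provisioned": true}, WorkloadState{})) // MarkReady
	fmt.Println(decide(
		map[string]bool{"Provisioned": true, "BookingExpired": true},
		WorkloadState{Admitted: false})) // RetryOrReject
	fmt.Println(decide(
		map[string]bool{"CapacityRevoked": true},
		WorkloadState{Admitted: true})) // Deactivate
}
```

Keeping the mapping in one pure function makes each rule easy to unit test independently of the watch and reconcile machinery.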

* Watch the admission of the workload - if it is again suspended or finished,
the provisioning request should also be deleted (the last one can be achieved via