Commit

Review remarks
Co-authored-by: Yuki Iwai <[email protected]>
Co-authored-by: Patryk Bundyra <[email protected]>
3 people committed Feb 8, 2024
1 parent a7a599d commit 317ac6a
Showing 2 changed files with 43 additions and 25 deletions.
52 changes: 37 additions & 15 deletions site/content/en/docs/tasks/run_plain_pods.md
@@ -3,10 +3,13 @@ title: "Run Plain Pods"
date: 2023-09-27
weight: 6
description: >
-  Run Jobs represented by plain pods, either single pods, or pod groups.
+  Run a single Pod, or a group of Pods as a Kueue-managed job.
 ---

-This page shows how to leverage Kueue's scheduling and resource management capabilities when running plain Pods.
+This page shows how to leverage Kueue's scheduling and resource management
+capabilities when running plain Pods. Kueue supports management of both
+[individual Pods](#running-a-single-pod-admitted-by-kueue), or
+[Pod groups](#running-a-group-of-pods-to-be-admitted-together).

This guide is for [batch users](/docs/tasks#batch-user) that have a basic understanding of Kueue. For more information, see [Kueue's overview](/docs/overview).

@@ -60,7 +63,7 @@ This guide is for [batch users](/docs/tasks#batch-user) that have a basic unders

4. Check [Administer cluster quotas](/docs/tasks/administer_cluster_quotas) for details on the initial Kueue setup.

-## Pod definition
+## Running a single Pod admitted by Kueue

When running Pods on Kueue, take into consideration the following aspects:

@@ -106,7 +109,7 @@ You can create the Pod using the following command:
kubectl apply -f kueue-pod.yaml
```

-## Pod Group definition
+## Running a group of Pods to be admitted together

In order to run a set of pods as a single unit, called Pod Group, add the
"pod-group-name" label, and the "pod-group-total-count" annotation to all
@@ -122,30 +125,49 @@ metadata:

## Feature limitations

-Kueue provides only the minimal required functionallity of running pod groups,
+Kueue provides only the minimal required functionality of running pod groups,
just for the need of environments where the pods are managed by external
controllers directly, without a Job-level CRD.

-As a consequence of this design decision Kueue does not re-implement core
-functionalities that are available at the Job-level API, such as advanced retry
+As a consequence of this design decision, Kueue does not re-implement core
+functionalities that are available in the Kubernetes Job API, such as advanced retry
policies. In particular, Kueue does not re-create failed pods.

-Note that, this design choice impacts the scenario of
+This design choice impacts the scenario of
 [preemption](/docs/concepts/cluster_queue/#preemption).
-When a workload represented by the pod group is preempted all of its pods
-are killed by Kueue (by delete requests). However, later, when the workload is
-re-admitted, Kueue will not re-create the terminated pods. This task is left to
-the user (or the external controller).
+When a Kueue needs to preempt a workload that represents a pod group, kueue sends
+delete requests for all of the pods in the group. It is the responsibility of the
+user or controller that created the original pods to create replacement Pods.

-**NOTE:** We recommend migration to using Job-level APIs for managing sets of pods.
+**NOTE:** We recommend using the kubernetes Job API or similar CRDs such as
+JobSet, MPIJob, RayJob, etc.

+## Termination
+
+Kueue considers a Pod group as successful, and marks the associated Workload as
+finished, when the number of succeeded pods equals the pod group size.
+
+If a Pod group is not successful, there are two ways you may want to use to
+terminate execution of a Pod group to free the reserved resources:
+1. Issue a Delete request for the Workload object. Kueue will terminate all
+remaining pods.
+2. Set the `kueue.x-k8s.io/retriable-in-group: false` annotation on at least
+one pod in the group (can be a replacement pod). Kueue will mark the workload
+as finished once all pods are terminated.

## Example Pod Group

-Here is a sample Pod that just sleeps for a few seconds:
+Here is a sample Pod Group that just sleeps for a few seconds:

{{< include "examples/pods-kueue/kueue-pod-group.yaml" "yaml" >}}

-You can create the Pod using the following command:
+You can create the Pod Group using the following command:
```sh
kubectl apply -f kueue-pod-group.yaml
```
+
+The name of the associated Workload created by Kueue equals the name of the Pod
+group. In this example it is `sample-group`, you can inspect the workload using:
+```sh
+kubectl describe workload/sample-group
+```
16 changes: 6 additions & 10 deletions site/static/examples/pods-kueue/kueue-pod-group.yaml
@@ -2,41 +2,37 @@
 apiVersion: v1
 kind: Pod
 metadata:
-  generateName: sample-pod-
+  generateName: sample-leader-
   labels:
     kueue.x-k8s.io/queue-name: user-queue
     kueue.x-k8s.io/pod-group-name: "sample-group"
   annotations:
     kueue.x-k8s.io/pod-group-total-count: "2"
 spec:
+  restartPolicy: Never
   containers:
   - name: sleep
     image: busybox
-    command:
-      - sleep
-    args:
-      - 3s
+    command: ["sh", "-c", 'echo "hello world from the leader pod" && sleep 3']
     resources:
       requests:
         cpu: 3
 ---
 apiVersion: v1
 kind: Pod
 metadata:
-  generateName: sample-pod-
+  generateName: sample-worker-
   labels:
     kueue.x-k8s.io/queue-name: user-queue
     kueue.x-k8s.io/pod-group-name: "sample-group"
   annotations:
     kueue.x-k8s.io/pod-group-total-count: "2"
 spec:
+  restartPolicy: Never
   containers:
   - name: sleep
     image: busybox
-    command:
-      - sleep
-    args:
-      - 3s
+    command: ["sh", "-c", 'echo "hello world from the worker pod" && sleep 2']
     resources:
       requests:
         cpu: 3
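
The second termination option in the new "Termination" section (setting the `kueue.x-k8s.io/retriable-in-group` annotation on a pod in the group) could look roughly like the manifest below. This is a sketch only and is not part of the commit. It assumes the `sample-group` Pod group and `user-queue` from the example file, and the `sample-replacement-` name prefix and the container command are hypothetical. The one detail that differs from the prose is the quoting: Kubernetes annotation values are strings, so the value is written as `"false"`.

```yaml
# Hypothetical replacement pod for the sample-group Pod group (illustrative only).
apiVersion: v1
kind: Pod
metadata:
  generateName: sample-replacement-   # assumed name prefix, not from the commit
  labels:
    kueue.x-k8s.io/queue-name: user-queue
    kueue.x-k8s.io/pod-group-name: "sample-group"
  annotations:
    kueue.x-k8s.io/pod-group-total-count: "2"
    # Annotation values must be strings, so the boolean is quoted.
    kueue.x-k8s.io/retriable-in-group: "false"
spec:
  restartPolicy: Never
  containers:
  - name: sleep
    image: busybox
    command: ["sh", "-c", "sleep 2"]
    resources:
      requests:
        cpu: 3
```

Because the manifest uses `generateName`, it would be submitted with `kubectl create -f` rather than `kubectl apply -f`. According to the section text, Kueue marks the workload as finished once all pods in the group have terminated.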
