[docs] Updated Volcano integration documentation (ray-project#776)
tgaddair authored Dec 1, 2022
1 parent 2898757 commit 9e362d5
Showing 2 changed files with 23 additions and 2 deletions.
8 changes: 6 additions & 2 deletions docs/guidance/volcano-integration.md
@@ -74,7 +74,7 @@ If autoscaling is enabled, `minReplicas` will be used for gang scheduling, other

In this example, we'll walk through how gang scheduling works with Volcano and KubeRay.

- First, let's create a queue with a capacity of 4 CPUs and 4Gi of RAM:
+ First, let's create a queue with a capacity of 4 CPUs and 6Gi of RAM:

```
$ kubectl create -f - <<EOF
# ... (Queue manifest collapsed in the diff view; see the sketch below) ...
EOF
```

The **weight** in the definition above indicates the relative weight of a queue in cluster resource division. This is useful in cases where the total **capability** of all the queues in your cluster exceeds the total available resources, forcing the queues to share among themselves. Queues with higher weight will be allocated a proportionally larger share of the total resources.

The **capability** is a hard constraint on the maximum resources the queue will support at any given time. It can be updated as needed to allow more or fewer workloads to run at a time.
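For reference, the queue being described would look roughly like the following sketch. The name `kuberay-test-queue` and the exact layout are assumptions based on the standard Volcano `Queue` API, since the full manifest is collapsed in the diff above:

```
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: kuberay-test-queue   # assumed name, not taken from the diff
spec:
  weight: 1                  # relative share when queues compete for resources
  capability:
    cpu: 4
    memory: 6Gi              # hard cap matching the 4 CPU / 6Gi described above
```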

Next, we'll create a RayCluster with a head node (1 CPU + 2Gi of RAM) and two workers (1 CPU + 1Gi of RAM each), for a total of 3 CPUs and 4Gi of RAM:

```
# ... (RayCluster manifest collapsed in the diff view; see the sketch below) ...
EOF
```
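The RayCluster manifest itself is collapsed in the diff, but the resource requests described above would be expressed along these lines (a sketch against the RayCluster API; the container names and the omitted surrounding fields are assumptions):

```
headGroupSpec:
  template:
    spec:
      containers:
        - name: ray-head          # assumed container name
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
workerGroupSpecs:
  - replicas: 2
    template:
      spec:
        containers:
          - name: ray-worker      # assumed container name
            resources:
              requests:
                cpu: "1"
                memory: 1Gi
```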

- Because our queue has a capacity of 4 CPU and 4Gi of RAM, this resource should schedule successfully without any issues. We can verify this by checking the status of our cluster's Volcano PodGroup to see that the phase is `Running` and the last status is `Scheduled`:
+ Because our queue has a capacity of 4 CPU and 6Gi of RAM, this resource should schedule successfully without any issues. We can verify this by checking the status of our cluster's Volcano PodGroup to see that the phase is `Running` and the last status is `Scheduled`:

```
$ kubectl get podgroup ray-test-cluster-0-pg -o yaml
```
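The relevant portion of the output would look something like the snippet below. This is illustrative only: the field names follow the Volcano `PodGroup` status API, but the exact values and conditions will vary by cluster:

```
status:
  conditions:
    - status: "True"
      type: Scheduled          # the "last status" referenced above
  phase: Running
```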
17 changes: 17 additions & 0 deletions ray-operator/controllers/ray/batchscheduler/interface/interface.go
@@ -8,15 +8,32 @@ import (
"sigs.k8s.io/controller-runtime/pkg/builder"
)

// BatchScheduler manages submitting RayCluster pods to a third-party scheduler.
type BatchScheduler interface {
    // Name corresponds to the schedulerName in Kubernetes:
    // https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/
    Name() string

    // DoBatchSchedulingOnSubmission handles submitting the RayCluster to the batch scheduler on creation / update.
    // For most batch schedulers, this results in the creation of a PodGroup.
    DoBatchSchedulingOnSubmission(app *rayiov1alpha1.RayCluster) error

    // AddMetadataToPod enriches Pod specs with metadata necessary to tie them to the scheduler.
    // For example, setting labels for queues / priority, and setting schedulerName.
    AddMetadataToPod(app *rayiov1alpha1.RayCluster, pod *v1.Pod)
}

// BatchSchedulerFactory handles initial setup of the scheduler plugin by registering the
// necessary callbacks with the operator, and the creation of the BatchScheduler itself.
type BatchSchedulerFactory interface {
    // New creates a new BatchScheduler for the scheduler plugin.
    New(config *rest.Config) (BatchScheduler, error)

    // AddToScheme adds the types in this scheduler to the given scheme (runs during init).
    AddToScheme(scheme *runtime.Scheme)

    // ConfigureReconciler configures the RayCluster Reconciler in the process of being built by
    // adding watches for its scheduler-specific custom resource types, and any other needed setup.
    ConfigureReconciler(b *builder.Builder) *builder.Builder
}

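To make the contract concrete, here is a minimal no-op implementation of both interfaces. The types are hypothetical (they do not exist in this file) and assume the imports already present above:

```
// noOpScheduler is a hypothetical BatchScheduler that leaves pods to the
// default kube-scheduler rather than a third-party batch scheduler.
type noOpScheduler struct{}

func (s *noOpScheduler) Name() string { return "default-scheduler" }

func (s *noOpScheduler) DoBatchSchedulingOnSubmission(app *rayiov1alpha1.RayCluster) error {
    // Nothing to submit; there is no PodGroup equivalent to create here.
    return nil
}

func (s *noOpScheduler) AddMetadataToPod(app *rayiov1alpha1.RayCluster, pod *v1.Pod) {
    // Route each pod explicitly to the scheduler this plugin represents.
    pod.Spec.SchedulerName = s.Name()
}

// noOpSchedulerFactory wires the hypothetical scheduler into the operator.
type noOpSchedulerFactory struct{}

func (f *noOpSchedulerFactory) New(config *rest.Config) (BatchScheduler, error) {
    return &noOpScheduler{}, nil
}

func (f *noOpSchedulerFactory) AddToScheme(scheme *runtime.Scheme) {
    // No scheduler-specific API types to register.
}

func (f *noOpSchedulerFactory) ConfigureReconciler(b *builder.Builder) *builder.Builder {
    // No extra watches needed; return the builder unchanged.
    return b
}
```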
