[docs] Updated Volcano integration documentation (ray-project#776)
tgaddair authored Dec 1, 2022
1 parent 2898757 commit 9e362d5
Showing 2 changed files with 23 additions and 2 deletions.
8 changes: 6 additions & 2 deletions docs/guidance/volcano-integration.md
@@ -74,7 +74,7 @@ If autoscaling is enabled, `minReplicas` will be used for gang scheduling, other

In this example, we'll walk through how gang scheduling works with Volcano and KubeRay.

- First, let's create a queue with a capacity of 4 CPUs and 4Gi of RAM:
+ First, let's create a queue with a capacity of 4 CPUs and 6Gi of RAM:

```
$ kubectl create -f - <<EOF
# ... (Queue manifest collapsed in the diff view; see the sketch below) ...
EOF
```

The **weight** in the definition above indicates the relative weight of a queue in cluster resource division. This is useful in cases where the total **capability** of all the queues in your cluster exceeds the total available resources, forcing the queues to share among themselves. Queues with higher weight will be allocated a proportionally larger share of the total resources.

The **capability** is a hard constraint on the maximum resources the queue will support at any given time. It can be updated as needed to allow more or fewer workloads to run at a time.
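For reference, the queue being described would look roughly like the following sketch. The name `kuberay-test-queue` and the exact layout are assumptions based on the standard Volcano `Queue` API, since the full manifest is collapsed in the diff above:

```
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: kuberay-test-queue   # assumed name, not taken from the diff
spec:
  weight: 1                  # relative share when queues compete for resources
  capability:
    cpu: 4
    memory: 6Gi              # hard cap matching the 4 CPU / 6Gi described above
```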

Next, we'll create a RayCluster with a head node (1 CPU + 2Gi of RAM) and two workers (1 CPU + 1Gi of RAM each), for a total of 3 CPUs and 4Gi of RAM:

```
# ... (RayCluster manifest collapsed in the diff view; see the sketch below) ...
EOF
```
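The RayCluster manifest itself is collapsed in the diff, but the resource requests described above would be expressed along these lines (a sketch against the RayCluster API; the container names and the omitted surrounding fields are assumptions):

```
headGroupSpec:
  template:
    spec:
      containers:
        - name: ray-head          # assumed container name
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
workerGroupSpecs:
  - replicas: 2
    template:
      spec:
        containers:
          - name: ray-worker      # assumed container name
            resources:
              requests:
                cpu: "1"
                memory: 1Gi
```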

- Because our queue has a capacity of 4 CPU and 4Gi of RAM, this resource should schedule successfully without any issues. We can verify this by checking the status of our cluster's Volcano PodGroup to see that the phase is `Running` and the last status is `Scheduled`:
+ Because our queue has a capacity of 4 CPU and 6Gi of RAM, this resource should schedule successfully without any issues. We can verify this by checking the status of our cluster's Volcano PodGroup to see that the phase is `Running` and the last status is `Scheduled`:

```
$ kubectl get podgroup ray-test-cluster-0-pg -o yaml
```
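The relevant portion of the output would look something like the snippet below. This is illustrative only: the field names follow the Volcano `PodGroup` status API, but the exact values and conditions will vary by cluster:

```
status:
  conditions:
    - status: "True"
      type: Scheduled          # the "last status" referenced above
  phase: Running
```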
17 changes: 17 additions & 0 deletions ray-operator/controllers/ray/batchscheduler/interface/interface.go
@@ -8,15 +8,32 @@ import (
"sigs.k8s.io/controller-runtime/pkg/builder"
)

// BatchScheduler manages submitting RayCluster pods to a third-party scheduler.
type BatchScheduler interface {
    // Name corresponds to the schedulerName in Kubernetes:
    // https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/
    Name() string

    // DoBatchSchedulingOnSubmission handles submitting the RayCluster to the batch scheduler on creation / update.
    // For most batch schedulers, this results in the creation of a PodGroup.
    DoBatchSchedulingOnSubmission(app *rayiov1alpha1.RayCluster) error

    // AddMetadataToPod enriches Pod specs with metadata necessary to tie them to the scheduler.
    // For example, setting labels for queues / priority, and setting schedulerName.
    AddMetadataToPod(app *rayiov1alpha1.RayCluster, pod *v1.Pod)
}

// BatchSchedulerFactory handles initial setup of the scheduler plugin by registering the
// necessary callbacks with the operator, and the creation of the BatchScheduler itself.
type BatchSchedulerFactory interface {
    // New creates a new BatchScheduler for the scheduler plugin.
    New(config *rest.Config) (BatchScheduler, error)

    // AddToScheme adds the types in this scheduler to the given scheme (runs during init).
    AddToScheme(scheme *runtime.Scheme)

    // ConfigureReconciler configures the RayCluster Reconciler in the process of being built by
    // adding watches for its scheduler-specific custom resource types, and any other needed setup.
    ConfigureReconciler(b *builder.Builder) *builder.Builder
}

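To make the contract concrete, here is a minimal no-op implementation of both interfaces. The types are hypothetical (they do not exist in this file) and assume the imports already present above:

```
// noOpScheduler is a hypothetical BatchScheduler that leaves pods to the
// default kube-scheduler rather than a third-party batch scheduler.
type noOpScheduler struct{}

func (s *noOpScheduler) Name() string { return "default-scheduler" }

func (s *noOpScheduler) DoBatchSchedulingOnSubmission(app *rayiov1alpha1.RayCluster) error {
    // Nothing to submit; there is no PodGroup equivalent to create here.
    return nil
}

func (s *noOpScheduler) AddMetadataToPod(app *rayiov1alpha1.RayCluster, pod *v1.Pod) {
    // Route each pod explicitly to the scheduler this plugin represents.
    pod.Spec.SchedulerName = s.Name()
}

// noOpSchedulerFactory wires the hypothetical scheduler into the operator.
type noOpSchedulerFactory struct{}

func (f *noOpSchedulerFactory) New(config *rest.Config) (BatchScheduler, error) {
    return &noOpScheduler{}, nil
}

func (f *noOpSchedulerFactory) AddToScheme(scheme *runtime.Scheme) {
    // No scheduler-specific API types to register.
}

func (f *noOpSchedulerFactory) ConfigureReconciler(b *builder.Builder) *builder.Builder {
    // No extra watches needed; return the builder unchanged.
    return b
}
```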
