Commit

[autoscaler] Improve autoscaler auto-configuration, upstream recent improvements to Kuberay NodeProvider (#274)

Upstreams recent autoscaler changes from the Ray repo.
DmitriGekhtman authored May 28, 2022
1 parent 80f19eb commit eac60b3
Showing 12 changed files with 266 additions and 103 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -26,14 +26,14 @@ You can view detailed documentation and guides at [https://ray-project.github.io
#### Nightly version

```
-kubectl apply -k "github.com/ray-project/kuberay/manifests/cluster-scope-resources"
+kubectl create -k "github.com/ray-project/kuberay/manifests/cluster-scope-resources"
 kubectl apply -k "github.com/ray-project/kuberay/manifests/base"
```

#### Stable version

```
-kubectl apply -k "github.com/ray-project/kuberay/manifests/cluster-scope-resources?ref=v0.2.0"
+kubectl create -k "github.com/ray-project/kuberay/manifests/cluster-scope-resources?ref=v0.2.0"
 kubectl apply -k "github.com/ray-project/kuberay/manifests/base?ref=v0.2.0"
```

4 changes: 2 additions & 2 deletions docs/deploy/installation.md
@@ -3,13 +3,13 @@
#### Nightly version

```
-kubectl apply -k "github.com/ray-project/kuberay/manifests/cluster-scope-resources"
+kubectl create -k "github.com/ray-project/kuberay/manifests/cluster-scope-resources"
 kubectl apply -k "github.com/ray-project/kuberay/manifests/base"
```

#### Stable version

```
-kubectl apply -k "github.com/ray-project/kuberay/manifests/cluster-scope-resources?ref=v0.2.0"
+kubectl create -k "github.com/ray-project/kuberay/manifests/cluster-scope-resources?ref=v0.2.0"
 kubectl apply -k "github.com/ray-project/kuberay/manifests/base?ref=v0.2.0"
```
21 changes: 13 additions & 8 deletions docs/guidance/autoscaler.md
@@ -10,10 +10,14 @@ You can follow below steps for a quick deployment.
```
git clone https://github.com/ray-project/kuberay.git
cd kuberay
-kubectl apply -k manifests/cluster-scope-resources
-kubectl apply -k manifests/base
+kubectl create -k manifests/cluster-scope-resources
+kubectl apply -k manifests/overlays/autoscaling
```

+> Note: For compatibility with the Ray autoscaler, the KubeRay Operator's entrypoint
+> must include the flag `--prioritize-workers-to-delete`. The kustomization overlay
+> `manifests/overlays/autoscaling` provided in the last command above adds the necessary flag.

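One way to sanity-check the requirement in this note is to look for the flag in the operator Deployment's container args. Below is a minimal Python sketch against a parsed manifest dict; `operator_has_flag` is a hypothetical helper name, not a KubeRay API:

```python
REQUIRED_FLAG = "--prioritize-workers-to-delete"

def operator_has_flag(deployment: dict, flag: str = REQUIRED_FLAG) -> bool:
    """Return True if any container in the Deployment passes `flag`
    through its args or command."""
    containers = (
        deployment.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    return any(
        flag in c.get("args", []) or flag in c.get("command", [])
        for c in containers
    )

# Rough shape of the operator manifest after applying
# manifests/overlays/autoscaling:
patched = {"spec": {"template": {"spec": {"containers": [
    {"name": "kuberay-operator", "args": [REQUIRED_FLAG]}
]}}}}
```

Against a live cluster, the dict could come from `kubectl get deployment kuberay-operator -o json`.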
### Deploy a cluster with autoscaling enabled

```
@@ -60,20 +64,21 @@ Demands:

#### Known issues and limitations

-1. operator will recognize following setting and automatically inject preconfigured autoscaler container to head pod.
-The service account, role, role binding needed by autoscaler will be created by operator out-of-box.
+1. The operator will recognize the following setting and automatically inject a preconfigured autoscaler container to the head pod.
+The service account, role, and role binding needed by the autoscaler will be created by the operator out-of-box.
+The operator will also configure an empty-dir logging volume for the Ray head pod. The volume will be mounted into the Ray and
+autoscaler containers; this is necessary to support the event logging introduced in [Ray PR #13434](https://github.com/ray-project/ray/pull/13434).

```
spec:
rayVersion: 'nightly'
enableInTreeAutoscaling: true
```
-2. head and work images are `rayproject/ray:413fe0`. This image was built based on [commit](https://github.com/ray-project/ray/commit/413fe08f8744d50b439717564709bc0af2f778f1) from master branch.
-The reason we need to use a nightly version is because autoscaler needs to connect to Ray cluster. Due to ray [version requirements](https://docs.ray.io/en/latest/cluster/ray-client.html#versioning-requirements).
-We determine to use nightly version to make sure integration is working.
+2. The autoscaler image is `rayproject/ray:448f52` which reflects the latest changes from [Ray PR #24718](https://github.com/ray-project/ray/pull/24718/files) in the master branch.
-3. Autoscaler image is `kuberay/autoscaler:nightly` which is built from [commit](https://github.com/ray-project/ray/pull/22689/files).
+3. Autoscaling functionality is supported only with Ray versions at least as new as 1.11.0. The autoscaler image used
+is compatible with all Ray versions >= 1.11.0.

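The version constraint stated in the new item 3 can be captured as a small check. Here is a sketch in Python; `supports_autoscaling` and `MIN_AUTOSCALER_RAY_VERSION` are hypothetical names for illustration, not part of KubeRay:

```python
# Hypothetical helper illustrating the stated constraint: autoscaling
# requires Ray >= 1.11.0; nightly builds are treated as newest.
MIN_AUTOSCALER_RAY_VERSION = (1, 11, 0)

def supports_autoscaling(ray_version: str) -> bool:
    """Return True if this Ray version is new enough for KubeRay autoscaling."""
    if ray_version == "nightly":
        return True
    try:
        parts = tuple(int(p) for p in ray_version.split("."))[:3]
    except ValueError:
        # Commit-tagged images such as '448f52' cannot be compared here.
        return False
    parts += (0,) * (3 - len(parts))  # pad '1.11' -> (1, 11, 0)
    return parts >= MIN_AUTOSCALER_RAY_VERSION
```

For example, `rayVersion: '1.12.1'` in the sample manifest passes this check, while `1.10.0` does not.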
### Test autoscaling
2 changes: 1 addition & 1 deletion docs/notebook/kuberay-on-kind.ipynb
@@ -140,7 +140,7 @@
}
],
"source": [
-"!kubectl apply -k \"github.com/ray-project/kuberay/manifests/cluster-scope-resources\"\n",
+"!kubectl create -k \"github.com/ray-project/kuberay/manifests/cluster-scope-resources\"\n",
 "!kubectl apply -k \"github.com/ray-project/kuberay/manifests/base\""
]
},
2 changes: 1 addition & 1 deletion manifests/base/kustomization.yaml
@@ -11,7 +11,7 @@ resources:
images:
- name: kuberay/apiserver
newName: kuberay/apiserver
-    newTag: nightly
+  newTag: nightly
- name: kuberay/operator
newName: kuberay/operator
newTag: nightly
14 changes: 14 additions & 0 deletions manifests/overlays/autoscaling/kustomization.yaml
@@ -0,0 +1,14 @@
+# This overlay patches in KubeRay operator configuration
+# necessary for Ray Autoscaler support.
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+
+bases:
+- ../../base
+patches:
+- path: prioritize_workers_to_delete_patch.json
+  target:
+    group: apps
+    version: v1
+    kind: Deployment
+    name: kuberay-operator
5 changes: 5 additions & 0 deletions manifests/overlays/autoscaling/prioritize_workers_to_delete_patch.json
@@ -0,0 +1,5 @@
+[{
+"op":"replace",
+"path":"/spec/template/spec/containers/0/args",
+"value": ["--prioritize-workers-to-delete"]
+}]
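The new patch file is a JSON Patch document with a single `replace` operation. As a rough sketch of what that operation does (not KubeRay or kustomize code, and ignoring JSON Pointer escaping of `~` and `/`), here is how it rewrites the operator Deployment's container args:

```python
import json

def apply_replace_op(doc, path, value):
    """Apply a JSON Patch 'replace' operation: walk to the parent of
    `path` and overwrite the final key (or list index) with `value`."""
    keys = [k for k in path.split("/") if k]
    target = doc
    for key in keys[:-1]:
        target = target[int(key)] if isinstance(target, list) else target[key]
    last = keys[-1]
    if isinstance(target, list):
        target[int(last)] = value
    else:
        target[last] = value

# Minimal stand-in for the kuberay-operator Deployment manifest.
deployment = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "kuberay-operator", "args": []}
    ]}}}
}

patch = json.loads(
    '[{"op":"replace",'
    '"path":"/spec/template/spec/containers/0/args",'
    '"value": ["--prioritize-workers-to-delete"]}]'
)
for op in patch:
    if op["op"] == "replace":
        apply_replace_op(deployment, op["path"], op["value"])

args = deployment["spec"]["template"]["spec"]["containers"][0]["args"]
# args == ["--prioritize-workers-to-delete"]
```

A production tool would use a full RFC 6902 implementation rather than this simplified walker.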
13 changes: 7 additions & 6 deletions ray-operator/config/samples/ray-cluster.autoscaler.yaml
@@ -8,7 +8,7 @@ metadata:
# An unique identifier for the head node and workers of this cluster.
name: raycluster-autoscaler
spec:
-rayVersion: 'nightly'
+rayVersion: '1.12.1'
+# Ray autoscaler integration is supported only for Ray versions >= 1.11.0
enableInTreeAutoscaling: true
######################headGroupSpecs#################################
# head group template and specs, (perhaps 'group' is not needed in the name)
@@ -20,7 +21,7 @@ spec:
# logical group name, for this called head-group, also can be functional
# pod type head or worker
# rayNodeType: head # Not needed since it is under the headgroup
-# the following params are used to complete the ray start: ray start --head --block --redis-port=6379 ...
+# the following params are used to complete the ray start: ray start --head --block --port=6379 ...
rayStartParams:
# Flag "no-monitor" must be set when running the autoscaler in
# a sidecar container.
@@ -29,17 +30,17 @@ spec:
node-ip-address: $MY_POD_IP # auto-completed as the head pod IP
block: 'true'
num-cpus: '1' # can be auto-completed from the limits
-redis-password: 'LetMeInRay' # Deprecated since Ray 1.11 due to GCS bootstrapping enabled
# Use `resources` to optionally specify custom resource annotations for the Ray node.
# The value of `resources` is a string-integer mapping.
-# Currently, `resources` must be provided in the unfortunate format demonstrated below.
+# Currently, `resources` must be provided in the unfortunate format demonstrated below:
# resources: '"{\"Custom1\": 1, \"Custom2\": 5}"'
#pod template
template:
spec:
containers:
# The Ray head pod
- name: ray-head
-image: rayproject/ray:413fe0
+image: rayproject/ray:1.12.1
imagePullPolicy: Always
env:
- name: CPU_REQUEST
@@ -124,7 +125,7 @@ spec:
command: ['sh', '-c', "until nslookup $RAY_IP.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
containers:
- name: machine-learning # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc'
-image: rayproject/ray:413fe0
+image: rayproject/ray:1.12.1
# environment variables to set in the container.Optional.
# Refer to https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/
env: