[autoscaler] Improve autoscaler auto-configuration, upstream recent improvements to Kuberay NodeProvider #274
Changes from 37 commits
@@ -10,10 +10,14 @@ You can follow below steps for a quick deployment.

```
git clone https://github.com/ray-project/kuberay.git
cd kuberay
kubectl apply -k manifests/cluster-scope-resources
kubectl apply -k manifests/base
kubectl create -k manifests/cluster-scope-resources
kubectl apply -k manifests/overlays/autoscaling
```

> Note: For compatibility with the Ray autoscaler, the KubeRay Operator's entrypoint
> must include the flag `--prioritize-workers-to-delete`. The kustomization overlay
> `manifests/overlays/autoscaling` provided in the last command above adds the necessary flag.

Review comment: We should make sure to have a plan to remove this flag and make it the default behavior.
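As a sanity check (editor's sketch, not part of the PR), one way to confirm the flag actually reached the operator is to inspect the Deployment's container args. This assumes the operator Deployment is named `kuberay-operator` and runs in the `ray-system` namespace created by the cluster-scope resources:

```
# Print the operator container's args; expect --prioritize-workers-to-delete to appear.
kubectl -n ray-system get deployment kuberay-operator \
  -o jsonpath='{.spec.template.spec.containers[0].args}'
```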
### Deploy a cluster with autoscaling enabled

```

@@ -60,20 +64,21 @@ Demands:

#### Known issues and limitations

1. operator will recognize following setting and automatically inject preconfigured autoscaler container to head pod.
The service account, role, role binding needed by autoscaler will be created by operator out-of-box.
1. The operator will recognize the following setting and automatically inject a preconfigured autoscaler container to the head pod.
The service account, role, and role binding needed by the autoscaler will be created by the operator out-of-box.
The operator will also configure an empty-dir logging volume for the Ray head pod. The volume will be mounted into the Ray and
autoscaler containers; this is necessary to support the event logging introduced in [Ray PR #13434](https://github.com/ray-project/ray/pull/13434).

```
spec:
  rayVersion: 'nightly'
  enableInTreeAutoscaling: true
```

2. head and work images are `rayproject/ray:413fe0`. This image was built based on [commit](https://github.com/ray-project/ray/commit/413fe08f8744d50b439717564709bc0af2f778f1) from master branch.
The reason we need to use a nightly version is because autoscaler needs to connect to Ray cluster. Due to ray [version requirements](https://docs.ray.io/en/latest/cluster/ray-client.html#versioning-requirements).
We determine to use nightly version to make sure integration is working.
2. The autoscaler image is `rayproject/ray:448f52` which reflects the latest changes from [Ray PR #24718](https://github.com/ray-project/ray/pull/24718/files) in the master branch.

3. Autoscaler image is `kuberay/autoscaler:nightly` which is built from [commit](https://github.com/ray-project/ray/pull/22689/files).
3. Autoscaling functionality is supported only with Ray versions at least as new as 1.11.0. The autoscaler image used
is compatible with all Ray versions >= 1.11.0.
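A hedged way to confirm the injection described in item 1 on a running cluster (editor's sketch, not part of the PR; it assumes the operator labels head pods with `ray.io/node-type=head` and that the cluster lives in the current namespace):

```
# List container names in the head pod; with enableInTreeAutoscaling: true,
# expect to see the autoscaler sidecar alongside ray-head.
kubectl get pods -l ray.io/node-type=head \
  -o jsonpath='{.items[0].spec.containers[*].name}'
```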
### Test autoscaling
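One hedged way to exercise autoscaling (editor's illustration, not content from the PR; it assumes the head pod runs a Ray image with Python available and that the cluster has room to add workers):

```
# From inside the head pod, ask the autoscaler for more CPUs than currently exist.
# Replace <head-pod> with the real head pod name from `kubectl get pods`.
kubectl exec -it <head-pod> -- python -c \
  'import ray; from ray.autoscaler.sdk import request_resources; ray.init(address="auto"); request_resources(num_cpus=4)'

# Watch worker pods come up as the cluster scales.
kubectl get pods --watch
```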
@@ -0,0 +1,14 @@
# This overlay patches in KubeRay operator configuration
# necessary for Ray Autoscaler support.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

bases:
  - ../../base
patches:
  - path: prioritize_workers_to_delete_patch.json
    target:
      group: apps
      version: v1
      kind: Deployment
      name: kuberay-operator
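To inspect what the overlay renders before applying it, one hedged option (editor's sketch; assumes a kubectl version with built-in kustomize support):

```
# Render the overlay locally and check the operator's args without touching the cluster.
kubectl kustomize manifests/overlays/autoscaling | grep -A 3 'args:'
```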
@@ -0,0 +1,5 @@
[{
  "op": "replace",
  "path": "/spec/template/spec/containers/0/args",
  "value": ["--prioritize-workers-to-delete"]
}]
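This is a JSON (RFC 6902) patch: the `replace` op overwrites the first container's `args` rather than appending to them. As a hedged illustration (editor's sketch, not part of the PR), the patched operator Deployment would end up with roughly the following fragment; the container name is assumed, since the patch only addresses index 0:

```
spec:
  template:
    spec:
      containers:
        - name: kuberay-operator   # assumed name; the patch targets containers/0
          args:
            - --prioritize-workers-to-delete
```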
@@ -8,7 +8,8 @@ metadata:
# An unique identifier for the head node and workers of this cluster.
name: raycluster-autoscaler
spec:
rayVersion: 'nightly'
rayVersion: '1.12.1'

Review comment: The latest Ray release is compatible with the pinned autoscaler image.

# Ray autoscaler integration is supported only for Ray versions >= 1.11.0
enableInTreeAutoscaling: true
######################headGroupSpecs#################################
# head group template and specs, (perhaps 'group' is not needed in the name)
@@ -20,7 +21,7 @@ spec:
# logical group name, for this called head-group, also can be functional
# pod type head or worker
# rayNodeType: head # Not needed since it is under the headgroup
# the following params are used to complete the ray start: ray start --head --block --redis-port=6379 ...
# the following params are used to complete the ray start: ray start --head --block --port=6379 ...
rayStartParams:
# Flag "no-monitor" must be set when running the autoscaler in
# a sidecar container.
@@ -29,17 +30,17 @@ spec:
node-ip-address: $MY_POD_IP # auto-completed as the head pod IP
block: 'true'
num-cpus: '1' # can be auto-completed from the limits
redis-password: 'LetMeInRay' # Deprecated since Ray 1.11 due to GCS bootstrapping enabled
# Use `resources` to optionally specify custom resource annotations for the Ray node.
# The value of `resources` is a string-integer mapping.
# Currently, `resources` must be provided in the unfortunate format demonstrated below.
# Currently, `resources` must be provided in the unfortunate format demonstrated below:
# resources: '"{\"Custom1\": 1, \"Custom2\": 5}"'
#pod template
template:
spec:
containers:
# The Ray head pod
- name: ray-head
image: rayproject/ray:413fe0
image: rayproject/ray:1.12.1
imagePullPolicy: Always
env:
- name: CPU_REQUEST
@@ -124,7 +125,7 @@ spec:
command: ['sh', '-c', "until nslookup $RAY_IP.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
containers:
- name: machine-learning # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc'
image: rayproject/ray:413fe0
image: rayproject/ray:1.12.1
# environment variables to set in the container.Optional.
# Refer to https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/
env:
Review comment: We can use `create` now. The downside of `create` here is cluster-scope-resources has `ray-system` namespace as well. If user has the namespace in the cluster, upgrade will fail due to existence. We can definitely move namespace yaml to separate steps but let me check if we can resolve the issue by limiting the crd size, which is much more elegant. This looks good to me at this moment.
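A hedged illustration of the trade-off described above (editor's sketch, not part of the PR): `kubectl create` is not idempotent, so re-running it against cluster-scope resources that already exist, such as the `ray-system` namespace, fails with AlreadyExists errors, whereas `kubectl apply` is idempotent but stores a last-applied-configuration annotation that can exceed Kubernetes' size limit for a very large CRD, which is the CRD-size issue mentioned in the comment.

```
# First run: creates the namespace, CRDs, RBAC, etc.
kubectl create -k manifests/cluster-scope-resources

# Second run (or a cluster that already has the ray-system namespace):
# fails with "AlreadyExists" errors instead of updating in place.
kubectl create -k manifests/cluster-scope-resources

# Idempotent alternative, at the cost of the last-applied-configuration
# annotation that can blow past the annotation size limit for large CRDs.
kubectl apply -k manifests/cluster-scope-resources
```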