[Feature][Docs] Explain how to specify container command for head pod (#912)

Users want to execute some commands at two timings: (1) Before `ray start` (2) After `ray start`
kevin85421 authored Feb 22, 2023
1 parent 4714892 commit 0564748
Showing 4 changed files with 264 additions and 0 deletions.
148 changes: 148 additions & 0 deletions docs/guidance/pod-command.md
@@ -0,0 +1,148 @@
# Specify container commands for Ray head/worker Pods
You can execute commands on the head/worker pods at two timings:

* (1) **Before `ray start`**: As an example, you can set up some environment variables that will be used by `ray start`.

* (2) **After `ray start` (RayCluster is ready)**: As an example, you can launch a Ray Serve deployment when the RayCluster is ready.

## Current KubeRay operator behavior for container commands
* The current behavior for container commands is not finalized, and **may be updated in the future**.
* See [code](https://github.com/ray-project/kuberay/blob/47148921c7d14813aea26a7974abda7cf22bbc52/ray-operator/controllers/ray/common/pod.go#L301-L326) for more details.

## Timing 1: Before `ray start`
Currently, for timing (1), we can set the container's `Command` and `Args` in the RayCluster specification to achieve this goal.

```yaml
# ray-operator/config/samples/ray-cluster.head-command.yaml
rayStartParams:
  ...
#pod template
template:
  spec:
    containers:
      - name: ray-head
        image: rayproject/ray:2.2.0
        resources:
          ...
        ports:
          ...
        # `command` and `args` will become a part of `spec.containers.0.args` in the head Pod.
        command: ["echo 123"]
        args: ["456"]
```
* Ray head Pod
  * `spec.containers.0.command` is hardcoded with `["/bin/bash", "-lc", "--"]`.
  * `spec.containers.0.args` contains two parts:
    * (Part 1) **user-specified command**: a string that concatenates `headGroupSpec.template.spec.containers.0.command` and `headGroupSpec.template.spec.containers.0.args` from the RayCluster.
    * (Part 2) **ray start command**: the command is created based on `rayStartParams` specified in the RayCluster. It will look like `ulimit -n 65536; ray start ...`.
  * To summarize, `spec.containers.0.args` will be `$(user-specified command) && $(ray start command)`.

* Example
```sh
# Prerequisite: There is a KubeRay operator in the Kubernetes cluster.
# Path: kuberay/
kubectl apply -f ray-operator/config/samples/ray-cluster.head-command.yaml
# Check ${RAYCLUSTER_HEAD_POD}
kubectl get pod -l ray.io/node-type=head
# Check `spec.containers.0.command` and `spec.containers.0.args`.
kubectl describe pod ${RAYCLUSTER_HEAD_POD}

# Command:
# /bin/bash
# -lc
# --
# Args:
# echo 123 456 && ulimit -n 65536; ray start --head --dashboard-host=0.0.0.0 --num-cpus=1 --block --metrics-export-port=8080 --memory=2147483648
```
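
As a concrete illustration of use case (1), an environment variable can be exported before `ray start` simply by placing the `export` in `command`. The snippet below is a minimal sketch and is not part of the sample file; `MY_ENV_VAR` is an illustrative name, and the resulting `args` shown in the comment is approximate.

```yaml
# Hypothetical variation of ray-cluster.head-command.yaml (illustration only).
command: ["export MY_ENV_VAR=123"]
args: []
# The head Pod's `spec.containers.0.args` would then look roughly like:
#   export MY_ENV_VAR=123 && ulimit -n 65536; ray start --head ...
```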


## Timing 2: After `ray start` (RayCluster is ready)
We have two solutions to execute commands after the RayCluster is ready. The main difference between them is that with Solution 1, users can check the logs via `kubectl logs`.

### Solution 1: Container command (Recommended)
As mentioned in the section "Timing 1: Before `ray start`", the user-specified command is executed before the `ray start` command. Hence, we can execute `ray_cluster_resources.sh` in the background by updating `headGroupSpec.template.spec.containers.0.command` in `ray-cluster.head-command.yaml`.

```yaml
# ray-operator/config/samples/ray-cluster.head-command.yaml
# Parentheses around the command are required.
command: ["(/home/ray/samples/ray_cluster_resources.sh&)"]

# ray_cluster_resources.sh
apiVersion: v1
kind: ConfigMap
metadata:
  name: ray-example
data:
  ray_cluster_resources.sh: |
    #!/bin/bash
    # wait for ray cluster to finish initialization
    while true; do
      ray health-check 2>/dev/null
      if [ "$?" = "0" ]; then
        break
      else
        echo "INFO: waiting for ray head to start"
        sleep 1
      fi
    done
    # Print the resources in the ray cluster after the cluster is ready.
    python -c "import ray; ray.init(); print(ray.cluster_resources())"
    echo "INFO: Print Ray cluster resources"
```

* Example
```sh
# Path: kuberay/
# (1) Update `command` to ["(/home/ray/samples/ray_cluster_resources.sh&)"]
# (2) Comment out `postStart` and `args`.
kubectl apply -f ray-operator/config/samples/ray-cluster.head-command.yaml

# Check ${RAYCLUSTER_HEAD_POD}
kubectl get pod -l ray.io/node-type=head

# Check the logs
kubectl logs ${RAYCLUSTER_HEAD_POD}

# INFO: waiting for ray head to start
# .
# . => Cluster initialization
# .
# 2023-02-16 18:44:43,724 INFO worker.py:1231 -- Using address 127.0.0.1:6379 set in the environment variable RAY_ADDRESS
# 2023-02-16 18:44:43,724 INFO worker.py:1352 -- Connecting to existing Ray cluster at address: 10.244.0.26:6379...
# 2023-02-16 18:44:43,735 INFO worker.py:1535 -- Connected to Ray cluster. View the dashboard at http://10.244.0.26:8265
# {'object_store_memory': 539679129.0, 'node:10.244.0.26': 1.0, 'CPU': 1.0, 'memory': 2147483648.0}
# INFO: Print Ray cluster resources
```
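
The same pattern also covers the Ray Serve use case mentioned at the top of this document. The ConfigMap below is a hedged sketch, not part of this commit: `launch_serve.sh` and `serve_app.py` are illustrative names, and `serve_app.py` is assumed to be mounted into the head Pod (for example, from another ConfigMap) in the same way as `ray_cluster_resources.sh`.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ray-serve-example
data:
  launch_serve.sh: |
    #!/bin/bash
    # Wait for the Ray cluster to finish initialization, as in ray_cluster_resources.sh.
    until ray health-check 2>/dev/null; do
      echo "INFO: waiting for ray head to start"
      sleep 1
    done
    # Launch the Serve application once the cluster is ready.
    python /home/ray/samples/serve_app.py
```

As with the script above, `launch_serve.sh` would be referenced from `command` with parentheses, e.g. `command: ["(/home/ray/samples/launch_serve.sh&)"]`, so that it runs in the background.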

### Solution 2: postStart hook
```yaml
# ray-operator/config/samples/ray-cluster.head-command.yaml
lifecycle:
  postStart:
    exec:
      command: ["/bin/sh","-c","/home/ray/samples/ray_cluster_resources.sh"]
```
* We execute the script `ray_cluster_resources.sh` via the postStart hook. Based on [this document](https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#container-hooks), there is no guarantee that the hook will execute before the container ENTRYPOINT. Hence, we need to wait for the RayCluster to finish initialization in `ray_cluster_resources.sh`.

* Example
```sh
# Path: kuberay/
kubectl apply -f ray-operator/config/samples/ray-cluster.head-command.yaml
# Check ${RAYCLUSTER_HEAD_POD}
kubectl get pod -l ray.io/node-type=head
# Forward the port of Dashboard
kubectl port-forward --address 0.0.0.0 ${RAYCLUSTER_HEAD_POD} 8265:8265
# Open the browser and check the Dashboard (${YOUR_IP}:8265/#/job).
# You should see a SUCCEEDED job with the following Entrypoint:
#
# `python -c "import ray; ray.init(); print(ray.cluster_resources())"`
```
18 changes: 18 additions & 0 deletions helm-chart/ray-cluster/templates/raycluster-cluster.yaml
@@ -52,6 +52,12 @@ spec:
            lifecycle:
            {{- toYaml .Values.head.lifecycle | nindent 14 }}
            {{- end }}
            {{- if .Values.head.command }}
            command: {{- toYaml .Values.head.command | nindent 14}}
            {{- end }}
            {{- if .Values.head.args }}
            args: {{- toYaml .Values.head.args | nindent 14}}
            {{- end }}
          {{- if .Values.head.sidecarContainers }}
          {{- toYaml .Values.head.sidecarContainers | nindent 10 }}
          {{- end }}
@@ -112,6 +118,12 @@ spec:
            lifecycle:
            {{- toYaml $values.lifecycle | nindent 14 }}
            {{- end }}
            {{- if $values.command }}
            command: {{- toYaml $values.command | nindent 14}}
            {{- end }}
            {{- if $values.args }}
            args: {{- toYaml $values.args | nindent 14}}
            {{- end }}
          {{- if $values.sidecarContainers }}
          {{- toYaml $values.sidecarContainers | nindent 10 }}
          {{- end }}
@@ -172,6 +184,12 @@ spec:
            lifecycle:
            {{- toYaml .Values.worker.lifecycle | nindent 14 }}
            {{- end }}
            {{- if .Values.worker.command }}
            command: {{- toYaml .Values.worker.command | nindent 14}}
            {{- end }}
            {{- if .Values.worker.args }}
            args: {{- toYaml .Values.worker.args | nindent 14}}
            {{- end }}
          {{- if .Values.worker.sidecarContainers }}
          {{- toYaml .Values.worker.sidecarContainers | nindent 10 }}
          {{- end }}
12 changes: 12 additions & 0 deletions helm-chart/ray-cluster/values.yaml
@@ -85,6 +85,10 @@ head:
  # sidecarContainers specifies additional containers to attach to the Ray pod.
  # Follows standard K8s container spec.
  sidecarContainers: []
  # See docs/guidance/pod-command.md for more details about how to specify
  # container command for head Pod.
  command: []
  args: []


worker:
@@ -141,6 +145,10 @@ worker:
  # sidecarContainers specifies additional containers to attach to the Ray pod.
  # Follows standard K8s container spec.
  sidecarContainers: []
  # See docs/guidance/pod-command.md for more details about how to specify
  # container command for worker Pod.
  command: []
  args: []

# The map's key is used as the groupName.
# For example, key:small-group in the map below
@@ -198,6 +206,10 @@ additionalWorkerGroups:
      - mountPath: /tmp/ray
        name: log-volume
    sidecarContainers: []
    # See docs/guidance/pod-command.md for more details about how to specify
    # container command for worker Pod.
    command: []
    args: []

# Configuration for Head's Kubernetes Service
service:
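With the `command` and `args` fields added above, a Helm values override can pass a container command through the chart. The snippet below is an illustration and is not part of this commit; `my-values.yaml` is an arbitrary file name that would be passed to `helm install` or `helm upgrade` via `-f`.

```yaml
# my-values.yaml (illustrative): set the head container's command and args
# through the chart. The template renders them into the head container, and
# the KubeRay operator then merges them into `spec.containers.0.args` as
# described in docs/guidance/pod-command.md.
head:
  command: ["echo 123"]
  args: ["456"]
```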
86 changes: 86 additions & 0 deletions ray-operator/config/samples/ray-cluster.head-command.yaml
@@ -0,0 +1,86 @@
# This example config is used to describe how the head command works.
# See kuberay/docs/guidance/pod-command.md for more details.
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  # A unique identifier for the head node and workers of this cluster.
  name: raycluster-mini
spec:
  rayVersion: '2.2.0' # should match the Ray version in the image of the containers
  # Ray head pod template
  headGroupSpec:
    serviceType: ClusterIP # optional
    # the following params are used to complete the ray start: ray start --head --block --redis-port=6379 ...
    rayStartParams:
      dashboard-host: '0.0.0.0'
      num-cpus: '1' # can be auto-completed from the limits
      block: 'true'
    #pod template
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.2.0
            resources:
              limits:
                cpu: 1
                memory: 2Gi
              requests:
                cpu: 500m
                memory: 2Gi
            ports:
              - containerPort: 6379
                name: gcs-server
              - containerPort: 8265 # Ray dashboard
                name: dashboard
              - containerPort: 10001
                name: client
            # `command` and `args` will become a part of `spec.containers.0.args` in the head Pod.
            command: ["echo 123"]
            args: ["456"]
            # The script ray_cluster_resources.sh will wait until the cluster is ready and print
            # the resources in the ray cluster. Users can execute the script in either (1) `command`
            # or (2) `postStart`.
            #
            # command: ["(/home/ray/samples/ray_cluster_resources.sh&)"]
            lifecycle:
              postStart:
                exec:
                  command: ["/bin/sh","-c","/home/ray/samples/ray_cluster_resources.sh"]
            volumeMounts:
              - mountPath: /home/ray/samples
                name: ray-example-configmap
        volumes:
          - name: ray-example-configmap
            configMap:
              name: ray-example
              defaultMode: 0777
              items:
                - key: ray_cluster_resources.sh
                  path: ray_cluster_resources.sh
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ray-example
data:
  ray_cluster_resources.sh: |
    #!/bin/bash
    # wait for ray cluster to finish initialization
    while true; do
      ray health-check 2>/dev/null
      if [ "$?" = "0" ]; then
        break
      else
        echo "INFO: waiting for ray head to start"
        sleep 1
      fi
    done
    # Print the resources in the ray cluster after the cluster is ready.
    python -c "import ray; ray.init(); print(ray.cluster_resources())"
    echo "INFO: Print Ray cluster resources"
