
[Doc] Improve RayService doc #1235

Merged 6 commits on Jul 12, 2023
249 changes: 135 additions & 114 deletions docs/guidance/rayservice.md
# Ray Services (alpha)

> Note: This is the alpha version of Ray Services. There will be ongoing improvements for Ray Services in future releases.

# Prerequisites

This guide focuses solely on the Ray Serve multi-application API, which is available starting from Ray version 2.4.0.

* Ray 2.4.0 or newer.
* KubeRay 0.6.0 or newer.

# What is a RayService?

A RayService manages two things:

* **RayCluster**: Manages resources in a Kubernetes cluster.
* **Ray Serve Applications**: Manages users' applications.

# What does the RayService provide?

* **Kubernetes-native support for Ray clusters and Ray Serve applications:** After using a Kubernetes config to define a Ray cluster and its Ray Serve applications, you can use `kubectl` to create the cluster and its applications.
* **In-place update for Ray Serve applications:** Users can update the Ray Serve config in the RayService CR config and use `kubectl apply` to update the applications.
* **Zero downtime upgrade for Ray clusters:** Users can update the Ray cluster config in the RayService CR config and use `kubectl apply` to update the cluster. RayService will temporarily create a pending cluster and wait for it to be ready, then switch traffic to the new cluster and terminate the old one.
* **Service high availability (HA):** RayService will monitor the health statuses of the Ray cluster and the Serve deployments. If it detects an unhealthy status that persists for a period of time, RayService will create a new Ray cluster and switch traffic to the new cluster once it is ready.

# Example: Serve two simple Ray Serve applications using RayService

## Step 1: Create a Kubernetes cluster with Kind

```sh
kind create cluster --image=kindest/node:v1.23.0
```

## Step 2: Install KubeRay operator

Follow [this document](https://github.com/ray-project/kuberay/blob/master/helm-chart/kuberay-operator/README.md) to install the nightly KubeRay operator via Helm.
Note that the sample RayService in this guide uses `serveConfigV2` to specify a multi-application Serve config.
This is first supported in KubeRay 0.6.0 and is currently supported only on the nightly KubeRay operator.

## Step 3: Install a RayService

To start a RayService and deploy these two applications, run the following command:
```sh
# path: ray-operator/config/samples/
kubectl apply -f ray_v1alpha1_rayservice.yaml
```

* Let's first take a look at the Ray Serve config (i.e. `serveConfigV2`) embedded in the RayService yaml. At a high level, there are two applications: a fruit stand app and a calculator app. Some details about the fruit stand application:
* The fruit stand application is contained in the `deployment_graph` variable in `fruit.py` in the [test_dag](https://github.com/ray-project/test_dag/tree/41d09119cbdf8450599f993f51318e9e27c59098) repo, so `import_path` in the config points to this variable to tell Serve from where to import the application.
* It is hosted at the route prefix `/fruit`, meaning HTTP requests with routes that start with the prefix `/fruit` will be sent to the fruit stand application.
* The working directory points to the [test_dag](https://github.com/ray-project/test_dag/tree/41d09119cbdf8450599f993f51318e9e27c59098) repo, which will be downloaded at runtime, and your application will be started in this directory. See the [Runtime Environment Documentation](https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#runtime-environments) for more details.
* For more details on configuring Ray Serve deployments, see the [Ray Serve Documentation](https://docs.ray.io/en/master/serve/configure-serve-deployment.html).
* Similarly, the calculator app is imported from the `conditional_dag.py` file in the same repo, and it's hosted at the route prefix `/calc`.
```yaml
serveConfigV2: |
applications:
- name: fruit_app
import_path: fruit.deployment_graph
route_prefix: /fruit
runtime_env:
working_dir: "https://github.com/ray-project/test_dag/archive/41d09119cbdf8450599f993f51318e9e27c59098.zip"
deployments: ...
- name: math_app
import_path: conditional_dag.serve_dag
route_prefix: /calc
runtime_env:
working_dir: "https://github.com/ray-project/test_dag/archive/41d09119cbdf8450599f993f51318e9e27c59098.zip"
deployments: ...
```
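The route-prefix dispatch described above can be sketched in a few lines of plain Python. This is only an illustration of the concept, not Ray Serve's actual HTTP router:

```python
# Illustrative sketch of route-prefix dispatch: an HTTP path that starts with
# an application's route_prefix is routed to that application.
ROUTES = {"/fruit": "fruit_app", "/calc": "math_app"}

def route(path: str):
    """Return the application name serving the given HTTP path, if any."""
    for prefix, app in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return app
    return None

print(route("/fruit/"))   # fruit_app
print(route("/calc/"))    # math_app
print(route("/unknown"))  # None
```

This is why the curl requests in Step 6 target `:8000/fruit/` and `:8000/calc/`: the shared Serve port is 8000, and the path prefix selects the application.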

## Step 4: Verify the Kubernetes cluster status

```sh
# Step 4.1: List all RayService custom resources in the `default` namespace.
kubectl get rayservice

# [Example output]
# NAME AGE
# rayservice-sample 2m42s

# Step 4.2: List all RayCluster custom resources in the `default` namespace.
kubectl get raycluster

# [Example output]
# NAME DESIRED WORKERS AVAILABLE WORKERS STATUS AGE
# rayservice-sample-raycluster-6mj28 1 1 ready 2m27s

# Step 4.3: List all Ray Pods in the `default` namespace.
kubectl get pods -l=ray.io/is-ray-node=yes

# [Example output]
# rayservice-sample-raycluster-6mj28-worker-small-group-kg4v5   1/1     Running   0          3m52s
# rayservice-sample-raycluster-6mj28-head-x77h4 1/1 Running 0 3m52s

# Step 4.4: List services in the `default` namespace.
kubectl get services

# [Example output]
# NAME                                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                         AGE
# ...
# rayservice-sample-head-svc ClusterIP 10.96.34.90 <none> 10001/TCP,8265/TCP,52365/TCP,6379/TCP,8080/TCP,8000/TCP 4m58s
# rayservice-sample-raycluster-6mj28-head-svc ClusterIP 10.96.171.184 <none> 10001/TCP,8265/TCP,52365/TCP,6379/TCP,8080/TCP,8000/TCP 6m21s
# rayservice-sample-serve-svc ClusterIP 10.96.161.84 <none> 8000/TCP 4m58s
```

For a RayService custom resource, KubeRay creates a RayCluster based on `spec.rayClusterConfig` defined in the RayService YAML.
Next, after the head Pod is running and ready, KubeRay will submit a request to the head's dashboard agent port (default: 52365) to create the Ray Serve applications defined in `spec.serveConfigV2`.

After the Ray Serve applications are healthy and ready, KubeRay will create a head service and a serve service for the RayService custom resource (e.g., `rayservice-sample-head-svc` and `rayservice-sample-serve-svc` in Step 4.4).
Users can access the head Pod through both the head service managed by RayService (i.e. `rayservice-sample-head-svc`) and the head service managed by RayCluster (i.e. `rayservice-sample-raycluster-6mj28-head-svc`).
During a zero downtime upgrade, however, a new RayCluster is created, along with a new head service for that cluster.
If `rayservice-sample-head-svc` is not used, users need to update their ingress configuration to point to the new head service.
If `rayservice-sample-head-svc` is used, KubeRay automatically updates the selector to point to the new head Pod, eliminating the need to update the ingress configuration.
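The selector switch can be pictured with a small sketch: the stable service keeps its name while its label selector changes. The label keys and pod names below are illustrative stand-ins, not necessarily the exact labels KubeRay sets:

```python
# Sketch of how a stable Service re-points to the new head Pod during a
# zero-downtime upgrade. Label keys/values here are assumptions for illustration.
def select_pods(pods, selector):
    """Return names of pods whose labels satisfy every selector entry."""
    return [p["name"] for p in pods
            if all(p["labels"].get(k) == v for k, v in selector.items())]

pods = [
    {"name": "old-head", "labels": {"cluster": "raycluster-old", "node-type": "head"}},
    {"name": "new-head", "labels": {"cluster": "raycluster-new", "node-type": "head"}},
]

# Before the upgrade, the stable head service selects the old cluster's head...
print(select_pods(pods, {"cluster": "raycluster-old", "node-type": "head"}))  # ['old-head']
# ...after KubeRay updates the selector, the same service name reaches the new head.
print(select_pods(pods, {"cluster": "raycluster-new", "node-type": "head"}))  # ['new-head']
```

Because only the selector changes, any ingress that targets the stable service name keeps working across the upgrade.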


> Note: Default ports and their definitions.

| Port  | Definition          |
|-------|---------------------|
| 6379  | Ray GCS             |
| 8265  | Ray Dashboard       |
| 10001 | Ray Client          |
| 8000  | Ray Serve           |
| 52365 | Ray Dashboard Agent |

## Step 5: Verify the status of the Serve applications

```sh
# Step 5.1: Check the status of the RayService.
kubectl describe rayservices rayservice-sample

# Active Service Status:
# Application Statuses:
# fruit_app:
# Health Last Update Time: 2023-07-11T22:21:24Z
# Last Update Time: 2023-07-11T22:21:24Z
# Serve Deployment Statuses:
# fruit_app_DAGDriver:
# Health Last Update Time: 2023-07-11T22:21:24Z
# Last Update Time: 2023-07-11T22:21:24Z
# Status: HEALTHY
# fruit_app_FruitMarket:
# ...
# Status: RUNNING
# math_app:
# Health Last Update Time: 2023-07-11T22:21:24Z
# Last Update Time: 2023-07-11T22:21:24Z
# Serve Deployment Statuses:
# math_app_Adder:
# Health Last Update Time: 2023-07-11T22:21:24Z
# Last Update Time: 2023-07-11T22:21:24Z
# Status: HEALTHY
# math_app_DAGDriver:
# ...
# Status: RUNNING

# Step 5.2: Check the Serve applications in the Ray dashboard.
# (1) Forward the dashboard port to localhost.
# (2) Check the Serve page in the Ray dashboard at http://localhost:8265/#/serve.
kubectl port-forward svc/rayservice-sample-head-svc --address 0.0.0.0 8265:8265
```

* Refer to [rayservice-troubleshooting.md](https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayservice-troubleshooting.md#observability) for more details on RayService observability.
Below is a screenshot example of the Serve page in the Ray dashboard.
![Ray Serve Dashboard](../images/dashboard_serve.png)

## Step 6: Send requests to the Serve applications via the Kubernetes serve service

```sh
# Step 6.1: Run a curl Pod.
kubectl run curl --image=radial/busyboxplus:curl -i --tty

# Step 6.2: Send a request to the fruit stand app.
curl -X POST -H 'Content-Type: application/json' rayservice-sample-serve-svc:8000/fruit/ -d '["MANGO", 2]'
# [Expected output]: 6

# Step 6.3: Send a request to the calculator app.
curl -X POST -H 'Content-Type: application/json' rayservice-sample-serve-svc:8000/calc/ -d '["MUL", 3]'
# [Expected output]: "15 pizzas please!"
```
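The expected output `6` follows from the prices defined in `fruit.py` in the test_dag repo. A minimal re-implementation of that pricing step is sketched below; the price table is an assumption inferred from the example output, not copied from the repo:

```python
import json

# Hypothetical price table: with MANGO priced at 3, the body '["MANGO", 2]'
# yields 2 * 3 = 6, matching the expected output of the curl command above.
# ORANGE and APPLE prices are assumptions for illustration.
PRICES = {"MANGO": 3, "ORANGE": 2, "APPLE": 1}

def fruit_stand(body: str) -> int:
    """Mimic the fruit stand app: parse a [fruit, amount] JSON body, return the total."""
    fruit, amount = json.loads(body)
    return PRICES[fruit] * amount

print(fruit_stand('["MANGO", 2]'))  # 6
```

The curl payload is therefore just a JSON array of `[item, amount]` (or `[operation, operand]` for the calculator app), posted to the route prefix of the target application.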

If you already have a curl pod running, you can log in using `kubectl exec -it curl -- sh`.
* `rayservice-sample-serve-svc` is HA in general. It routes traffic among all the workers that have Serve deployments, and it always tries to point to the healthy cluster, even during upgrades or failures.



# Upgrade RayService Using In-Place Update

You can update the configurations for the applications by modifying `serveConfigV2` in the RayService config file. Re-applying the modified config with `kubectl apply` applies the new configurations to the existing cluster.