[doc] Copy-Edit RayService #607

Merged: 1 commit, Oct 6, 2022
141 changes: 69 additions & 72 deletions docs/guidance/rayservice.md
@@ -8,22 +8,24 @@

### What is a RayService?

The RayService is a new custom resource (CR) supported by KubeRay in v0.3.0.
RayService is a new custom resource (CR) supported by KubeRay in v0.3.0.

A RayService manages 2 things:
* RayCluster: Manages resources in kubernetes cluster.
* Ray Serve Deployment Graph: Manages users' serve deployment graph.
* Ray Cluster: Manages resources in a Kubernetes cluster.
* Ray Serve Deployment Graph: Manages users' deployment graphs.

### What does the RayService provide?

* Kubernetes-native support for Ray cluster and Ray Serve deployment graphs. You can use a kubernetes config to define a ray cluster and its ray serve deployment graphs. Then you can use `kubectl` to create the cluster and its graphs.
* In-place update for ray serve deployment graph. Users can update the ray serve deployment graph config in the RayService CR config and use `kubectl apply` to update the serve deployment graph.
* Zero downtime upgrade for ray cluster. Users can update the ray cluster config in the RayService CR config and use `kubectl apply` to update the ray cluster. RayService will temporarily create a pending ray cluster, wait for the pending ray cluster ready, and then switch traffics to the new ray cluster, terminate the old cluster.
* Services HA. RayService will monitor the ray cluster and serve deployments health status. If RayService detects any unhealthy status lasting for a certain time, RayService will try to create a new ray cluster, and switch traffic to the new cluster when it is ready.
* **Kubernetes-native support for Ray clusters and Ray Serve deployment graphs.** After using a Kubernetes config to define a Ray cluster and its Ray Serve deployment graphs, you can use `kubectl` to create the cluster and its graphs.
* **In-place update for Ray Serve deployment graph.** Users can update the Ray Serve deployment graph config in the RayService CR config and use `kubectl apply` to update the deployment graph.
* **Zero downtime upgrade for Ray clusters.** Users can update the Ray cluster config in the RayService CR config and use `kubectl apply` to update the cluster. RayService will temporarily create a pending cluster and wait for it to be ready, then switch traffic to the new cluster and terminate the old one.
* **Services HA.** RayService will monitor the Ray cluster and Serve deployments' health statuses. If RayService detects an unhealthy status for a period of time, RayService will try to create a new Ray cluster and switch traffic to the new cluster when it is ready.

### Deploy the Operator

`$ kubectl create -k "github.com/ray-project/kuberay/ray-operator/config/default?ref=v0.3.0&timeout=90s"`
```
$ kubectl create -k "github.com/ray-project/kuberay/ray-operator/config/default?ref=v0.3.0&timeout=90s"
```

Check that the controller is running.

@@ -37,9 +39,9 @@ NAME READY STATUS RESTARTS AGE
ray-operator-75dbbf8587-5lrvn 1/1 Running 0 31s
```

### Run an example cluster
### Run an Example Cluster

There is one example config file to deploy RaySerive included here:
An example config file to deploy RayService is included here:
[ray_v1alpha1_rayservice.yaml](https://github.com/ray-project/kuberay/blob/master/ray-operator/config/samples/ray_v1alpha1_rayservice.yaml)

```shell
@@ -74,7 +76,7 @@ rayservice-sample-raycluster-qd2vl-head-svc ClusterIP 10.100.180.221
rayservice-sample-serve-svc ClusterIP 10.100.39.92 <none> 8000/TCP 24m
```

> Note: Default ports and their definition.
> Note: Default ports and their definitions.
| Port | Definition |
|-------|---------------------|
@@ -84,98 +86,93 @@ rayservice-sample-serve-svc ClusterIP 10.100.39.92
| 8000 | Ray Serve |
| 52365 | Ray Dashboard Agent |

Get the RayService information with your RayService name.
Get information about the RayService using its name.
```shell
$ kubectl describe rayservices rayservice-sample
$ kubectl describe rayservices rayservice-sample
```

### Access User Services

The users' traffic can go through the `serve` service (for example, `rayservice-sample-serve-svc`).
The users' traffic can go through the `serve` service (e.g. `rayservice-sample-serve-svc`).

#### Run a curl pod
`kubectl run curl --image=radial/busyboxplus:curl -i --tty`
Or if you already have a curl pod running, you can login with `kubectl exec -it curl sh`.
#### Run a Curl Pod

For the fruit example deployment, you can try the following request
```shell
[ root@curl:/ ]$ curl -X POST -H 'Content-Type: application/json' rayservice-sample-serve-svc.default.svc.cluster.local:8000 -d '["MANGO", 2]'
6
$ kubectl run curl --image=radial/busyboxplus:curl -i --tty
```
You can get the response as `6`.

Or if you already have a curl pod running, you can login using `kubectl exec -it curl sh`.

For the fruit example deployment, you can try the following request:
```shell
[ root@curl:/ ]$ curl -X POST -H 'Content-Type: application/json' rayservice-sample-serve-svc.default.svc.cluster.local:8000 -d '["MANGO", 2]'
> 6
```
You should get the response `6`.
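
The address used in the request above follows the usual Kubernetes service DNS pattern `<service>.<namespace>.svc.cluster.local:<port>`. A minimal sketch of how that address is put together, assuming the default `cluster.local` cluster domain; `serve_url` is a hypothetical helper for illustration, not part of KubeRay:

```shell
# Hypothetical helper (not part of KubeRay): build the in-cluster address of a service.
# Assumes the default cluster domain "cluster.local".
serve_url() {
  service=$1
  namespace=$2
  port=$3
  printf '%s.%s.svc.cluster.local:%s\n' "$service" "$namespace" "$port"
}

# Address of the sample serve service in the "default" namespace.
serve_url rayservice-sample-serve-svc default 8000
```

If the RayService lives in another namespace, only the second argument should need to change.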

#### Use Port Forwarding
Set up kubernetes port forwarding.
Set up Kubernetes port forwarding.
```shell
$ kubectl port-forward service/rayservice-sample-serve-svc 8000
```
For the fruit example deployment, you can try the following request
For the fruit example deployment, you can try the following request:
```shell
curl -X POST -H 'Content-Type: application/json' localhost:8000 -d '["MANGO", 2]'
6
[ root@curl:/ ]$ curl -X POST -H 'Content-Type: application/json' localhost:8000 -d '["MANGO", 2]'
> 6
```

`serve-svc` is HA in general.
* Note: serve-svc will do traffic routing among all the workers which have serve deployments.
* Note: serve-svc will always try it best to point to the healthy cluster, even during upgrading or failing cases.
* Note: You can set `serviceUnhealthySecondThreshold` to define the threshold of seconds that the serve deployments fail.
* Note: You can set `deploymentUnhealthySecondThreshold` to define the threshold of seconds that the Ray fails to deploy any serve deployments.
> Note:
> `serve-svc` is HA in general. It will do traffic routing among all the workers which have serve deployments and will always try to point to the healthy cluster, even during upgrading or failing cases.
> You can set `serviceUnhealthySecondThreshold` to define the threshold of seconds that the serve deployments fail. You can also set `deploymentUnhealthySecondThreshold` to define the threshold of seconds that Ray fails to deploy any serve deployments.
### Access Ray Dashboard
Set up kubernetes port forwarding for the dashboard.
Set up Kubernetes port forwarding for the dashboard.
```shell
$ kubectl port-forward service/rayservice-sample-head-svc 8265
```
Then you can open your web browser with the url localhost:8265 to see your Ray dashboard page.
Access the dashboard using a web browser at `localhost:8265`.

### Update Ray Serve Deployment Graph

You can update the `serveConfig` in your RayService config file.
For example, if you update the mango price to 4 in [ray_v1alpha1_rayservice.yaml](https://github.com/ray-project/kuberay/blob/master/ray-operator/config/samples/ray_v1alpha1_rayservice.yaml).
For example, update the price of mangos to `4` in [ray_v1alpha1_rayservice.yaml](https://github.com/ray-project/kuberay/blob/master/ray-operator/config/samples/ray_v1alpha1_rayservice.yaml):
```shell
- name: MangoStand
numReplicas: 1
userConfig: |
price: 4
- name: MangoStand
numReplicas: 1
userConfig: |
price: 4
```

Do a `kubectl apply` to update your RayService.

You can check the kubernetes stats of your RayService. It should show similar:
Use `kubectl apply` to update your RayService and `kubectl describe rayservices rayservice-sample` to take a look at the RayService's information. It should look similar to:
```shell
serveDeploymentStatuses:
- healthLastUpdateTime: "2022-07-18T21:51:37Z"
lastUpdateTime: "2022-07-18T21:51:41Z"
name: MangoStand
status: UPDATING
serveDeploymentStatuses:
- healthLastUpdateTime: "2022-07-18T21:51:37Z"
lastUpdateTime: "2022-07-18T21:51:41Z"
name: MangoStand
status: UPDATING
```
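
To see at a glance whether a deployment is still `UPDATING`, the name and status pairs can be pulled out of that status text. A rough sketch, assuming the `name:` / `status:` layout shown above with `name:` appearing before `status:` in each entry; `serve_statuses` is a hypothetical helper that reads the text on stdin:

```shell
# Hypothetical helper: read status text on stdin and print "<name> <status>"
# for every name:/status: pair. Assumes "name:" comes before "status:"
# within each entry, as in the snippet above.
serve_statuses() {
  awk '
    { sub(/^[ \t]*- /, "") }                     # drop a leading list marker, if any
    $1 == "name:" { name = $2 }
    $1 == "status:" && name != "" { print name, $2; name = "" }
  '
}

# Example with the snippet from above.
serve_statuses <<'EOF'
serveDeploymentStatuses:
- healthLastUpdateTime: "2022-07-18T21:51:37Z"
  lastUpdateTime: "2022-07-18T21:51:41Z"
  name: MangoStand
  status: UPDATING
EOF
```

Other `status:` keys that follow an unrelated `name:` key would also be printed, so treat this as a quick filter rather than a real parser.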

After it finishes deployment, let's send a request again.
After it finishes deployment, let's send a request again. In the curl pod from earlier, run:
```shell
# In the curl pod.
[ root@curl:/ ]$ curl -X POST -H 'Content-Type: application/json' rayservice-sample-serve-svc.default.svc.cluster.local:8000 -d '["MANGO", 2]'
8
[ root@curl:/ ]$ curl -X POST -H 'Content-Type: application/json' rayservice-sample-serve-svc.default.svc.cluster.local:8000 -d '["MANGO", 2]'
> 8
```
Or
Or if using port forwarding:
```shell
# Using port forwarding.
curl -X POST -H 'Content-Type: application/json' localhost:8000 -d '["MANGO", 2]'
8
curl -X POST -H 'Content-Type: application/json' localhost:8000 -d '["MANGO", 2]'
> 8
```
Now you will get `8` as a result.
You should now get `8` as a result.

### Upgrade RayService RayCluster Config
You can update the `rayClusterConfig` in your RayService config file.
For example, you can increase the worker node num to 2.
For example, you can increase the number of workers to 2:
```shell
workerGroupSpecs:
# the pod replicas in this group typed worker
- replicas: 2
```

Do a `kubectl apply` to update your RayService.

You can check the kubernetes stats of your RayService. It should show similar:
Use `kubectl apply` to update your RayService and `kubectl describe rayservices rayservice-sample` to take a look at the RayService's information. It should look similar to:
```shell
pendingServiceStatus:
appStatus: {}
@@ -185,20 +182,16 @@ You can check the kubernetes stats of your RayService. It should show similar:
rayClusterName: rayservice-sample-raycluster-bshfr
rayClusterStatus: {}
```
You can see RayService is preparing a pending cluster. After the pending cluster is healthy, RayService will switch it as active cluster and terminate the previous cluster.
You can see the RayService is preparing a pending cluster. Once the pending cluster is healthy, the RayService will make it the active cluster and terminate the previous one.

### RayService Observability
You can use `kubectl logs` to check the operator logs or the head/worker nodes logs.
You can also use `kubectl describe rayservices rayservice-sample` to check the states and event logs of your RayService instance.

For ray serve monitoring, you can refer to the [Ray observability documentation](https://docs.ray.io/en/master/ray-observability/state/state-api.html).
To run Ray state APIs, you can log in to the head pod and use the Ray CLI.
`kubectl exec -it <head-node-pod> bash`
Or you can run the command locally:
`kubectl exec -it <head-node-pod> -- <ray state api>`
For example:
`kubectl exec -it <head-node-pod> -- ray summary tasks`
Output
For Ray Serve monitoring, you can refer to the [Ray observability documentation](https://docs.ray.io/en/master/ray-observability/state/state-api.html).
To run Ray state APIs, log in to the head pod by running `kubectl exec -it <head-node-pod> bash` and use the Ray CLI or you can run commands locally using `kubectl exec -it <head-node-pod> -- <ray state api>`.

For example, `kubectl exec -it <head-node-pod> -- ray summary tasks` outputs the following:
```shell
======== Tasks Summary: 2022-07-28 15:10:24.801670 ========
Stats:
@@ -221,9 +214,13 @@ Table (group by func_name):
7 ServeController.__init__ FINISHED: 1 ACTOR_CREATION_TASK
```
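
If you run several state commands in a row, a small wrapper can save retyping. This is only a sketch: `ray_on_head` is a hypothetical shell function, not a KubeRay or Ray command, and it simply forwards its arguments to the `kubectl exec -it <head-node-pod> -- ray ...` form shown above.

```shell
# Hypothetical wrapper: run a Ray CLI command inside the head pod.
# Usage: ray_on_head <head-node-pod> <ray subcommand and args...>
ray_on_head() {
  pod=$1
  shift
  kubectl exec -it "$pod" -- ray "$@"
}

# Example (requires a running cluster and a real head pod name):
#   ray_on_head <head-node-pod> summary tasks
```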

### Delete the RayService instance
`$ kubectl delete -f config/samples/ray_v1alpha1_rayservice.yaml`
### Delete the RayService Instance
```
$ kubectl delete -f config/samples/ray_v1alpha1_rayservice.yaml
```

### Delete the operator
### Delete the Operator

`$ kubectl delete -k "github.com/ray-project/kuberay/ray-operator/config/default"`
```
$ kubectl delete -k "github.com/ray-project/kuberay/ray-operator/config/default"
```