[SkyServe] Update Documentation #3022

Merged
merged 11 commits into from
Jan 27, 2024
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -122,6 +122,7 @@ Documentation
serving/sky-serve
serving/service-yaml-spec
serving/autoscaling
serving/update

.. toctree::
:maxdepth: 1
132 changes: 132 additions & 0 deletions docs/source/serving/update.rst
@@ -0,0 +1,132 @@
.. _serve-update:

Update Your Service
===================

SkyServe supports updating a running service. Use ``sky serve update`` to update an existing service:

.. code-block:: console

    $ sky serve update service-name new_service.yaml

SkyServe launches new replicas as described by ``new_service.yaml``. Once the number of new replicas reaches the minimum number of replicas (``min_replicas``) required for the service, SkyServe scales down the old replicas to save cost. The service remains accessible to users throughout the process.

You can update ``replica_policy`` parameters, such as ``target_qps_per_replica``, as well as ``resources`` parameters, such as ``cpus`` and ``memory``, so that new replicas can be launched on VMs of a different type.
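
The scale-down rule described above can be modeled in a few lines. This is only an illustrative sketch, not SkyServe's implementation: the ``Replica`` class and the ``old_replicas_to_scale_down`` function are hypothetical names, and the sketch assumes that only *ready* new replicas count toward ``min_replicas``.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical, simplified view of one replica."""
    replica_id: int
    version: int
    ready: bool


def old_replicas_to_scale_down(replicas, latest_version, min_replicas):
    """Return the IDs of old-version replicas that may be scaled down.

    Old replicas are kept until at least `min_replicas` replicas of the
    latest version are ready (an assumption of this sketch).
    """
    ready_new = [r for r in replicas
                 if r.version == latest_version and r.ready]
    if len(ready_new) < min_replicas:
        return []  # Not enough new replicas yet: keep every old replica.
    return [r.replica_id for r in replicas if r.version < latest_version]


# Two old replicas (version 1) and two new ones (version 2), one still starting.
fleet = [
    Replica(1, 1, True),
    Replica(2, 1, True),
    Replica(3, 2, True),
    Replica(4, 2, False),
]
print(old_replicas_to_scale_down(fleet, latest_version=2, min_replicas=2))
```

With only one new replica ready, the function returns an empty list; once replica 4 becomes ready, it returns the IDs of both old replicas.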

For example, suppose we have a running service hosting the Llama 2 model with the following resource configuration:

.. code-block:: yaml

    resources:
      memory: 32+
      accelerators: T4

SkyServe supports updating an existing service to a new resource configuration, such as:

.. code-block:: yaml

    resources:
      memory: 128+
      accelerators: A100

SkyServe does not mix traffic between old and new replicas. Until ``min_replicas`` new replicas are ready to serve user requests, SkyServe sends traffic only to the old replicas. The SkyServe endpoint stays the same during the update and remains accessible with no downtime.
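
The routing rule above can be sketched as a small function. This is a simplified model under stated assumptions, not SkyServe's load balancer code: ``replicas_receiving_traffic`` is a hypothetical name, and each replica is represented as a ``(replica_id, version, ready)`` tuple for illustration.

```python
def replicas_receiving_traffic(replicas, latest_version, min_replicas):
    """Pick the replica IDs that should receive traffic during an update.

    `replicas` is a list of (replica_id, version, ready) tuples. Traffic is
    never split between versions: it goes to the old replicas until at least
    `min_replicas` new replicas are ready, then only to the new ones.
    """
    ready_new = [rid for rid, version, ready in replicas
                 if version == latest_version and ready]
    if len(ready_new) >= min_replicas:
        return ready_new
    return [rid for rid, version, ready in replicas
            if version < latest_version and ready]


# New replicas (version 2) are still starting: old replicas serve traffic.
during_update = [(1, 1, True), (2, 1, True), (3, 2, False), (4, 2, False)]
print(replicas_receiving_traffic(during_update, latest_version=2, min_replicas=2))

# Both new replicas are ready: traffic switches to them.
after_switch = [(1, 1, True), (2, 1, True), (3, 2, True), (4, 2, True)]
print(replicas_receiving_traffic(after_switch, latest_version=2, min_replicas=2))
```

Switching all traffic at once, rather than gradually, is what keeps requests from being served by a mix of old and new versions.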

.. tip::

    :code:`sky serve status` will highlight the latest service version and each replica's version.

Example
===========

We first launch an HTTP service:

.. code-block:: console

    $ sky serve up examples/serve/http_server/task.yaml -n http-server

We can use :code:`sky serve status http-server` to check the status of the service:

.. code-block:: console

    $ sky serve status http-server

    Services
    NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
    http-server  1        1m 41s  READY   2/2       44.206.240.249:30002

    Service Replicas
    SERVICE_NAME  ID  VERSION  IP              LAUNCHED    RESOURCES       STATUS  REGION
    http-server   1   1        54.173.203.169  2 mins ago  1x AWS(vCPU=2)  READY   us-east-1
    http-server   2   1        52.87.241.103   2 mins ago  1x AWS(vCPU=2)  READY   us-east-1

Service ``http-server`` has an initial version of 1. Suppose we want the service to use 4 CPUs instead of 2. We edit the task YAML ``examples/serve/http_server/task.yaml``, changing the ``cpus`` field from 2 to 4, and then run :code:`sky serve update` to update the service. The updated YAML is:

.. code-block:: yaml
    :emphasize-lines: 8

    # examples/serve/http_server/task.yaml
    service:
      readiness_probe: /
      replicas: 2

    resources:
      ports: 8081
      cpus: 4+

    workdir: .

    run: python3 server.py


.. code-block:: console

    $ sky serve update http-server examples/serve/http_server/task.yaml

SkyServe first launches two new replicas with 4 CPUs, and the service's version is updated from 1 to 2. Replicas 3 and 4 are the new replicas with 4 CPUs; replicas 1 and 2 are the old replicas with 2 CPUs. While the new replicas are still provisioning, SkyServe sends traffic only to the old replicas. Once the number of new replicas reaches the ``min_replicas`` required for the service (2 in this example), SkyServe scales down the old replicas to save cost.

Review comment (Collaborator): Give a workaround for the quota issue? E.g., update to 0 replicas first and then update to the desired number of replicas.

Reply (Collaborator Author): This seems rather complicated. I don't think we should mention it in the doc; I'd rather keep a TODO to look into it in the future.


.. code-block:: console

    $ sky serve status http-server

    Services
    NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
    http-server  2        6m 15s  READY   2/4       44.206.240.249:30002

    Service Replicas
    SERVICE_NAME  ID  VERSION  IP              LAUNCHED     RESOURCES       STATUS        REGION
    http-server   1   1        54.173.203.169  6 mins ago   1x AWS(vCPU=2)  READY         us-east-1
    http-server   2   1        52.87.241.103   6 mins ago   1x AWS(vCPU=2)  READY         us-east-1
    http-server   3   2        -               21 secs ago  1x AWS(vCPU=4)  PROVISIONING  us-east-1
    http-server   4   2        -               21 secs ago  1x AWS(vCPU=4)  PROVISIONING  us-east-1

Once the new replicas are ready, SkyServe starts sending traffic to them and scales down the old replicas.

.. code-block:: console

    $ sky serve status http-server

    Services
    NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
    http-server  2        10m 4s  READY   2/4       44.206.240.249:30002

    Service Replicas
    SERVICE_NAME  ID  VERSION  IP              LAUNCHED     RESOURCES       STATUS         REGION
    http-server   1   1        54.173.203.169  10 mins ago  1x AWS(vCPU=2)  SHUTTING_DOWN  us-east-1
    http-server   2   1        52.87.241.103   10 mins ago  1x AWS(vCPU=2)  SHUTTING_DOWN  us-east-1
    http-server   3   2        3.93.241.163    1 min ago    1x AWS(vCPU=4)  READY          us-east-1
    http-server   4   2        18.206.226.82   1 min ago    1x AWS(vCPU=4)  READY          us-east-1

Eventually, we will only have new replicas ready to serve user requests.

.. code-block:: console

    $ sky serve status http-server

    Services
    NAME         VERSION  UPTIME   STATUS  REPLICAS  ENDPOINT
    http-server  2        11m 42s  READY   2/2       44.206.240.249:30002

    Service Replicas
    SERVICE_NAME  ID  VERSION  IP             LAUNCHED    RESOURCES       STATUS  REGION
    http-server   3   2        3.93.241.163   3 mins ago  1x AWS(vCPU=4)  READY   us-east-1
    http-server   4   2        18.206.226.82  3 mins ago  1x AWS(vCPU=4)  READY   us-east-1
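
To follow an update from a script, the ``Service Replicas`` table can be summarized per version. The sketch below is a hypothetical helper, not part of SkyServe: ``ready_replicas_by_version`` is an illustrative name, and it assumes the table text has the column layout shown in the listings above (version in the third column, status second to last), which is not a stable interface.

```python
from collections import Counter

# Rows copied from the mid-update listing above.
SAMPLE = """\
SERVICE_NAME ID VERSION IP LAUNCHED RESOURCES STATUS REGION
http-server 1 1 54.173.203.169 10 mins ago 1x AWS(vCPU=2) SHUTTING_DOWN us-east-1
http-server 2 1 52.87.241.103 10 mins ago 1x AWS(vCPU=2) SHUTTING_DOWN us-east-1
http-server 3 2 3.93.241.163 1 min ago 1x AWS(vCPU=4) READY us-east-1
http-server 4 2 18.206.226.82 1 min ago 1x AWS(vCPU=4) READY us-east-1
"""


def ready_replicas_by_version(table_text):
    """Count READY replicas per service version in a replica table.

    Fields such as LAUNCHED contain spaces, so only the first three and the
    last two whitespace-separated fields of each row are relied on.
    """
    counts = Counter()
    for line in table_text.splitlines():
        fields = line.split()
        # Skip blank lines, short lines, and the header row.
        if len(fields) < 5 or fields[0] == "SERVICE_NAME":
            continue
        version, status = fields[2], fields[-2]
        if status == "READY":
            counts[int(version)] += 1
    return dict(counts)


print(ready_replicas_by_version(SAMPLE))
```

For the sample rows, only the two version-2 replicas are counted, since the version-1 replicas are shutting down.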