[Docs] Add docs for Sky Serve #2794

Merged 19 commits on Dec 11, 2023
Binary file modified docs/source/images/sky-serve-architecture.png
Binary file added docs/source/images/sky-serve-status-full.png
Binary file added docs/source/images/sky-serve-status-tgi.png
7 changes: 7 additions & 0 deletions docs/source/index.rst
@@ -110,6 +110,13 @@ Documentation
   reference/kubernetes/index
   running-jobs/index

.. toctree::
   :maxdepth: 1
   :caption: SkyServe: Model Serving

   serving/sky-serve
   serving/service-yaml-spec

.. toctree::
   :maxdepth: 1
   :caption: Cutting Cloud Costs
18 changes: 18 additions & 0 deletions docs/source/reference/cli.rst
@@ -59,6 +59,24 @@ Job Queue CLI
   :prog: sky cancel
   :nested: full

Sky Serve CLI
-------------

.. click:: sky.cli:serve_up
   :prog: sky serve up
   :nested: full

.. click:: sky.cli:serve_down
   :prog: sky serve down
   :nested: full

.. click:: sky.cli:serve_status
   :prog: sky serve status
   :nested: full

.. click:: sky.cli:serve_logs
   :prog: sky serve logs
   :nested: full

Managed Spot Jobs CLI
---------------------------
76 changes: 76 additions & 0 deletions docs/source/serving/service-yaml-spec.rst
@@ -0,0 +1,76 @@
.. _service-yaml-spec:

Service YAML Specification
==========================

SkyServe provides an intuitive YAML interface for specifying a service. It is very similar to the :ref:`SkyPilot task YAML <yaml-spec>`: adding a ``service`` section to an existing task YAML turns it into a service YAML.

Available fields:


.. code-block:: yaml

    # Additional section that turns your SkyPilot task.yaml into a service
    service:

      # Readiness probe (required). This describes how SkyServe determines
      # whether your service is ready to accept traffic. If the readiness probe
      # gets a 200 response, SkyServe will start routing traffic to your service.
      readiness_probe:
        # Path to probe (required).
        path: /v1/models
        # Post data (optional). If this is specified, the readiness probe will use
        # POST instead of GET, and the post data will be sent as the request body.
        post_data: {'model_name': 'model'}
        # Initial delay in seconds (optional). Defaults to 1200 seconds (20 minutes).
        # Any readiness probe failures during this period will be ignored. The
        # right value depends heavily on your service, so set it based on your
        # service's startup time.
        initial_delay_seconds: 1200

      # Alternatively, use the simplified form of the readiness probe, which
      # contains only the probe path. If you want the GET method and the
      # default initial delay, you can use the following syntax:
      readiness_probe: /v1/models

      # One of the following two fields (replica_policy or replicas) is required.

      # Replica autoscaling policy. This describes how SkyServe autoscales
      # your service based on the QPS (queries per second) of your service.
      replica_policy:
        # Minimum number of replicas (required).
        min_replicas: 1
        # Maximum number of replicas (optional). If not specified, SkyServe will
        # use a fixed number of replicas equal to min_replicas and ignore any
        # QPS threshold specified below.
        max_replicas: 3
        # The following thresholds describe when to scale up or down.
        # QPS threshold for scaling up (optional). If the QPS of your service
        # exceeds this threshold, SkyServe will scale up your service by one
        # replica. If not specified, SkyServe will **NOT** scale up your service.
        qps_upper_threshold: 10
        # QPS threshold for scaling down (optional). If the QPS of your service
        # is below this threshold, SkyServe will scale down your service by one
        # replica. If not specified, SkyServe will **NOT** scale down your service.
        qps_lower_threshold: 2

      # For convenience, there is also a simplified form of the replica policy
      # that uses a fixed number of replicas. Just use the following syntax:
      replicas: 2

      # Controller resources (optional). This describes the resources to use for
      # the controller. Defaults to a 4+ vCPU instance with 100GB disk.
      controller_resources:
        cloud: aws
        region: us-east-1
        instance_type: p3.2xlarge
        disk_size: 256

    resources:
      # Port to run your service on (required). This port will be automatically
      # exposed by SkyServe. You can access your service at
      # http://<endpoint-ip>:<port>.
      ports: 8080
      # Other resources config...

    # Then comes your SkyPilot task YAML...
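
As a minimal sketch of how these fields fit together, the following service YAML serves a directory over HTTP with two fixed replicas. The workload (Python's built-in ``http.server``), the ``cpus: 2+`` request, and the probe path ``/`` are illustrative assumptions rather than requirements of SkyServe; substitute your own task.

.. code-block:: yaml

    # service.yaml: an illustrative sketch, not an official example.
    service:
      # Simplified readiness probe: GET / with the default initial delay.
      readiness_probe: /
      # Simplified replica policy: a fixed number of replicas.
      replicas: 2

    resources:
      # Must match the port the workload below listens on.
      ports: 8080
      # Assumed resource request for this toy workload.
      cpus: 2+

    # Assumed workload: Python's built-in HTTP server, which answers
    # GET / with a 200 and therefore satisfies the readiness probe.
    run: python3 -m http.server 8080

A file like this would be launched with ``sky serve up`` (see the CLI reference). Replacing ``replicas: 2`` with a ``replica_policy`` section switches the service from a fixed replica count to QPS-based autoscaling.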
