Introducing SkyServe #2458

Merged
merged 242 commits into from
Nov 15, 2023
Commits
9a645a4
init
infwinston Jul 13, 2023
4bb4420
format
infwinston Jul 13, 2023
d52c2b9
format
infwinston Jul 13, 2023
208bdf4
reademe
infwinston Jul 14, 2023
1fa6401
update
infwinston Jul 14, 2023
c56ae22
[SkyServe] add http server example (#2260)
cblmemo Jul 18, 2023
f28ebbb
[SkyServe] `sky serve` CLI prototype (#2276)
cblmemo Jul 26, 2023
8fa4323
[SkyServe] Refactoring, Introducing multiprocess for provisioning and…
cblmemo Aug 4, 2023
05da33f
[SkyServe] Use SSH for Authentication, new replica status, `sky serve…
cblmemo Aug 13, 2023
c32e18d
merge to master
cblmemo Aug 13, 2023
d6bd068
[SkyServe] Final changes for v0 release (#2396)
cblmemo Aug 15, 2023
cd2b44c
Merge remote-tracking branch 'origin/master' into serve
cblmemo Aug 15, 2023
7968de9
fix serve example (#2406)
infwinston Aug 15, 2023
23e7529
surface debug msg (#2407)
infwinston Aug 15, 2023
57b3347
[SkyServe] Fix port failover (#2408)
cblmemo Aug 16, 2023
0a1a7de
[SkyServe] Introducing smoke test and fix bugs (#2411)
cblmemo Aug 18, 2023
f7f33e9
[SkyServe] Add cancel and gorilla example (#2417)
cblmemo Aug 20, 2023
8216e71
[SkyServe] Fix interrupt process group and format (#2449)
cblmemo Aug 24, 2023
1c9e23f
Merge remote-tracking branch 'origin/master' into serve-dev
cblmemo Aug 24, 2023
d8aaf65
add authentication init
cblmemo Aug 24, 2023
ede0cad
typo fix & remove prehook
cblmemo Aug 24, 2023
4ace061
fix --no-follow in replica log and disable cancel log when skyserve down
cblmemo Aug 24, 2023
14ddf2f
set controller task resources for when controller failed to provision…
cblmemo Aug 25, 2023
8ce9c18
finish llm & interrupt test
cblmemo Aug 25, 2023
6fae8ac
early check cluster name is valid
cblmemo Aug 25, 2023
21c07cc
add hint message for tailing replica job status
cblmemo Aug 25, 2023
377789b
make dict thread safe
cblmemo Aug 25, 2023
b2f2388
upd doc
cblmemo Aug 25, 2023
7bfeec9
rename redirector to lb
cblmemo Aug 26, 2023
2d58994
Merge remote-tracking branch 'origin/master' into serve-dev
cblmemo Aug 26, 2023
082c88b
rename sky serve controller prefix
cblmemo Aug 26, 2023
a4c8fc2
restore example
cblmemo Aug 26, 2023
63b2601
upd smoke test
cblmemo Aug 26, 2023
14621d3
use asyncio
cblmemo Aug 27, 2023
55d3503
change core with underlying function to avoid usage collection on status
cblmemo Aug 27, 2023
c9d2e28
reatore comma
cblmemo Aug 27, 2023
d236eec
add comment
cblmemo Aug 27, 2023
e3e698c
adopt comments in #2473
cblmemo Aug 28, 2023
b288ba0
Apply suggestions from code review
cblmemo Aug 31, 2023
7fccb8f
Apply suggestions from code review
cblmemo Aug 31, 2023
58a1122
fix example
infwinston Sep 4, 2023
7aef306
Fix serve probe (#2513)
infwinston Sep 5, 2023
d05b42f
[SkyServe] Add option to auto restart (#2518)
cblmemo Sep 6, 2023
5200fbd
[SkyServe] Fix auto restart (#2521)
cblmemo Sep 6, 2023
45e43c3
[SkyServe] Using Multi-service controller (#2489)
cblmemo Sep 9, 2023
fdd6332
Update sky/cli.py
cblmemo Sep 10, 2023
a93d9fa
Update sky/serve/examples/stable_diffusion_service.yaml
cblmemo Sep 10, 2023
87b2079
style for example
cblmemo Sep 10, 2023
3dc4dac
format & add comments & rephrase
cblmemo Sep 10, 2023
f585294
Merge remote-tracking branch 'origin/master' into serve-dev
cblmemo Sep 10, 2023
9850d40
move serve_down to core and refactor
cblmemo Sep 10, 2023
2fb615d
Update sky/execution.py
cblmemo Sep 10, 2023
aa4cda0
apply suggestions from code review
cblmemo Sep 10, 2023
3db51ca
move examples
cblmemo Sep 10, 2023
788c497
Update sky/cli.py
cblmemo Sep 10, 2023
9bb4334
use _make_task_or_dag_from_entrypoint_with_overrides & minor
cblmemo Sep 10, 2023
bdef682
make sky serve status accept multiple service names
cblmemo Sep 10, 2023
96419ae
minor
cblmemo Sep 10, 2023
d0a3fb8
minor
cblmemo Sep 10, 2023
1eefc54
upd docstring
cblmemo Sep 10, 2023
457c7a2
fix
cblmemo Sep 10, 2023
9a2d27e
better programmatic api
cblmemo Sep 11, 2023
f4bdbdb
ux
cblmemo Sep 12, 2023
78d7342
use flag to control logging
cblmemo Sep 13, 2023
baf62f9
combine reserved prefix & name
cblmemo Sep 14, 2023
4c52014
fix
cblmemo Sep 14, 2023
755114f
expand user
cblmemo Sep 14, 2023
b547b19
better UX for auto restart
cblmemo Sep 14, 2023
f465e32
fix consecutive timeout threshold
cblmemo Sep 15, 2023
c773d70
minor
cblmemo Sep 15, 2023
4eb1d44
nnit
cblmemo Sep 16, 2023
d83f9a2
temp remove
cblmemo Sep 18, 2023
ac3244b
Merge remote-tracking branch 'origin/master' into serve-dev
cblmemo Sep 18, 2023
dd261d5
add back
cblmemo Sep 18, 2023
6f6d8e7
only open ports used
cblmemo Sep 18, 2023
4037e7e
remove redundant task yaml for load balancer
cblmemo Sep 19, 2023
0578487
move task ports handling to python API
cblmemo Sep 20, 2023
a100433
fix controller generation bug
cblmemo Sep 26, 2023
8f49ff9
UX nits
cblmemo Sep 26, 2023
4dd0e8e
nit
cblmemo Sep 26, 2023
20dcd1b
Fix sky serve down --purge when storage cleanup failed
cblmemo Sep 27, 2023
672840c
ux
cblmemo Sep 27, 2023
a8bec0d
reuse service handle
cblmemo Sep 28, 2023
7bb01f6
revert
cblmemo Sep 28, 2023
9de419b
Merge remote-tracking branch 'origin/master' into serve-dev
cblmemo Sep 28, 2023
fbd2873
add todo
cblmemo Sep 28, 2023
fbde4aa
[SkyServe] Add Ray Serve example (#2621)
iojw Sep 28, 2023
fa4981a
restore job id type
cblmemo Oct 3, 2023
f36cb42
Merge remote-tracking branch 'origin/master' into serve-dev
cblmemo Oct 3, 2023
5704d05
Apply suggestions from code review
cblmemo Oct 3, 2023
e47e005
lint
cblmemo Oct 3, 2023
3e81e8b
Merge remote-tracking branch 'origin/master' into serve-dev
cblmemo Oct 3, 2023
a86523c
move service section to the top
cblmemo Oct 3, 2023
27a29b4
add docstr
cblmemo Oct 3, 2023
ecc0640
make controller port not optinal
cblmemo Oct 3, 2023
9bf337b
remove cpu demand for gpu workloads
cblmemo Oct 3, 2023
b8ce4c4
make ReservedClusterGroup.get_group accept none arg
cblmemo Oct 3, 2023
01bea22
nit
cblmemo Oct 3, 2023
d27dab8
remove yaml_only and task_only in _make_task_or_dag_from_entrypoint_w…
cblmemo Oct 3, 2023
c00fcdf
cli nits
cblmemo Oct 3, 2023
2b43b83
remove get_glob_service_names
cblmemo Oct 3, 2023
65b9bb2
fix pop CPU
cblmemo Oct 4, 2023
28f613d
remove CPU demand for job when presented in CLI
cblmemo Oct 4, 2023
2b38cc6
Merge remote-tracking branch 'origin/master' into serve-dev
cblmemo Oct 5, 2023
e973442
remove cancel and use os.kill now
cblmemo Oct 5, 2023
1b31110
add db on controller VM, remove job id and use skylet to refresh serv…
cblmemo Oct 5, 2023
8799e09
minor
cblmemo Oct 5, 2023
849ba81
merge controllers with normal clusters
cblmemo Oct 5, 2023
e8160c7
Merge remote-tracking branch 'origin/master' into serve-dev
cblmemo Oct 5, 2023
4fa8aa0
deprecate controller port adn refresh in infra provider
cblmemo Oct 6, 2023
e0392c7
Apply suggestions from code review
cblmemo Oct 10, 2023
b9b0a78
use enum in serve logs + minor
cblmemo Oct 10, 2023
a4683bd
Merge remote-tracking branch 'origin/master' into serve-dev
cblmemo Oct 10, 2023
82d01ea
remove stop hint
cblmemo Oct 10, 2023
e2aa2a2
env vars & monir
cblmemo Oct 10, 2023
ef24cda
use cluster regex and remove service regex
cblmemo Oct 10, 2023
4870e73
remove pylint hint & rename app_port -> replica_port
cblmemo Oct 10, 2023
f1406cf
change constants used in _maybe_translate_local_file_mounts_and_sync_up
cblmemo Oct 10, 2023
686ebd9
add --endpoint and ux
cblmemo Oct 11, 2023
9ee2cf0
new termination of controller & lb; minor suggestions
cblmemo Oct 11, 2023
ec3877a
move statuses to serve_state, minors
cblmemo Oct 11, 2023
e97381e
minor
cblmemo Oct 11, 2023
e55958d
speedup terminate
cblmemo Oct 11, 2023
a952249
replica db mypy & pylint passed
cblmemo Oct 12, 2023
688bb20
fix
cblmemo Oct 12, 2023
09fd88e
fix scale down & cleanup failed replica
cblmemo Oct 12, 2023
f13e3ef
minor
cblmemo Oct 12, 2023
aeca540
move controller resources to config.yaml
cblmemo Oct 12, 2023
3155453
fix smoke test
cblmemo Oct 13, 2023
2afdfc3
refactor request information report
cblmemo Oct 13, 2023
1fd1db5
architecture: autoscaler no longer talk with infra provider again
cblmemo Oct 13, 2023
42e1f4a
fix auto restart
cblmemo Oct 13, 2023
3f81cdb
sync down logs and then streaming
cblmemo Oct 13, 2023
692d844
terminate log streaming when service is downed
cblmemo Oct 13, 2023
8d4f64d
refactor autoscaler & argument pass in controller
cblmemo Oct 13, 2023
673c747
move several var from local db
cblmemo Oct 13, 2023
003c655
use a control process to handle signal and terminate service
cblmemo Oct 15, 2023
8d15ef0
move port selection to the controller VM
cblmemo Oct 16, 2023
5668049
minor
cblmemo Oct 16, 2023
2984996
refactor dataabse
cblmemo Oct 16, 2023
cf721a2
comments, docs and function reorder for infra providers
cblmemo Oct 16, 2023
dfcc373
change launch/terminate replica to python API
cblmemo Oct 16, 2023
1fa207f
fix pass task in multiprocessing
cblmemo Oct 16, 2023
e2365e7
minor & fix smoke test
cblmemo Oct 16, 2023
1e7936d
rename infra_provider to replica_manager
cblmemo Oct 16, 2023
1aba238
ux
cblmemo Oct 16, 2023
6d646a6
not count as failure if UP for more than initial_delay_seconds
cblmemo Oct 17, 2023
32aa294
default auto restart to true
cblmemo Oct 18, 2023
fa4fdfa
add todo
cblmemo Oct 18, 2023
297f837
Merge remote-tracking branch 'origin/master' into serve-dev
cblmemo Oct 18, 2023
756a669
upd test for auto_restart
cblmemo Oct 18, 2023
ea992a0
Use only one controller and remove local database. TODO: change to SE…
cblmemo Oct 21, 2023
6d590b4
minor
cblmemo Oct 21, 2023
8961e59
Apply suggestions from code review
cblmemo Oct 21, 2023
3320919
apply suggestion from code review
cblmemo Oct 21, 2023
05a9f83
mske sky status showing service as well
cblmemo Oct 22, 2023
e2d103d
fix
cblmemo Oct 22, 2023
0ab0dec
replica manager ux; use sky logger for uvicorn
cblmemo Oct 22, 2023
4ee5676
UX, refactoring
cblmemo Oct 23, 2023
ce82675
rephrase hint after sky serve up
cblmemo Oct 23, 2023
a9e706a
Merge remote-tracking branch 'origin/master' into serve-dev
cblmemo Oct 23, 2023
ce283f1
Update sky/execution.py
cblmemo Oct 23, 2023
05aec69
comments
cblmemo Oct 23, 2023
bd835d4
add service name check before sky serve up
cblmemo Oct 23, 2023
e17d789
rename reserved cluster to controller
cblmemo Oct 24, 2023
743cabd
Merge remote-tracking branch 'origin/master' into serve-dev
cblmemo Oct 24, 2023
5e07d5b
Merge remote-tracking branch 'origin/master' into serve-dev
cblmemo Oct 24, 2023
cbfa389
upd schema
cblmemo Oct 24, 2023
3bd6c8c
change to async function for fastapi
cblmemo Oct 27, 2023
f97bd5b
add multiple ports TODO
cblmemo Nov 1, 2023
8773d0c
fix outdated example
cblmemo Nov 2, 2023
2008590
[SkyServe] Serving with Spot (#2749)
MaoZiming Nov 2, 2023
ad40472
fix sky status pool wait
cblmemo Nov 3, 2023
b5557bb
fix sync down logs failed
cblmemo Nov 3, 2023
8414ea7
upd examples
cblmemo Nov 4, 2023
f6c3d70
add gorilla notebook
cblmemo Nov 4, 2023
d036560
add todo for customizable setup commands
cblmemo Nov 5, 2023
8327a2e
add launch log to streaming
cblmemo Nov 7, 2023
6f3f809
move comment position
cblmemo Nov 7, 2023
f52ebb0
catch error and print log
cblmemo Nov 7, 2023
8377b8a
align output
cblmemo Nov 7, 2023
114e751
ux
cblmemo Nov 7, 2023
30b36ba
fix storage cleanup failure
cblmemo Nov 7, 2023
74c83af
fix extra newline
cblmemo Nov 8, 2023
d18badb
comments
cblmemo Nov 8, 2023
c4b8b24
Apply suggestions from code review
cblmemo Nov 8, 2023
e241aff
rename
cblmemo Nov 8, 2023
6ba5c9f
Update sky/core.py
cblmemo Nov 8, 2023
8ffe0be
apply suggestion from code review
cblmemo Nov 8, 2023
bb0542c
apply suggestion from code review
cblmemo Nov 9, 2023
330b893
format
cblmemo Nov 9, 2023
2aa266b
move controller related functions/classes to controller_utils
cblmemo Nov 9, 2023
a521615
apply suggestion from code review
cblmemo Nov 9, 2023
191d2ff
Update sky/exceptions.py
cblmemo Nov 9, 2023
61774bc
Update sky/serve/replica_managers.py
cblmemo Nov 9, 2023
e38fc17
import
cblmemo Nov 9, 2023
01bbac7
Update sky/serve/autoscalers.py
cblmemo Nov 9, 2023
9a198b0
move max #sky.launch to replica manager and limit total # across serv…
cblmemo Nov 9, 2023
4ae8996
refactor autoscaler
cblmemo Nov 9, 2023
84a4120
pass json dict rather than pickle
cblmemo Nov 9, 2023
b6ef448
apply suggestion from code review
cblmemo Nov 9, 2023
d598ed4
Apply suggestions from code review
cblmemo Nov 9, 2023
5170225
apply suggestion from code review
cblmemo Nov 10, 2023
8cea486
bug fix & apply suggestion from code review
cblmemo Nov 10, 2023
8711f45
apply suggestions
cblmemo Nov 10, 2023
6181497
comments
cblmemo Nov 10, 2023
2c0f3c8
move port to resources
cblmemo Nov 10, 2023
f2b4c29
add cancel test
cblmemo Nov 10, 2023
b68a1fd
fix controller resources cloud not specified
cblmemo Nov 10, 2023
81a88d2
ux
cblmemo Nov 10, 2023
3ba4f99
ux for down
cblmemo Nov 10, 2023
f68242c
add smoke retry teardown
cblmemo Nov 10, 2023
c7183c7
Merge remote-tracking branch 'origin/master' into serve-dev
cblmemo Nov 10, 2023
122de45
resolve conflict
cblmemo Nov 10, 2023
33bd321
add endpoint, move check for name conflict to _execute
cblmemo Nov 10, 2023
4ff9975
apply suggestion from code review
cblmemo Nov 11, 2023
7cff543
fix jinja2 var
cblmemo Nov 11, 2023
aff8250
rename reserved cluster, move controller_utils function back
cblmemo Nov 11, 2023
869eb75
stream logs
cblmemo Nov 11, 2023
9e5c69e
fix not showing controller
cblmemo Nov 11, 2023
d14b5cf
move controllers back to controller_utils
cblmemo Nov 12, 2023
e0049bf
fix wrong controller resources when controller is exist
cblmemo Nov 12, 2023
f2eaabc
use return value to indicate success
cblmemo Nov 12, 2023
86f6d5f
default controller resources & better error handling
cblmemo Nov 12, 2023
9b812d2
stress test passed
cblmemo Nov 12, 2023
78ac155
nits
cblmemo Nov 13, 2023
d3ed24a
Merge remote-tracking branch 'origin/master' into serve-dev
cblmemo Nov 13, 2023
7a25afa
fix examples
cblmemo Nov 13, 2023
0a2d4bd
Merge remote-tracking branch 'origin/master' into serve-dev
cblmemo Nov 14, 2023
8352aff
refactor: moving some funcs in execution.py to controller_utils
cblmemo Nov 14, 2023
0c8b734
smoke test passed
cblmemo Nov 14, 2023
9c67cd0
remove --target & minor
cblmemo Nov 14, 2023
fa20443
teardown failed services with --purge flag
cblmemo Nov 14, 2023
7793833
move core api to sky/serve/api.py
cblmemo Nov 14, 2023
e87042a
resolve controller_utils circular import
cblmemo Nov 14, 2023
a276a8f
fix spot config
cblmemo Nov 14, 2023
6889119
minor ux
cblmemo Nov 14, 2023
85a1bb0
add todo for default argument for sky serve logs
cblmemo Nov 15, 2023
bcc05ba
resolve circular import.
cblmemo Nov 15, 2023
e3f1f33
fix all circular import
cblmemo Nov 15, 2023
e91b60e
minor
cblmemo Nov 15, 2023
a60be5a
apply suggestion from code review
cblmemo Nov 15, 2023
Binary file added docs/source/images/sky-serve-architecture.png
49 changes: 49 additions & 0 deletions examples/serve/gorilla/gorilla.yaml
@@ -0,0 +1,49 @@
# SkyServe YAML to run gorilla LLM.
#
# Usage:
# sky serve up -n gorilla examples/serve/gorilla/gorilla.yaml
# Then go to the examples/serve/gorilla/use_gorilla.ipynb
# and follow the instructions there.
# The endpoint will be printed in the console. You
# could also check the endpoint by running:
# sky serve status --endpoint gorilla

service:
  readiness_probe:
    path: /v1/models
    initial_delay_seconds: 1800
  replicas: 2

resources:
  ports: 8087
  accelerators: A100:1
  disk_size: 1024
  disk_tier: high

setup: |
  conda activate chatbot
  if [ $? -ne 0 ]; then
    conda create -n chatbot python=3.9 -y
    conda activate chatbot
  fi

  # Install dependencies
  pip install fschat[model_worker,webui]==0.2.24
  pip install protobuf einops

run: |
  conda activate chatbot

  echo 'Starting controller...'
  python -u -m fastchat.serve.controller > ~/controller.log 2>&1 &
  sleep 10
  echo 'Starting model worker...'
  python -u -m fastchat.serve.model_worker \
      --model-path gorilla-llm/gorilla-falcon-7b-hf-v0 2>&1 \
      | tee model_worker.log &

  echo 'Waiting for model worker to start...'
  while ! `cat model_worker.log | grep -q 'Uvicorn running on'`; do sleep 1; done

  echo 'Starting openai api server...'
  python -u -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8087 | tee ~/openai_api_server.log
91 changes: 91 additions & 0 deletions examples/serve/gorilla/use_gorilla.ipynb
@@ -0,0 +1,91 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# SkyServe Gorilla Playground\n",
"\n",
"Welcome! Here is the sky serve gorilla playground. You can use this notebook to test your gorilla service.\n",
"\n",
"This notebook is borrowed from [gorilla's colab](https://colab.research.google.com/drive/1DEBPsccVLF_aUnmD0FwPeHFrtdC0QIUP?usp=sharing).\n",
"\n",
"To use this notebook, run `sky serve up examples/serve/gorilla/gorilla.yaml` first and paste the endpoint below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sky_serve_endpoint = '' # Enter your sky serve endpoint here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then run the cell below to test your service!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install openai &> /dev/null\n",
"import openai\n",
"import urllib.parse\n",
"\n",
"openai.api_key = \"EMPTY\" # Key is ignored and does not matter\n",
"openai.api_base = f\"http://{sky_serve_endpoint}/v1\"\n",
"\n",
"# Report issues\n",
"def raise_issue(e, model, prompt):\n",
"    issue_title = urllib.parse.quote(\"[bug] Hosted Gorilla: <Issue>\")\n",
"    issue_body = urllib.parse.quote(f\"Exception: {e}\\nFailed model: {model}, for prompt: {prompt}\")\n",
"    issue_url = f\"https://github.com/ShishirPatil/gorilla/issues/new?assignees=&labels=hosted-gorilla&projects=&template=hosted-gorilla-.md&title={issue_title}&body={issue_body}\"\n",
"    print(f\"An exception has occurred: {e} \\nPlease raise an issue here: {issue_url}\")\n",
"\n",
"# Query Gorilla server\n",
"def get_gorilla_response(prompt=\"I would like to translate from English to French.\", model=\"gorilla-falcon-7b-hf-v0\"):\n",
"    try:\n",
"        completion = openai.ChatCompletion.create(\n",
"            model=model,\n",
"            messages=[{\"role\": \"user\", \"content\": prompt}]\n",
"        )\n",
"        return completion.choices[0].message.content\n",
"    except Exception as e:\n",
"        raise_issue(e, model, prompt)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Gorilla `gorilla-mpt-7b-hf-v1` with code snippets\n",
"# Translation\n",
"prompt = \"I would like to translate 'I feel very good today.' from English to Chinese.\"\n",
"print(get_gorilla_response(prompt, model=\"gorilla-falcon-7b-hf-v0\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Gorilla `gorilla-7b-hf-v1` with code snippets\n",
"# Object Detection\n",
"prompt = \"I want to build a robot that can detecting objects in an image ‘cat.jpeg’. Input: [‘cat.jpeg’]\"\n",
"print(get_gorilla_response(prompt, model=\"gorilla-falcon-7b-hf-v0\"))"
]
}
],
"nbformat": 4,
"nbformat_minor": 2
}
36 changes: 36 additions & 0 deletions examples/serve/http_server/server.py
@@ -0,0 +1,36 @@
import argparse
import http.server
import socketserver


class MyHttpRequestHandler(http.server.SimpleHTTPRequestHandler):

    def do_GET(self):
        # Return 200 for all paths
        # Therefore, readiness_probe will return 200 at path '/health'
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()
        html = """
            <html>
            <head>
                <title>SkyPilot Test Page</title>
            </head>
            <body>
                <h1>Hi, SkyPilot here!</h1>
            </body>
            </html>
        """
        self.wfile.write(bytes(html, 'utf8'))
        return


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='SkyServe HTTP Test Server')
    parser.add_argument('--port', type=int, required=False, default=8081)
    args = parser.parse_args()

    Handler = MyHttpRequestHandler
    with socketserver.TCPServer(('', args.port), Handler) as httpd:
        print('serving at port', args.port)
        httpd.serve_forever()
21 changes: 21 additions & 0 deletions examples/serve/http_server/task.yaml
@@ -0,0 +1,21 @@
# SkyServe YAML to run a simple http server.
#
# Usage:
# sky serve up -n http examples/serve/http_server/task.yaml
# The endpoint will be printed in the console. You
# could also check the endpoint by running:
# sky serve status --endpoint http

service:
  readiness_probe:
    path: /health
    initial_delay_seconds: 20
  replicas: 2

resources:
  ports: 8081
  cpus: 2+

workdir: examples/serve/http_server

run: python3 server.py
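
The `readiness_probe` block above names a path and an `initial_delay_seconds` grace period. As a rough illustration of what such a probe amounts to, here is a minimal sketch using only the standard library. It is not SkyServe's implementation: `probe_once`, `wait_until_ready`, and their parameters are hypothetical names introduced for this example, and treating the grace period as a simple deadline is an assumption.

```python
import time
import urllib.error
import urllib.request


def probe_once(url, timeout_seconds=3.0):
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_ready(probe, initial_delay_seconds=20.0, interval_seconds=1.0):
    """Call `probe()` until it returns True or the grace period runs out.

    Hypothetical helper: failures inside the grace period are tolerated,
    and only the final verdict is reported to the caller.
    """
    deadline = time.monotonic() + initial_delay_seconds
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_seconds)
```

With the service from `task.yaml` running, a call such as `wait_until_ready(lambda: probe_once('http://<endpoint>/health'))` would poll the same path the YAML declares, where `<endpoint>` is whatever `sky serve status --endpoint http` prints.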
43 changes: 43 additions & 0 deletions examples/serve/llama2/chat.py
@@ -0,0 +1,43 @@
import json

import openai
import requests

stream = True
model = 'Llama-2-7b-chat-hf'
init_prompt = 'You are a helpful assistant.'
history = [{'role': 'system', 'content': init_prompt}]
endpoint = input('Endpoint: ')
url = f'http://{endpoint}/v1/chat/completions'
openai.api_base = f'http://{endpoint}/v1'
openai.api_key = 'placeholder'

try:
    while True:
        user_input = input('[User] ')
        history.append({'role': 'user', 'content': user_input})
        if stream:
            resp = openai.ChatCompletion.create(model=model,
                                                messages=history,
                                                stream=True)
            print('[Chatbot]', end='', flush=True)
            tot = ''
            for i in resp:
                dlt = i['choices'][0]['delta']
                if 'content' not in dlt:
                    continue
                print(dlt['content'], end='', flush=True)
                tot += dlt['content']
            print()
            history.append({'role': 'assistant', 'content': tot})
        else:
            resp = requests.post(url,
                                 data=json.dumps({
                                     'model': model,
                                     'messages': history
                                 }))
            msg = resp.json()['choices'][0]['message']
            print('[Chatbot]' + msg['content'])
            history.append(msg)
except KeyboardInterrupt:
    print('\nBye!')
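
The streaming branch of `chat.py` builds the assistant reply by concatenating the `content` field of each streamed delta and skipping chunks that carry none. The sketch below isolates that accumulation so it can be exercised without a running endpoint. `collect_stream`, `emit`, and `fake_stream` are hypothetical names for this example, and the chunk shape is assumed only from how `chat.py` indexes each item.

```python
def collect_stream(chunks, emit=None):
    """Join the `content` pieces of streamed chat-completion chunks.

    `chunks` is any iterable of dict-like items indexed the way chat.py
    indexes them: chunk['choices'][0]['delta'].
    """
    pieces = []
    for chunk in chunks:
        delta = chunk['choices'][0]['delta']
        if 'content' not in delta:
            # Chunks without text (for example a role-only chunk) are skipped.
            continue
        if emit is not None:
            emit(delta['content'])  # e.g. print the piece as it arrives
        pieces.append(delta['content'])
    return ''.join(pieces)


# Hand-written stand-in for a streamed response (not real server output).
fake_stream = [
    {'choices': [{'delta': {'role': 'assistant'}}]},
    {'choices': [{'delta': {'content': 'Hello'}}]},
    {'choices': [{'delta': {'content': ', world'}}]},
    {'choices': [{'delta': {}}]},
]
reply = collect_stream(fake_stream)
```

Collecting the pieces in a list and joining once avoids repeated string concatenation; for short chat replies the difference is negligible, so this is a stylistic choice rather than a required change.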
57 changes: 57 additions & 0 deletions examples/serve/llama2/llama2.yaml
@@ -0,0 +1,57 @@
# SkyServe YAML to run Llama2 LLM.
#
# Usage: replace the <your-huggingface-token> with
# your huggingface token, and run:
# sky serve up -n llama2 examples/serve/llama2/llama2.yaml
# Then run the following command to interact with
# the model:
# python3 examples/serve/llama2/chat.py
# The endpoint will be printed in the console. You
# could also check the endpoint by running:
# sky serve status --endpoint llama2

# TODO(tian): Change usage to `HF_TOKEN=<your-token> sky serve up -n llama2 examples/serve/llama2/llama2.yaml --env HF_TOKEN` once we have `--env` enabled.

service:
  readiness_probe: /v1/models
  replicas: 2

resources:
  ports: 8087
  memory: 32+
  accelerators: T4:1
  disk_size: 1024
  disk_tier: high

envs:
  MODEL_SIZE: 7
  HF_TOKEN: <your-huggingface-token> # TODO: Replace with huggingface token

setup: |
  conda activate chatbot
  if [ $? -ne 0 ]; then
    conda create -n chatbot python=3.9 -y
    conda activate chatbot
  fi

  # Install dependencies
  pip install "fschat[model_worker,webui]==0.2.24"
  python -c "import huggingface_hub; huggingface_hub.login('${HF_TOKEN}')"

run: |
  conda activate chatbot

  echo 'Starting controller...'
  python -u -m fastchat.serve.controller --host 0.0.0.0 > ~/controller.log 2>&1 &
  sleep 10
  echo 'Starting model worker...'
  python -u -m fastchat.serve.model_worker --host 0.0.0.0 \
      --model-path meta-llama/Llama-2-${MODEL_SIZE}b-chat-hf \
      --num-gpus $SKYPILOT_NUM_GPUS_PER_NODE 2>&1 \
      | tee model_worker.log &

  echo 'Waiting for model worker to start...'
  while ! `cat model_worker.log | grep -q 'Uvicorn running on'`; do sleep 1; done

  echo 'Starting openai api server...'
  python -u -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8087 | tee ~/openai_api_server.log
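
The `run` section above blocks until `model_worker.log` contains `Uvicorn running on` before starting the API server. The same wait can be sketched in Python as follows. This is an illustration only, not part of the PR: `wait_for_log_line` is a hypothetical helper, and the timeout is an addition of this sketch (the shell loop waits indefinitely).

```python
import time


def wait_for_log_line(path, needle, timeout_seconds=600.0, interval_seconds=1.0):
    """Return True once `needle` appears in the file at `path`.

    Returns False if the timeout (an assumption of this sketch) expires first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            with open(path, 'r', errors='replace') as f:
                if any(needle in line for line in f):
                    return True
        except FileNotFoundError:
            pass  # The worker has not created its log file yet.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_seconds)
```

A call such as `wait_for_log_line('model_worker.log', 'Uvicorn running on')` mirrors the shell loop, with the difference that a worker that never starts produces a `False` result instead of hanging.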
39 changes: 39 additions & 0 deletions examples/serve/misc/cancel/README.md
@@ -0,0 +1,39 @@
# SkyServe cancel example

This example demonstrates that the redirector supports canceling a request.

## Running the example

From the SkyPilot root directory, run the following command:

```bash
sky serve up examples/serve/misc/cancel/service.yaml -n skyserve-cancel-test
```

Use `sky serve status` to monitor the status of the service. When it's ready, run

```bash
sky serve logs skyserve-cancel-test 1
```

to monitor the logs of the service. Run

```bash
python3 examples/serve/misc/cancel/send_cancel_request.py
```

and enter the endpoint printed by `sky serve status`. You should see the following output in the service logs:

```bash
Computing... step 0
Computing... step 1
Client disconnected, stopping computation.
```

You can also run

```bash
curl -L http://<endpoint>/
```

and press Ctrl + C to cancel the request, then check the logs.
36 changes: 36 additions & 0 deletions examples/serve/misc/cancel/send_cancel_request.py
@@ -0,0 +1,36 @@
import asyncio

import aiohttp

redirector_endpoint = input('Enter redirector endpoint: ')


async def fetch(session, url):
    try:
        async with session.get(url) as response:
            print('Got response!')
            return await response.text()
    except asyncio.CancelledError:
        print('Request was cancelled!')
        raise


async def main():
    timeout = 2

    async with aiohttp.ClientSession() as session:
        task = asyncio.create_task(
            fetch(session, f'http://{redirector_endpoint}/'))

        await asyncio.sleep(timeout)
        # We manually cancel requests for test purposes.
        # You could also manually Ctrl + C a curl to cancel a request.
        task.cancel()

        try:
            await task
        except asyncio.CancelledError:
            print('Main function caught the cancelled exception.')


asyncio.run(main())
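
The cancellation pattern in `send_cancel_request.py` (start a task, sleep, call `task.cancel()`, then await the task and catch `asyncio.CancelledError`) can be tried without a deployed service. The sketch below swaps the HTTP request for a hypothetical slow coroutine; `slow_computation`, `run_with_cancel`, and the log messages are assumptions made for illustration and are not the server code shipped with this example.

```python
import asyncio


async def slow_computation(log, steps=10, step_seconds=0.05):
    """Stand-in for a long-running request; records each step in `log`."""
    try:
        for i in range(steps):
            log.append(f'Computing... step {i}')
            await asyncio.sleep(step_seconds)
        return 'done'
    except asyncio.CancelledError:
        log.append('Cancelled, stopping computation.')
        raise  # Re-raise so the awaiting caller also sees the cancellation.


async def run_with_cancel(timeout=0.12):
    """Start the computation, cancel it after `timeout` seconds, return the log."""
    log = []
    task = asyncio.create_task(slow_computation(log))
    await asyncio.sleep(timeout)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        log.append('Main function caught the cancelled exception.')
    return log


log = asyncio.run(run_with_cancel())
```

Re-raising `CancelledError` inside the coroutine matters: swallowing it would make `await task` return normally and the caller would never learn that the work was interrupted.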