[Docs] Add docs for Sky Serve #2794
Nice!
docs/source/examples/sky-serve.rst
Outdated
.. note::

    The :code:`curl` command won't follow the redirect and print the content of the redirected page by default. Since we are using HTTP-redirect, you need to use :code:`curl -L <endpoint-url>`.
How about adding a section here, e.g., "Example: Text Generation Inference (TGI)", which can consist of the two snippets in https://docs.google.com/document/d/1vVmzLF-EkG3Moj-q47DQBGvFipK4PNfkz0V6LyaPstE/edit#heading=h.gntyowdq9a18 or https://docs.google.com/document/d/1vVmzLF-EkG3Moj-q47DQBGvFipK4PNfkz0V6LyaPstE/edit#heading=h.gr15nxiws63p
The value is that it's much shorter, and therefore easier to adapt. It also quickly shows the idea of one endpoint being backed by replicas in multiple regions/clouds.
We can discuss whether to put it in the opening section of this page, like the User Docs do.
If we are adding this, do you think we still need the vicuna example? Not sure if it is a little bit redundant if we include TGI...
Some redundancy is fine. The main motivation is to make the very first impression one of real, useful AI serving. Currently it's an HTTP server.
How about we add a "Quickstart: TGI service" section (or change it to vLLM/FastChat etc.), but keep the user doc's concise formatting -- 1 snippet showing YAML, 1 snippet showing service status, then add 1 snippet showing how to CURL it correctly. With minimal text throughout.
We also may want to replace some snippets (e.g., |
Had a chance to go through the doc to learn how SkyServe works. Left some comments on what I found.
Co-authored-by: Doyoung Kim <[email protected]>
docs/source/examples/sky-serve.rst
Outdated
:width: 800
:align: center
:alt: sky-serve-status-vicuna-ready
We should add a CURL-based command like the user doc. Does something like this work?
Then it’s ready to accept traffic!
$ curl -L Y.Y.Y.Y:8082/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-H 'Content-Type: application/json'
{"generated_text":"\n nobody knows"}
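For readers who would rather script the request than use the shell, here is a minimal Python sketch of the same call. It only builds the request and does not send it. The helper name `build_generate_request` is hypothetical, the `Y.Y.Y.Y:8082` endpoint is the placeholder from the command above, and the `http://` scheme is an assumption. Note that `urllib`'s default redirect handling does not resend a POST body, so sending this through a redirecting endpoint may not behave like `curl -L`.

```python
import json
import urllib.request


def build_generate_request(endpoint: str, prompt: str,
                           max_new_tokens: int = 20) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl command.

    `endpoint` is a host:port string; the http:// scheme is an assumption.
    """
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=f"http://{endpoint}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint taken from the review comment; replace with a real one.
req = build_generate_request("Y.Y.Y.Y:8082", "What is Deep Learning?")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Keeping construction separate from sending makes it easy to inspect the method, URL, and JSON body before pointing it at a live service.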
Co-authored-by: Zongheng Yang <[email protected]>
* add doc
* Rewording
* refactor pic
* apply suggestions from code review
* Update docs/source/examples/sky-serve.rst
  Co-authored-by: Doyoung Kim <[email protected]>
* Update docs/source/examples/sky-serve.rst
  Co-authored-by: Doyoung Kim <[email protected]>
* Update docs/source/examples/sky-serve.rst
  Co-authored-by: Doyoung Kim <[email protected]>
* upd
* fix confusing required
* add graph
* image size
* Apply suggestions from code review
  Co-authored-by: Zongheng Yang <[email protected]>
* apply suggestions from code review
* upd
* Update docs/source/index.rst
  Co-authored-by: Zongheng Yang <[email protected]>
* upd pic & output
* fix
* updates

---------

Co-authored-by: Zongheng Yang <[email protected]>
Co-authored-by: Doyoung Kim <[email protected]>
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh