[Serve] [Doc] Add a tip for retry mechanism in scale-to-zero #3232

MaoZiming · 2024-02-26T03:24:14Z

#3194 (comment)

Tested (run the relevant ones):

Code formatting: bash format.sh
Any manual or new tests for this PR (please specify below)
All smoke tests: pytest tests/test_smoke.py
Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

cblmemo

Could we explain why we need a retry mechanism?

MaoZiming · 2024-02-26T16:28:57Z

The tip already says "wait until the replicas are provisioned and ready"?

Michaelvll

Thanks for adding the tip in the doc @MaoZiming! Left a question about the autoscaler behavior. :)

Michaelvll · 2024-02-27T00:00:26Z

docs/source/serving/autoscaling.rst

+
+.. tip::
+
+    If the scale-to-zero is set, the clients that access the endpoint should make sure to have a retry mechanism to be able to wait until the replicas are provisioned and ready.


Suggested change

If the scale-to-zero is set, the clients that access the endpoint should make sure to have a retry mechanism to be able to wait until the replicas are provisioned and ready.

If the scale-to-zero is set, the clients that access the endpoint should make sure to have a retry mechanism to be able to wait until the replicas are provisioned and ready, i.e., starting a new replica when there is zero replica available.
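For readers wondering what such a client-side retry mechanism could look like, here is a minimal sketch using exponential backoff. It is not part of SkyServe: `call_with_retry`, `ReplicaNotReady`, and all parameter names are hypothetical, and the caller is assumed to wrap its own HTTP call in `send` and raise `ReplicaNotReady` when the endpoint reports that no replica is ready.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ReplicaNotReady(Exception):
    """Hypothetical error: the endpoint has no ready replica yet."""


def call_with_retry(send: Callable[[], T],
                    max_attempts: int = 8,
                    base_delay: float = 1.0,
                    max_delay: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `send` until it succeeds, backing off exponentially between tries.

    The delay doubles after every failed attempt and is capped at `max_delay`,
    which keeps the retry frequency low while a replica is being provisioned.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except ReplicaNotReady:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

Backing off instead of retrying in a tight loop also limits how many extra requests the client adds while the service is scaled to zero.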

Question: how does our autoscaler handle retries? If the client keeps retrying failed requests, will those failed requests count toward target_num_replicas? Consider the case where zero replicas are available and the client keeps retrying, so the number of requests in the window becomes very large. Will that cause the autoscaler to scale the service up to max_replicas?

Yes. The failed requests will contribute to the calculation of target_num_replicas.
This might cause the autoscaler to scale up to max_replicas. As long as the retry frequency is not too high, we should be fine.

Should we prevent requests from pushing target_num_replicas above 1 when there are no ready replicas?

I still think we probably should not: SkyServe currently cannot differentiate between retry requests and actual user requests. When there are no ready replicas but requests have accumulated, there may be a genuine surge in user requests, and we might want to scale to more replicas (even though some of the requests might be retries) to be safe.
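To make the concern in this thread concrete, here is a rough sketch of how retries could inflate the replica target. It assumes a simplified request-rate model (requests in the window divided by the window length, divided by a per-replica QPS target, clipped to the replica bounds); this is an illustration only, not SkyServe's actual autoscaler code, and `window_seconds`, `target_qps_per_replica`, and `min_replicas` are assumed parameter names.

```python
import math


def target_num_replicas(num_requests_in_window: int,
                        window_seconds: float,
                        target_qps_per_replica: float,
                        min_replicas: int,
                        max_replicas: int) -> int:
    """Simplified, assumed model of a request-rate autoscaler.

    Every request in the window counts, whether it succeeded, failed,
    or was a retry of an earlier failed request.
    """
    qps = num_requests_in_window / window_seconds
    raw_target = math.ceil(qps / target_qps_per_replica)
    return max(min_replicas, min(max_replicas, raw_target))


# One client retrying every 10 s for a 60 s window: 6 requests in total.
slow_retries = target_num_replicas(6, 60, 1.0, 0, 5)

# The same client retrying every 0.1 s: 600 requests in the window.
fast_retries = target_num_replicas(600, 60, 1.0, 0, 5)
```

Under these assumptions, the slow-retry client yields a target of 1 replica, while the fast-retry client alone pushes the target to the `max_replicas` cap of 5.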

Co-authored-by: Zhanghao Wu <[email protected]>

autoscaling doc

6d0fc2d

MaoZiming requested review from cblmemo and Michaelvll February 26, 2024 03:33

minor wording

a8144de

MaoZiming mentioned this pull request Feb 26, 2024

[SkyServe] Support mixture of spot and on-demand #3194


cblmemo reviewed Feb 26, 2024

Michaelvll reviewed Feb 27, 2024

Update docs/source/serving/autoscaling.rst

e854a6e

Co-authored-by: Zhanghao Wu <[email protected]>

Michaelvll approved these changes Feb 27, 2024

MaoZiming merged commit 7484d98 into master Feb 27, 2024
19 checks passed

MaoZiming deleted the serve-scale-to-zero-doc branch February 27, 2024 19:06
