-
Notifications
You must be signed in to change notification settings - Fork 560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Serve] [Doc] Add a tip for retry mechanism in scale-to-zero #3232
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we explain why we need a retry mechanism?
We mentioned "wait until the replicas are provisioned and ready"? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the tip in the doc @MaoZiming! Left a question of the autoscaler behavior. : )
docs/source/serving/autoscaling.rst
Outdated
|
||
.. tip:: | ||
|
||
If the scale-to-zero is set, the clients that access the endpoint should make sure to have a retry mechanism to be able to wait until the replicas are provisioned and ready. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the scale-to-zero is set, the clients that access the endpoint should make sure to have a retry mechanism to be able to wait until the replicas are provisioned and ready. | |
If the scale-to-zero is set, the clients that access the endpoint should make sure to have a retry mechanism to be able to wait until the replicas are provisioned and ready, i.e., starting a new replica when there is zero replica available. |
Question: how does our autoscaler handle the case for retrying? If the client keeps retrying for failed requests, will those failed requests be considered for calculating the target_num_replicas
? Need to think about the case, when zero replica is available and the client keeps retrying, causing the number of requests in the window become significantly large. Will it cause the autoscaler to scale up the service to max_replicas
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The failed requests will contribute to calculating the target_num_replicas
.
It might cause the autoscaler to scale up to max_replicas
. As long as the retry frequency is not too big we should be fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we avoid the request to make target_num_replicas
to be larger than 1 when there is no ready replicas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think maybe we should not - SkyServe as it is now cannot differentiate between retry requests and actual user requests. When there is no ready replica but accumulated requests, it is possible that there is a big increase in user requests and we might want to scale to more replicas (even though some of them might be retries) to be safe.
Co-authored-by: Zhanghao Wu <[email protected]>
#3194 (comment)
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh